LM Studio
Polished desktop GUI for local LLMs. Built-in Hugging Face search, OpenAI-compatible local server, side-by-side conversations.
LM Studio is the easiest path from "I want to chat with a local AI" to actually doing it, for users who don't want to touch a terminal. The desktop app installs on macOS, Windows, and Linux. You browse a model catalog inside the app, click download, click chat. The whole experience is engineered around the assumption that the user has never run brew install and never wants to. That's a real design achievement. It's also where the moat ends — LM Studio is excellent for the desktop-app local-AI use case and a wrong fit for production serving, automation pipelines, or any workflow that lives outside one person's laptop.
Architecture and what LM Studio actually is
LM Studio is a closed-source desktop application built on top of llama.cpp (and optionally MLX on Apple Silicon for the MLX-format models in the catalog). The app shell is Electron-based; the inference engine is bundled. The official site hosts the catalog, and individual model cards link back to their Hugging Face sources for transparency.
Three surfaces matter:
- Chat tab. ChatGPT-style conversation UI, multi-turn, with full sampler controls in a sidebar. The flagship use case.
- Server tab. Spin up an OpenAI-compatible HTTP API on a configurable port. This is how LM Studio plugs into IDE plugins (Continue, Cursor, Aider via the OpenAI baseUrl override) without those tools needing to know about LM Studio specifically.
- My Models tab. Local model file management — download, delete, see disk usage. The catalog browser also lives here.
LM Studio's architectural choice that matters most: it bundles the inference engine. You don't separately install llama.cpp; the app ships its own copy and updates it on its own cadence. That's a UX win for newcomers (one install, no toolchain) and an operational concern for power users (you can't pin a specific llama.cpp commit, and the bundled version may lag master by weeks).
The licensing nuance: LM Studio itself is free for personal use, but commercial use requires reading their terms. The bundled llama.cpp + MLX engines are MIT-licensed, but the LM Studio app shell isn't. For team / business use this matters; for individual operators it usually doesn't.
Local stack compatibility
LM Studio's hardware story inherits from llama.cpp + MLX: anywhere those run, LM Studio runs. The polish, however, is uneven by platform. Apple Silicon is the reference platform and the app feels native there. Windows + NVIDIA is well-tested. Linux works but feels like a port. AMD ROCm support arrived later than CUDA and trails it in stability. CPU-only is a graceful degradation path — the app won't refuse to load on a weak machine; it'll just route inference to CPU and warn you.
The compatibility matrix below ranks operator-grade readiness across paths. For runtime-vs-runtime comparisons see /compare/engines/lm-studio-vs-open-webui, /compare/engines/ollama-vs-lm-studio, and /compare/runtimes.
Setup + day-1 reality
Install: download the installer from lmstudio.ai, run it. On macOS the installer drops the app in /Applications. On Windows it auto-creates start-menu shortcuts. On Linux it ships as an AppImage (you mark it executable, run it directly).
First launch: LM Studio prompts you to choose between "Power User" and "User" mode. The choice mostly affects how many sampler controls show in the sidebar. Pick Power User if you've ever tuned temperature explicitly; pick User otherwise. Switchable later from settings.
Model download: search the in-app catalog (powered by Hugging Face under the hood). The catalog hides multi-uploader confusion — you don't see TheBloke / Bartowski / unsloth duplicates of the same canonical model. LM Studio surfaces an editorially-curated subset with quality marks for "this quant works on your hardware" + "this quant is recommended for your RAM/VRAM."
Default storage path: ~/.cache/lm-studio/models on macOS/Linux, %USERPROFILE%\.cache\lm-studio\models on Windows. Some releases use a different default (for example ~/.lmstudio/models), so confirm the active path in settings. Identical weights are stored once, so a 7B Q4 takes ~5 GB regardless of how many "models" reference it. On a small system disk this matters: change the path in settings before you start downloading.
Server tab: point IDE plugins (or any OpenAI-compatible client) at http://localhost:1234/v1. The default port is 1234, not 8080 or 11434, so config-by-pattern won't work — set the baseUrl explicitly.
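For scripts rather than IDE plugins, the same explicit-baseUrl rule applies. The following is a minimal sketch using only the Python standard library; it assumes the default port 1234 described above and the "auto" model name used in this page's curl example. The helper names `build_chat_request` and `send` are hypothetical, not part of any LM Studio SDK.

```python
import json
import urllib.request

# Default LM Studio server address from the text above.
# Change the port here if you changed it in the app.
BASE_URL = "http://localhost:1234/v1"


def build_chat_request(prompt, model="auto", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # "auto" mirrors the curl example on this page
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes the OpenAI response shape: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the app open, the server started, and a model loaded, `send(build_chat_request("Hello"))` should return the reply text. Keeping the request builder separate from the network call makes the URL and payload easy to inspect when a client misbehaves.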
Operational concerns
- Closed-source for the app shell. You can't self-build, you can't audit the binary, you can't fork. For users who care about that, this is disqualifying. For users who don't, it's invisible.
- Manual updates. LM Studio doesn't auto-update by default. The app prompts when an update is available. Skipping updates means missing llama.cpp engine improvements + new model support; accepting them sometimes breaks settings.
- No daemon mode in the traditional sense. The server tab is part of the app — the app must be open. On macOS, you can run the app in the background; on Windows/Linux, closing the window stops the server. If you want a "local OpenAI API on a homelab box," LM Studio is the wrong tool — use llama-server or Ollama.
- Catalog vs HF gap. Models that aren't yet in LM Studio's curated catalog can be loaded manually (drop a GGUF / MLX file into the models directory), but the path is awkward compared to the in-app browse-and-click flow. Expect a 1-3 day delay between a major model release and LM Studio's catalog showing it cleanly.
- Resource visibility. The app shows VRAM + RAM usage in the bottom bar. This is real and useful for debugging out-of-memory crashes during long-context inference. Operators new to local AI use this constantly.
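When a manually dropped file does not show up in the app, it can help to confirm what is actually on disk. The sketch below walks a models directory and lists the GGUF files it finds along with their combined size. The directory you pass in is an assumption (use whatever path your install shows in settings), and the function names are hypothetical.

```python
from pathlib import Path


def find_gguf_files(models_dir):
    """Return every .gguf file under models_dir, sorted by path."""
    root = Path(models_dir).expanduser()
    return sorted(p for p in root.rglob("*.gguf") if p.is_file())


def total_size_gb(paths):
    """Combined size of the given files in GB (1 GB = 1e9 bytes)."""
    return sum(p.stat().st_size for p in paths) / 1e9
```

For example, `find_gguf_files("~/.cache/lm-studio/models")` scans the default path quoted earlier on this page. This only inspects the filesystem; it does not tell you whether LM Studio has indexed a given file.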
Performance reality
LM Studio's tok/s is whatever llama.cpp + MLX deliver on the host hardware. On Apple Silicon the MLX path is occasionally faster than llama.cpp Metal for the same model — LM Studio picks per-model. On NVIDIA the CUDA path matches Ollama / direct llama-server within a few percent. The Electron app shell adds RAM overhead (~150-300 MB resident) but doesn't meaningfully impact tok/s.
What LM Studio is genuinely better at: not crashing on user error. The app validates model + hardware fit before loading (won't try to load a 70B FP16 onto a 16 GB machine), warns on context settings that would exhaust memory, and falls back to CPU gracefully rather than erroring out. That's not a benchmark win, but it's a daily-driver win.
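The 70B FP16 example is easy to check by hand: weight size is roughly parameter count times bits per weight divided by eight. The sketch below does that arithmetic. It is a back-of-envelope estimate, not LM Studio's actual fit check, and the 2 GB headroom for context and runtime overhead is an assumed placeholder.

```python
def estimate_weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion, bits_per_weight, memory_gb, headroom_gb=2.0):
    """Rough fit test: weights plus an assumed headroom must fit in memory.

    headroom_gb is a guess covering KV cache and runtime overhead;
    long contexts need considerably more.
    """
    needed = estimate_weight_gb(params_billion, bits_per_weight) + headroom_gb
    return needed <= memory_gb
```

`estimate_weight_gb(70, 16)` gives 140.0, which is why a 70B FP16 model cannot load on a 16 GB machine. `estimate_weight_gb(13, 4)` gives 6.5, in line with the ~7 GB figure quoted for a 13B Q4 model elsewhere on this page; real quantized files run slightly larger because common Q4 formats use more than 4 bits per weight on average.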
Failure modes (what breaks)
Ranked by community-error frequency:
- Server tab off + IDE plugin pointing at it. The most common LM Studio gotcha — user closes the app, IDE plugin shows "OpenAI API offline." Fix: open LM Studio, go to Server tab, click Start Server.
- Default port collision. Port 1234 is occasionally claimed by another dev tool. LM Studio shows a clear error; change the port in settings.
- Catalog model fails to load. Even curated catalog models occasionally have wrong-template metadata, especially for new architectures. Workaround: try a different uploader's quant of the same model.
- Disk full mid-download. LM Studio doesn't pre-check free space before starting a 40 GB download. Operator hits no-disk-space halfway through and has to clean up partial files. Annoying but not destructive.
- GPU detection lost after driver update. Identical pattern to Ollama — a Windows driver update can confuse the GPU-detection probe. Restart the app; if still wrong, restart the machine.
- macOS unified-memory pressure. On a 16 GB Mac, loading a 13B Q4 model (~7 GB weights) plus a few browser tabs can swap aggressively. LM Studio shows this in the bottom-bar VRAM gauge but doesn't proactively warn.
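Since the app does not pre-check free space, the disk-full failure above can be avoided with a manual check before a large download. A minimal sketch using `shutil.disk_usage` from the standard library follows; the 5 GB reserve is an arbitrary assumption and the function name is hypothetical.

```python
import shutil


def has_room_for_download(directory, download_bytes, reserve_bytes=5 * 10**9):
    """True if the filesystem holding `directory` can take the download.

    reserve_bytes is an assumed safety margin so the disk is not
    filled to the last byte.
    """
    free = shutil.disk_usage(directory).free
    return free >= download_bytes + reserve_bytes
```

For a 40 GB model, call `has_room_for_download(models_dir, 40 * 10**9)` with `models_dir` set to the storage path configured in settings.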
How LM Studio compares
Compared to Ollama: different audiences. Ollama is "first thing developers reach for from a terminal." LM Studio is "first thing non-developers reach for from a desktop." If your IDE plugin is happy talking to either, the choice comes down to whether you prefer a daemon or an app. See /compare/engines/ollama-vs-lm-studio.
Compared to Open WebUI: Open WebUI is a self-hosted browser-based chat frontend that needs Ollama (or another OpenAI-compatible backend) running separately. LM Studio is one app that bundles backend + frontend. Open WebUI is multi-user-ready; LM Studio isn't. See /compare/engines/lm-studio-vs-open-webui.
Compared to AnythingLLM: AnythingLLM is RAG-first (built around document ingestion + retrieval). LM Studio is chat-first. They serve overlapping but distinct use cases.
Compared to running llama.cpp + a custom GUI: yes, you can DIY LM Studio's surface. Most operators don't. The time from zero to working chat is measured in minutes for LM Studio and hours for the DIY path.
Deployment paths
Three operator-grade deployment shapes are documented in the structured deployment-paths section below: single-user desktop chat (the flagship use case), local OpenAI-compat for IDE plugins (Continue / Cursor / Aider pointing at port 1234), and offline laptop development without internet (cached catalog + air-gapped inference).
Editorial verdict
LM Studio is the right answer when the operator's primary surface is a desktop UI. It's the wrong answer for headless servers, multi-user setups, automated pipelines, or any deployment that needs to survive an app-not-open state. The bundled llama.cpp + MLX engine matches the standalone runtimes on performance, so the choice between LM Studio and CLI llama.cpp / Ollama is about UX preference, not about technical capability.
The closed-source app shell is the philosophical wrinkle. For users who don't care: invisible. For users who do: disqualifying. There's no middle ground there, and LM Studio's team has been transparent about that tradeoff. The bundled engines are open source; the polish on top isn't.
Last reviewed 2026-05-08 by RunLocalAI editorial. Reproduce or correct: /submit/feedback.
| Status | Runtime / Stack | Notes |
|---|---|---|
| Excellent | Apple Silicon (M1-M4, Metal + MLX) | Reference platform. Native Apple feel. Bundled MLX-LM path is sometimes faster than the Metal path; LM Studio picks automatically per model. Sweet spot for 7B-13B at conversational latencies; M-Max + 64GB unified memory runs 70B Q4. |
| Excellent | Windows + NVIDIA (RTX 30/40/50) | Well-tested CUDA path. App polish on Windows matches macOS. Driver-update edge cases occasionally break GPU detection — restart the app to recover. |
| Good | Linux + NVIDIA (CUDA) | AppImage works on most modern distros. CUDA path is solid but Linux UX feels like a port — minor visual quirks, occasional missing menu items. Functional, not polished. |
| Good | AMD ROCm (RX 7000 / 9000) | ROCm support arrived after CUDA and stabilized through 2025. Works on Linux + Windows for supported cards. Day-zero new model support sometimes lags the NVIDIA path. |
| Good | CPU-only (x86_64 / ARM64) | Graceful CPU fallback. The app warns when GPU isn't available and routes inference to CPU. Usable for 7B Q4 on 16 GB RAM laptops; tok/s in single digits but the app doesn't crash. |
| Limited | Intel Arc + iGPU | No first-class Intel GPU acceleration in the bundled engine. Falls back to CPU on Intel-only machines. Use llama.cpp directly with the Vulkan or SYCL backend if you need Arc acceleration. |
| Partial | Multi-GPU layer-split | LM Studio inherits llama.cpp's layer-split; configurable via advanced settings. Single-stream serial. Concurrent serving is not the design intent. |
| Limited | Datacenter (H100 / A100 / MI300X) | Runs but underutilizes. The desktop-app architecture is wrong for datacenter SKUs — use vLLM or SGLang for production. LM Studio on a workstation H100 is fine for ad-hoc dev. |
Single-user desktop chat
trivial · The flagship use case. Install the app, browse the in-app catalog, click download on a recommended quant, click Chat. Zero terminal time. The right path for non-developer users, for first-time local-AI users, and for anyone whose primary interaction is a chat UI rather than a CLI / IDE plugin.
OpenAI-compat for IDE plugins
moderate · Point Continue / Cursor / Aider / OpenCode at http://localhost:1234/v1 with any model name. LM Studio routes the OpenAI-style request to the loaded model. The catch: the LM Studio app must be running, and the server must be started in the Server tab. If the app is closed, IDE plugins see API offline. For headless serving across reboots, use llama-server or Ollama instead.
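Before blaming the IDE plugin, it is worth checking whether anything is listening on the server port at all. The sketch below attempts a plain TCP connection; it assumes the default host and port from this page, and `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host="localhost", port=1234, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

A `False` result usually means the app is closed or the server was never started. A `True` result only proves that some process holds the port; if another dev tool has claimed 1234, requests will still fail even though the port looks open.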
Offline laptop development
moderate · Pre-download the models you need while online (LM Studio caches them locally), then disconnect entirely. The app works fully offline: no telemetry-required-for-inference, no required network calls during chat. The right path for travel, air-gapped environments, sensitive document review without cloud exposure. Complement with offline-RAG via AnythingLLM if you need document ingestion.
Setup guidance
Download the installer from lmstudio.ai for macOS (Apple Silicon + Intel), Windows, or Linux (AppImage). Install and launch the GUI. The home screen shows a model catalog: search and download models with one click (they're GGUF files sourced from Hugging Face). After downloading a model, select it in the chat tab and start typing.
The local server tab exposes an OpenAI-compatible API at http://localhost:1234/v1/chat/completions; toggle it on with the "Start Server" button. Verify with:
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}'
LM Studio auto-detects the GPU and offloads layers automatically. The GPU offload slider (right sidebar) controls how many model layers run on the GPU: set it to max for pure GPU, lower for a CPU+GPU split when VRAM is tight.
Time-to-first-response from zero is about 5 minutes, including the download of a 3B Q4_0 model. No CLI setup or Python environment needed.
Workload fit
Best for: non-developer users entering local AI who benefit from a visual interface, quick model evaluation and comparison with side-by-side chat tabs, Windows users who want a native installer without WSL2, local RAG prototyping with the built-in document loader, GPU tuning experimentation via visual sliders, teaching and workshop environments where GUI accessibility matters.
Not suited for: production serving (use vLLM), headless server deployments without a display (use Ollama or llama.cpp), CI/CD pipelines requiring programmatic model management, fine-tuning workflows, maximum-throughput batch inference.
Alternatives
Use LM Studio for the best GUI experience in local LLM inference: model discovery with one-click download, visual chat interface, GPU tuning sliders, and server toggle. Switch to Ollama when you want a CLI-first, headless server experience with programmatic model management (ollama pull, ollama create) and systemd integration. Use llama.cpp when you need the raw inference engine with maximum configuration knobs; LM Studio builds on llama.cpp but hides most flags behind the UI. In short, LM Studio is the onboarding tool for non-developers, and Ollama suits developers who live in the terminal. For production API serving, neither is appropriate; use vLLM. On Apple Silicon, MLX-LM run directly can benchmark faster, but it has no GUI and a narrower model ecosystem.
Troubleshooting + when to switch
Problem: Downloaded model doesn't appear in the model list.
Fix: LM Studio indexes GGUF files from its default models directory. If you manually moved a GGUF file, use File → Import Model and navigate to the file. LM Studio only recognizes .gguf extension files.
Problem: GPU offload causes crashes or corrupted output on Windows.
Fix: Some Windows GPU drivers and CUDA versions cause memory corruption at high GPU offload. Reduce the GPU offload slider by 5–10 layers until stable. Update to the latest NVIDIA Game Ready or Studio driver. For AMD GPUs on Windows, LM Studio uses Vulkan via llama.cpp; ensure your AMD driver is the latest Adrenalin release.
Problem: Local server returns a model "auto" not found error.
Fix: "auto" routes to the currently loaded model in the chat tab. Make sure a model is actively loaded in the chat UI before sending API requests.
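If "auto" keeps failing, an alternative is to ask the server which model ids it knows and pass one explicitly. OpenAI-compatible servers expose this as GET /v1/models. The sketch below parses such a response body; it assumes the standard OpenAI list shape (a top-level "data" array of objects with an "id" field), and `model_ids` is a hypothetical helper name.

```python
import json


def model_ids(models_response_text):
    """Extract model ids from an OpenAI-style GET /v1/models response body."""
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", [])]


# Illustrative body only; "example-model" is a made-up id.
SAMPLE_BODY = '{"object": "list", "data": [{"id": "example-model", "object": "model"}]}'
```

Fetch the body with `curl http://localhost:1234/v1/models` and pass it to `model_ids`, then use one of the returned ids as the "model" field in chat requests. Whether the list covers only loaded models or everything downloaded may depend on the LM Studio version and settings.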
Stack & relationships
How LM Studio relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Works with
- AnythingLLM
OpenAI-compatible local server. Same setup pattern as Ollama; works without modification.
- Open WebUI
LM Studio's OAI-compatible server is a drop-in replacement for the Ollama backend in Open WebUI. Same wiring; different runtime.
Alternatives
- Ollama
Both are llama.cpp-based local model runners. LM Studio wins on GUI ergonomics; Ollama wins on CLI scriptability + curated model library. Pick by interface preference.
Featured in this workflow
Full-system workflows that include this tool as part of their service ledger — with the one-line operator note for each.
- Private job-search assistant (Workflow · System · homelab · Role: Inference + chat frontend)
One-click model loader with a polished chat UI on Windows / macOS / Linux. Hosts an OpenAI-compatible server on localhost so AnythingLLM can use the same model.
Pros
- Cleanest GUI in the space
- Built-in model browser
- Hardware compatibility checker
Cons
- Closed-source
- Heavier than Ollama for headless use
Compatibility
| Field | Value |
|---|---|
| Operating systems | macOS, Linux, Windows |
| GPU backends | NVIDIA CUDA, AMD ROCm, Apple Metal, CPU |
| License | Closed source · free |
Runtime health
Operator-grade signals on how actively LM Studio is being maintained, how fresh its measurements are, and what failure classes operators have flagged. Every label below is anchored to a real date or count — we never infer maintainer activity we can't show.
Release cadence
Derived from the most recent editorial signal on this row.
8 days since last refresh · source: lastUpdated
Benchmark freshness
How recent the editorial measurements on this runtime are.
No editorial benchmarks for this runtime yet.
Community reproduction
Submissions that match an editorial measurement on similar hardware.
No community reproductions on file yet.
Ecosystem stability
Editorial rating from RunLocalAI — qualitative, not measured.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.