RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI on Windows

09. LM Studio GUI Alternative

Chapter 9 of 15 · 15 min
KEY INSIGHT

LM Studio's local server mimics the OpenAI API format, so applications written for the OpenAI API can usually use it once their base URL points at the local server (by default http://localhost:1234/v1). You must start the server manually before those applications can connect.

LM Studio provides a cross-platform GUI for running local LLMs. It has a Windows installer and does not require WSL2 or Docker. Download from lmstudio.ai. It bundles its own inference engine based on llama.cpp, supports GGUF model files, and has a built-in model browser (essentially a frontend to Hugging Face).

Install and launch LM Studio. The interface shows a model library on the left, a chat interface in the center, and server settings on the right. The "Developer" tab exposes a local inference server compatible with the OpenAI API format.

To start the local server in LM Studio:

  1. Go to the "Developer" tab
  2. Set the port (default: 1234)
  3. Click "Start Server"
  4. The server accepts requests at http://localhost:1234/v1/chat/completions
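Once the server is started, a quick way to confirm it is reachable is to ask which models it exposes. The sketch below is a minimal Python check, assuming the default port 1234 from the steps above and the OpenAI-style GET /v1/models endpoint. The helper names model_ids and list_models are illustrative and not part of LM Studio.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # assumption: default port from the steps above


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the local server; raises URLError if the server is not running."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_models())
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Server not reachable: {exc}")
```

If the script prints "Server not reachable", the most likely cause is that "Start Server" has not been clicked or the port differs from 1234.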

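Test with curl. The command below uses bash-style quoting and line continuations, so run it from Git Bash or WSL; in Windows PowerShell, curl can resolve to an alias for Invoke-WebRequest, so call curl.exe and adapt the quoting. Replace the "model" value with the identifier of a model you have downloaded: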

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [{"role":"user","content":"What is 7 * 6?"}]
  }'
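The same request can be sent from Python, which sidesteps shell quoting differences on Windows. This is a minimal sketch using only the standard library. It assumes the endpoint and model name from the curl example and an OpenAI-style response shape (choices[0].message.content); the function names are illustrative.

```python
import json
import urllib.request

URL = "http://localhost:1234/v1/chat/completions"  # endpoint from the steps above


def build_payload(model: str, prompt: str) -> bytes:
    """Encode the same JSON body that the curl example sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return json.dumps(body).encode("utf-8")


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def ask(model: str, prompt: str, url: str = URL, timeout: float = 120.0) -> str:
    """POST one user message and return the reply text."""
    request = urllib.request.Request(
        url,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    try:
        # Model name copied from the curl example; substitute your own.
        print(ask("lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF", "What is 7 * 6?"))
    except OSError as exc:
        print(f"Request failed: {exc}")
```

The generous timeout is a deliberate choice: the first request after loading a model can take noticeably longer than later ones.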

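LM Studio downloads model files to its own models folder. The default location depends on the version (for example, %USERPROFILE%\.lmstudio\models in recent releases), so confirm the path shown in the app rather than assuming %LOCALAPPDATA%\LM Studio\models. If you already downloaded a model with Ollama, you cannot reuse those files directly: Ollama stores weights as hash-named blobs alongside its own manifests rather than as named .gguf files. Instead, download the same GGUF file from Hugging Face and place it in LM Studio's models folder, keeping the publisher\model\file.gguf subfolder layout that LM Studio uses for its own downloads.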
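To see which GGUF files are actually present after a download or a manual copy, a small script can walk the models folder. The sketch below assumes you pass the folder path on the command line, since the location can differ between installs. find_gguf is a hypothetical helper name.

```python
import sys
from pathlib import Path


def find_gguf(models_dir: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every .gguf file under models_dir."""
    found = []
    for path in sorted(models_dir.rglob("*.gguf")):
        if path.is_file():
            found.append((path.relative_to(models_dir).as_posix(), path.stat().st_size))
    return found


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python list_gguf.py <path-to-models-folder>")
    else:
        for rel_path, size in find_gguf(Path(sys.argv[1])):
            print(f"{size / 1024 ** 3:8.2f} GiB  {rel_path}")
```

The file sizes are a useful first approximation of how much memory the weights will need before any context is allocated.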

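Memory management in LM Studio: the GUI has sliders for "Context size" and "GPU offload". Setting GPU offload to "Max" places every model layer in VRAM, which is the fastest option when the weights and the context cache both fit. Partial offload splits the layers between GPU and system RAM; it is slower than full offload because some layers run on the CPU, but it lets you run models that do not fit in VRAM. A larger context size increases memory use on top of the weights, so raising it can force you to offload fewer layers.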
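The interaction between the two sliders comes down to rough arithmetic. The following back-of-envelope sketch estimates the context (KV) cache size and how many layers might fit in VRAM. It assumes equal-sized layers, a 16-bit cache, and ignores runtime overhead, so treat the results as a starting point for the slider rather than a prediction. The example numbers (a Llama 3.1 8B shape, a 4.6 GiB file, an 8 GiB card) are illustrative assumptions.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size: keys and values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def layers_that_fit(model_bytes: int, n_layers: int,
                    vram_bytes: int, reserved_bytes: int) -> int:
    """Rough count of layers that fit in VRAM, assuming equal-sized layers."""
    usable = max(0, vram_bytes - reserved_bytes)
    per_layer = model_bytes / n_layers
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Assumed shape: 32 layers, 8 KV heads, head dimension 128, 8192-token context.
    cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=8192)
    print(f"KV cache at 8192 tokens: {cache / GIB:.2f} GiB")
    # Reserve the cache plus a hypothetical 0.5 GiB for the desktop and runtime.
    fit = layers_that_fit(int(4.6 * GIB), 32, 8 * GIB, cache + GIB // 2)
    print(f"Layers that fit on the GPU: {fit} of 32")
```

If the estimate comes out below the model's layer count, start the GPU offload slider near that number and adjust while watching GPU memory in Task Manager.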

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the LM Studio version, runtime, model path, and observed output. If the result depends on model size, context size, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
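One way to keep such a record is a small JSON file saved next to the exercise. The sketch below is only a suggestion: the field names and the make_record helper are hypothetical, and the values in the example call are placeholders to replace with what you actually observe.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def make_record(tool_version: str, model: str, data_path: str,
                output: str, constraints: str = "") -> dict:
    """Bundle the details of one local run into a JSON-serializable dict."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "tool_version": tool_version,
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "model": model,
        "data_path": data_path,
        "observed_output": output,
        "constraints": constraints,
    }


if __name__ == "__main__":
    # Placeholder values: fill these in from your own run.
    record = make_record(
        tool_version="<LM Studio version>",
        model="<model identifier>",
        data_path="<models folder>",
        output="<reply text>",
        constraints="<GPU, VRAM, context size>",
    )
    print(json.dumps(record, indent=2))
```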

EXERCISE

Download a GGUF model through LM Studio's built-in browser, start the local server, confirm it responds to a curl request, then check Task Manager to see how much GPU memory the LM Studio process is using.
