LM Studio GUI Alternative — Local AI on Windows (Chapter 9)

LM Studio provides a cross-platform GUI for running local LLMs. It has a Windows installer and does not require WSL2 or Docker. Download from lmstudio.ai. It bundles its own inference engine based on llama.cpp, supports GGUF model files, and has a built-in model browser (essentially a frontend to Hugging Face).

Install and launch LM Studio. The interface shows a model library on the left, a chat interface in the center, and server settings on the right. The "Developer" tab exposes a local inference server compatible with the OpenAI API format.

To start the local server in LM Studio:

Go to the "Developer" tab
Set the port (default: 1234)
Click "Start Server"
The server accepts requests at http://localhost:1234/v1/chat/completions
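
Before sending a chat request, it can help to confirm that the server is up and see which model identifiers it reports. The sketch below assumes the server also exposes the OpenAI-style model listing endpoint at /v1/models and runs on the default port; the function names are illustrative, not part of LM Studio.

```python
import json
import urllib.request

# Assumption: the server runs locally on the default port from the steps above.
BASE_URL = "http://localhost:1234/v1"


def parse_model_ids(body: str) -> list[str]:
    """Extract model identifiers from an OpenAI-style model list response."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the running server which models it exposes.

    Raises urllib.error.URLError if the server has not been started.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

With the server started, calling list_models() should return the identifiers you can pass in the "model" field of a request. A connection error usually means the server is not running or the port differs.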

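Test with curl. The command below uses Bash-style quoting and line continuations, so run it from Git Bash or a similar shell. In Windows PowerShell, curl is an alias for Invoke-WebRequest, so call curl.exe explicitly and adjust the quoting and line breaks. Set "model" to the identifier of a model you have downloaded and loaded: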

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [{"role":"user","content":"What is 7 * 6?"}]
  }'
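
The same request can be sent from Python using only the standard library, which avoids shell quoting differences on Windows. This is a minimal sketch: it assumes the default port, reuses the model identifier from the curl example, and assumes the response follows the OpenAI chat completion shape (choices[0].message.content). The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumptions: default port, and the model identifier from the curl example.
URL = "http://localhost:1234/v1/chat/completions"
MODEL = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"


def build_chat_request(prompt: str, model: str = MODEL, url: str = URL):
    """Build a POST request carrying a single user message."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: str) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    payload = json.loads(response_body)
    return payload["choices"][0]["message"]["content"]


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

With the server running and a model loaded, ask("What is 7 * 6?") returns the model's answer as a string. The generous timeout is deliberate, since the first request may wait while the model loads.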

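LM Studio stores downloaded models in a models folder under your user profile. The default location has changed between versions (recent releases use %USERPROFILE%\.lmstudio\models, older ones %USERPROFILE%\.cache\lm-studio\models), so check the path shown in the app's model management view. If you already downloaded a model with Ollama, you cannot reuse those files directly: Ollama keeps weights as hash-named blobs alongside its own manifests, not as plain .gguf files. However, you can download the same model as a GGUF file from Hugging Face and place it in LM Studio's models folder, inside a publisher\model-name subfolder so that LM Studio detects it.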
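
To check what is already in the models folder, or to work out where a manually downloaded GGUF file should go, a short script is enough. The sketch below takes the models folder as a parameter because the location varies by version; the publisher/model-name layout is an assumption about how LM Studio organizes the folder, and the function names are hypothetical.

```python
from pathlib import Path


def find_gguf_files(models_dir) -> list[str]:
    """Return all .gguf files under models_dir as sorted relative paths."""
    root = Path(models_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.gguf"))


def target_path(models_dir, publisher: str, model_name: str, filename: str) -> Path:
    """Where a manually downloaded file would go.

    Assumes a publisher/model-name/file.gguf layout inside the models folder.
    """
    return Path(models_dir) / publisher / model_name / filename
```

For example, target_path(models_dir, "lmstudio-community", "Meta-Llama-3.1-8B-Instruct-GGUF", filename) gives the destination for a file from the repository used in the curl example, where models_dir and filename are whatever applies on your machine.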

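Memory management in LM Studio: the GUI has sliders for "Context size" and "GPU offload". Setting GPU offload to "Max" places every model layer in VRAM, which is the fastest option when the whole model fits. Partial offload splits the layers between GPU and system RAM. It is slower than full offload, because the layers left in RAM run on the CPU, but it lets you run models that are larger than your VRAM. A larger context size also increases memory use, since the cache that holds the conversation grows with it.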
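
A rough way to choose a starting value for the GPU offload slider is to estimate how many layers fit in VRAM. The sketch below is a back-of-the-envelope heuristic, not how LM Studio itself decides: it assumes all layers are the same size, and it uses a hypothetical reserve_mib parameter to set aside VRAM for the context cache and other overhead.

```python
def layers_that_fit(model_mib: int, n_layers: int, vram_mib: int, reserve_mib: int) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-size layers.

    reserve_mib is VRAM held back for the context cache and other overhead;
    the right value depends on context size and is an assumption here.
    """
    usable = max(vram_mib - reserve_mib, 0)
    # Integer arithmetic: usable / (model_mib / n_layers), rounded down.
    fit = usable * n_layers // model_mib
    return min(fit, n_layers)


# Worked example with made-up numbers: a 4800 MiB model with 32 layers
# (150 MiB per layer) on a 4096 MiB GPU, keeping 1024 MiB in reserve.
# 3072 MiB usable / 150 MiB per layer = 20.48, so 20 layers.
print(layers_that_fit(4800, 32, 4096, 1024))  # prints 20
```

Treat the result as a starting point only. If loading fails or generation is slow, lower the offload value or the context size and try again.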

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
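
One way to keep such a record is a small JSON file saved next to the exercise. The sketch below is only a suggestion: the field names are made up, and values such as the LM Studio version, the model, and the data path are ones you fill in by hand from your own setup.

```python
import json
import platform
from datetime import datetime, timezone


def make_record(app_version: str, model: str, data_path: str,
                observed_output: str, constraints: str = "") -> dict:
    """Collect the details the checkpoint asks for into one dictionary."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "app_version": app_version,        # entered by hand
        "model": model,
        "data_path": data_path,
        "observed_output": observed_output,
        "constraints": constraints,        # e.g. model size, backend, memory
    }


def save_record(record: dict, path: str) -> None:
    """Write the record as indented JSON so it is easy to diff later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
```

Saving one record per run makes it easier to see later whether a changed result came from a different model, backend, or memory setting.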