RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI on macOS

12. LM Studio on macOS

Chapter 12 of 15 · 15 min
KEY INSIGHT

LM Studio provides the fastest GUI path to a running local AI server with real-time performance metrics visible during inference.

It handles model management on macOS through a GUI, which makes it the right tool when you prefer a visual interface over the CLI or when you need to compare model performance quickly without writing scripts.

Download the installer from lmstudio.ai. This chapter uses the direct download rather than a Homebrew install. The app runs standalone and manages its own model storage.

Core operations in LM Studio:

  1. Model search: Built-in browser for models hosted on Hugging Face. Search for a model, then click Download. LM Studio handles quantization selection and storage.
  2. Server: Click "Start Server" to launch a local API on port 1234 (configurable). The API follows the OpenAI format, so existing client code can be pointed at it by changing the base URL.
  3. Chat: Built-in chat interface for model interaction.
  4. Performance: Shows GPU utilization, tokens per second, and memory usage in real time during inference.
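
Once the server from step 2 is running, a quick way to confirm it is reachable is to ask it which models it exposes. The sketch below is a minimal example, assuming the default port 1234 from this chapter and an OpenAI-style `/v1/models` endpoint that returns a JSON object with a `data` list. The function names are hypothetical helpers, not part of LM Studio.

```python
import json
import urllib.request

# Assumption: the default port from this chapter; change it if you reconfigured the server.
BASE_URL = "http://localhost:1234/v1"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style /models JSON response body."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", [])]


def list_local_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Fetch model ids from the running server.

    Raises urllib.error.URLError if nothing is listening on the port.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))


# Usage (requires the server to be running):
# print(list_local_models())
```

The ids returned here are the values to pass as `model` in chat completion requests.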

The server API in LM Studio is compatible with the OpenAI client library:

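from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server.
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # required by the client library but ignored by the server
)

# The model value should match the identifier LM Studio shows for the loaded model.
response = client.chat.completions.create(
    model="lmstudio-community/Llama-3.2-3B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the memory bandwidth of M3 Max?"}
    ],
    max_tokens=200
)
# Print only the generated text from the first choice.
print(response.choices[0].message.content)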
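
To compare models from a script instead of reading the Performance panel, a rough client-side throughput figure can be derived from the response. The sketch below is one way to do that, assuming the server fills in `response.usage.completion_tokens` as the OpenAI format specifies. `timed_completion` and `tokens_per_second` are hypothetical helper names. The figure is measured over the whole request, including prompt processing, so it will typically read lower than the generation speed LM Studio displays.

```python
import time


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def timed_completion(client, model: str, prompt: str, max_tokens: int = 200):
    """Run one chat completion and return (text, tokens_per_second).

    `client` is expected to be an openai.OpenAI instance pointed at the
    local server, as in the example above.
    """
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    usage = response.usage  # may be None if the server omits usage data
    generated = usage.completion_tokens if usage else 0
    return response.choices[0].message.content, tokens_per_second(generated, elapsed)
```

Running the same prompt against two loaded models and comparing the second value gives a repeatable, if coarse, comparison.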

Real failure mode: LM Studio hangs on "Loading model" and never completes. This usually means the model file was only partially downloaded or the model directory has a permissions problem. Check ~/LM Studio (or the models folder configured in the app) for the downloaded files, confirm they are readable by your user, then delete any partial download and download it again.
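
A small script can help with the check described above. The sketch below assumes the models are GGUF files, which start with the four bytes `GGUF`, and that you pass in whichever models folder your install uses. `inspect_gguf` and `scan_models` are hypothetical helpers. It only catches empty, unreadable, or wrong-format files. A download truncated near the end still passes the header check, so compare the reported size with the size listed on the model page.

```python
import os
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def inspect_gguf(path: Path) -> dict:
    """Report size, readability, and whether the file starts with the GGUF magic."""
    readable = os.access(path, os.R_OK)
    size = path.stat().st_size
    magic_ok = False
    if readable:
        with path.open("rb") as fh:
            magic_ok = fh.read(4) == GGUF_MAGIC
    return {
        "path": str(path),
        "size_bytes": size,
        "readable": readable,
        "magic_ok": magic_ok,
    }


def scan_models(models_dir: Path) -> list[dict]:
    """Inspect every .gguf file under models_dir, recursively."""
    return [inspect_gguf(p) for p in sorted(models_dir.rglob("*.gguf"))]


# Usage (the path is an assumption; use the folder your install shows):
# for report in scan_models(Path.home() / "LM Studio"):
#     print(report)
```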

Another failure: port 1234 is already in use. Stop the server from the LM Studio UI, or kill whatever process holds the port: lsof -ti:1234 | xargs kill. Run lsof -i:1234 first to confirm which process you are about to kill.
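
Before starting the server, or before killing anything, a script can check whether something is already accepting connections on the port. This is a minimal sketch using only the standard library. `port_in_use` is a hypothetical helper, and it only reports that some process is listening, not which one.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Usage:
# if port_in_use(1234):
#     print("Port 1234 is taken; stop the other server or pick another port.")
```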

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
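
One way to capture the details this checkpoint asks for is to build a small record alongside the exercise output. The sketch below is an assumption about what such a record might contain, not a required format. `build_run_record` and `package_version` are hypothetical helpers, and the field names are illustrative.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Installed version of a package, or 'not installed' if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_run_record(model: str, base_url: str, output: str) -> dict:
    """Collect runtime details next to the observed output."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "openai_package": package_version("openai"),
        "base_url": base_url,
        "model": model,
        "observed_output": output,
    }


# Usage:
# record = build_run_record(
#     "lmstudio-community/Llama-3.2-3B-Instruct-GGUF",
#     "http://localhost:1234/v1",
#     "model reply text goes here",
# )
# print(json.dumps(record, indent=2))
```

Saving the JSON next to your notes makes it easier to explain later why a result differs between machines or model sizes.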

EXERCISE

Download a 3B model via LM Studio, start the local server, and make an API call from Python using the OpenAI client library. Verify the response returns successfully.
