RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI on macOS

14. Troubleshooting macOS AI

Chapter 14 of 15 · 15 min
KEY INSIGHT

Most macOS AI failures trace to three causes: Metal not active, memory exhausted, or the server process not running. Check those three before anything else.

This chapter covers the most common failure modes in one place so you can diagnose fast.

Symptom: Model loads but runs at 1–3 tokens per second

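Diagnosis: Check whether Metal is active (Chapter 4). Check memory pressure: the Memory Pressure graph in Activity Monitor's Memory tab shows yellow or red, and the memory_pressure command reports a low system-wide free percentage. Check GPU usage in Activity Monitor (Window > GPU History); it is likely under 5%. Together, these signs point to CPU fallback, caused either by memory constraints or by Metal not being loaded.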

Fix: Reduce context window size. Switch to a smaller quantization (Q5_K_M instead of Q8_0). Use a smaller model. Close other applications.
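The memory-pressure check above can also be scripted. The sketch below assumes that memory_pressure ends its report with a line of the form "System-wide memory free percentage: NN%". The helper names free_pct and pressure_label, and the 25% and 10% cut-offs, are illustrative assumptions rather than values Apple documents.

```shell
#!/bin/sh
# Extract the free-memory percentage from `memory_pressure` output on stdin.
# Assumes a line like "System-wide memory free percentage: 42%".
free_pct() {
    sed -n 's/.*free percentage: *\([0-9][0-9]*\)%.*/\1/p' | tail -n 1
}

# Map a free percentage to a rough label.
# The 25 and 10 cut-offs are hypothetical; tune them for your machine.
pressure_label() {
    if [ "$1" -ge 25 ]; then
        echo "ok"
    elif [ "$1" -ge 10 ]; then
        echo "tight"
    else
        echo "critical"
    fi
}

# Usage on macOS:
#   pressure_label "$(memory_pressure | free_pct)"
```

A "tight" or "critical" result alongside low GPU usage is consistent with the CPU-fallback pattern described in the diagnosis.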

Symptom: "Metal not found" error in llama.cpp

Diagnosis: The binary was compiled without Metal support. This happens when you download a pre-built binary from GitHub releases that was not compiled on macOS with Metal flags.

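Fix: Build llama.cpp from source with Metal enabled: cmake -B build -DGGML_METAL=ON, then cmake --build build --config Release. (CMAKE_ARGS="-DGGML_METAL=on" is the equivalent setting when installing the llama-cpp-python bindings with pip.) Alternatively, use Ollama, whose macOS build bundles a Metal-compatible runtime.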
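To confirm that a rebuilt binary is actually using Metal, one option is to scan its startup log. This is a minimal sketch that assumes Metal-enabled builds write log lines containing ggml_metal; that has been the case for llama.cpp builds but the exact wording may change between versions. The helper names uses_metal and metal_verdict are hypothetical, and the binary and model paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Succeeds if the log on stdin mentions the ggml Metal backend.
# Assumption: Metal-enabled builds log lines containing "ggml_metal".
uses_metal() {
    grep -q 'ggml_metal'
}

# Print a one-line verdict for a captured log file.
metal_verdict() {
    if uses_metal < "$1"; then
        echo "Metal active"
    else
        echo "Metal not detected (likely CPU fallback)"
    fi
}

# Usage (binary and model path are placeholders):
#   ./build/bin/llama-cli -m <model.gguf> -p "hi" -n 8 2> run.log
#   metal_verdict run.log
```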

Symptom: Ollama runs in CLI but API returns connection refused

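Diagnosis: The Ollama CLI and the Ollama server are separate processes. The CLI is a client of the server, which listens on 127.0.0.1:11434 by default, and on macOS the CLI may start the server on demand. A working CLI session therefore does not guarantee that the server is still running, or that it is listening on the address your API client calls (check OLLAMA_HOST if you have set it).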

Fix: Run ollama serve in a dedicated terminal tab and leave it running, then make API calls against the address it reports.
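Before debugging the client, it can help to confirm that something is answering at the expected address. The sketch below assumes the default address 127.0.0.1:11434, that OLLAMA_HOST (when set) holds either host:port or a full URL, and that the server exposes /api/version. The function names ollama_base_url and ollama_up are hypothetical.

```shell
#!/bin/sh
# Build the base URL an API client should call.
# Assumption: OLLAMA_HOST, when set, is "host:port" or a full http(s) URL.
ollama_base_url() {
    host="${OLLAMA_HOST:-127.0.0.1:11434}"
    case "$host" in
        http://*|https://*) echo "$host" ;;
        *) echo "http://$host" ;;
    esac
}

# Succeeds only if a server answers at that address within 2 seconds.
ollama_up() {
    curl --silent --fail --max-time 2 "$(ollama_base_url)/api/version" > /dev/null
}

# Usage:
#   ollama_up && echo "server reachable" || echo "no server: run 'ollama serve'"
```

If ollama_up fails while the CLI still works, compare the address printed by ollama_base_url with the one your API client is configured to use.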

Symptom: Model file downloads successfully but fails to load

Diagnosis: Corrupted download, incomplete file, or a quantization your runtime or hardware cannot handle.

Fix:

# Verify file size
ls -lh ~/.ollama/models/blobs/*

# Remove and re-download
rm ~/.ollama/models/blobs/<problematic-hash>
ollama pull <model-name>

Symptom: MLX process killed with no error message

Diagnosis: macOS killed the process because the system ran out of memory. MLX does not raise a Python exception in this case; the kernel terminates the process with SIGKILL.

Fix: Use a smaller model or reduce batch size. To look for kernel out-of-memory entries from the last five minutes, run log show --predicate 'eventMessage contains "Killed"' --last 5m.
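Because the process dies without a traceback, the shell's exit status is often the quickest clue. By shell convention, a status above 128 means the process was terminated by signal (status minus 128), so SIGKILL shows up as 137. The helper explain_status below is a hypothetical name, and the script name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Translate a shell exit status into a short explanation.
# Statuses above 128 mean "terminated by signal (status - 128)".
explain_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -eq 137 ]; then
        echo "killed by SIGKILL (possible out-of-memory kill)"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error $status"
    fi
}

# Usage (the script name is a placeholder):
#   python generate.py; explain_status $?
```

A status of 137 does not prove an out-of-memory kill, since anything can send SIGKILL, but combined with high memory pressure it is a strong hint.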

Symptom: High CPU usage but model is not responding

Diagnosis: The model is swapping: memory pressure has forced part of it out to disk, so the system spends its time paging instead of generating tokens.

Fix: Reduce model size or context. Check Swap Used in Activity Monitor's Memory tab, or run sysctl vm.swapusage in a terminal.
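To watch swap from a terminal while the model runs, the output of sysctl vm.swapusage can be reduced to a single number. This sketch assumes the usual output format "vm.swapusage: total = ...M  used = ...M  free = ...M" with values in megabytes; the helper name swap_used_mb is hypothetical.

```shell
#!/bin/sh
# Extract the "used" swap figure (in MB) from `sysctl vm.swapusage` output on stdin.
# Assumes: "vm.swapusage: total = 2048.00M  used = 1536.25M  free = 511.75M  (encrypted)"
swap_used_mb() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "used") {
                value = $(i + 2)      # skip the "=" token
                sub(/M$/, "", value)  # drop the unit suffix
                print value
            }
        }
    }'
}

# Usage on macOS:
#   sysctl vm.swapusage | swap_used_mb
```

A figure that keeps climbing during generation suggests the model and its context do not fit in memory.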

EXERCISE

Trigger each failure mode intentionally: run a large model with a very large context window (fails on memory), call the API before starting the Ollama server (fails with connection refused), and check GPU utilization for a model running without Metal support. Knowing what each failure looks like lets you diagnose it in seconds.
