Troubleshooting macOS AI — Local AI on macOS (Chapter 14)

This chapter covers the most common failure modes in one place so you can diagnose fast.

Symptom: Model loads but runs at 1–3 tokens per second

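Diagnosis: Check whether Metal is active (Chapter 4). Check memory pressure: the pressure graph in Activity Monitor's Memory tab shows yellow or red, and the memory_pressure command in Terminal reports a low system-wide free percentage. Check GPU usage in Activity Monitor (Window > GPU History): it will likely sit under 5%. Together these signs mean inference has fallen back to the CPU, either because memory is too tight or because Metal never loaded.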

Fix: Reduce context window size. Switch to a smaller quantization (Q5_K_M instead of Q8_0). Use a smaller model. Close other applications.
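To see why a smaller quantization helps, estimate the weights' footprint: parameters (in billions) times bits per weight, divided by 8, gives gigabytes. The sketch below is a hypothetical bash helper, weights_gb. The bits-per-weight figures in the usage note are approximations: Q8_0 stores about 8.5 bits per weight, while Q5_K_M mixes bit widths and averages roughly 5.7. It counts weights only; the KV cache, which grows with the context window, comes on top.

```shell
# Hypothetical helper: approximate weight memory for a quantized model.
# Usage: weights_gb <params_in_billions> <bits_per_weight>
# billions of params * bits per weight / 8 bits per byte = decimal gigabytes.
# Weights only: KV cache and runtime overhead are NOT included.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}
```

For an 8B model, weights_gb 8 8.5 prints 8.50 and weights_gb 8 5.7 prints 5.70, so moving from Q8_0 to Q5_K_M frees roughly 3 GB before any context is allocated.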

Symptom: "Metal not found" error in llama.cpp

Diagnosis: The binary was compiled without Metal support. This typically happens with a pre-built binary (for example, one from GitHub releases) that was built for another platform or without the Metal backend enabled.

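Fix: Build llama.cpp from source with CMake and -DGGML_METAL=ON (Metal is enabled by default in macOS builds). If you use the llama-cpp-python bindings, reinstall them with CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python. Alternatively, use Ollama, which ships with Metal support built in.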
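A minimal sketch of the source build, assuming git, CMake and the Xcode command-line tools are installed. build_llama_metal is a hypothetical wrapper name, not something llama.cpp provides; the repository URL and the build/bin output location follow the project's standard CMake layout.

```shell
# Hypothetical wrapper: clone (if needed) and build llama.cpp with Metal.
# Metal is already on by default for macOS builds; -DGGML_METAL=ON makes it explicit.
build_llama_metal() {
  local dir="${1:-llama.cpp}"
  if [ ! -d "$dir" ]; then
    git clone https://github.com/ggml-org/llama.cpp "$dir" || return 1
  fi
  # Configure into $dir/build, then compile a Release build in parallel
  cmake -S "$dir" -B "$dir/build" -DGGML_METAL=ON || return 1
  cmake --build "$dir/build" --config Release -j || return 1
  # Executables such as llama-cli and llama-server end up in build/bin
  ls "$dir/build/bin"
}
```

Run build_llama_metal once and use the binaries from llama.cpp/build/bin; the startup log of llama-cli should then mention the Metal backend and your GPU.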

Symptom: Ollama runs in CLI but API returns connection refused

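Diagnosis: The API requires the Ollama server process, listening on port 11434 by default. The CLI talks to that same server over HTTP, so "connection refused" usually means the server has since stopped (for example, the menu-bar app was quit) or your API client is pointed at a different host or port than the server (compare OLLAMA_HOST on both sides).

Fix: Run ollama serve in a dedicated terminal tab (or start the Ollama app), then make API calls to http://localhost:11434.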
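Before debugging your client code, confirm that something is actually listening. The hypothetical bash helper below assumes curl is installed and the default address http://127.0.0.1:11434; pass a different base URL if you changed OLLAMA_HOST. A running server answers GET / with the text "Ollama is running".

```shell
# Hypothetical helper: report whether an Ollama server answers over HTTP.
# Usage: ollama_up [base_url]   (defaults to http://127.0.0.1:11434)
ollama_up() {
  local url="${1:-http://127.0.0.1:11434}"
  # -s silences progress, -f turns HTTP errors into a non-zero exit,
  # --max-time keeps the check from hanging on an unresponsive host
  if curl -sf --max-time 2 "$url/" >/dev/null; then
    echo "up: $url"
  else
    echo "down: $url (start it with: ollama serve)" >&2
    return 1
  fi
}
```

ollama_up prints an up: line and returns 0 when the server responds; otherwise it prints a down: line and returns 1, which makes it easy to use in a script before the first API call.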

Symptom: Model file downloads successfully but fails to load

Diagnosis: A corrupted or incomplete download, or a quantization format that your installed runtime version or hardware does not support.

Fix:

# Check blob sizes; a blob far smaller than the model's listed size is likely truncated
ls -lh ~/.ollama/models/blobs/*

# Remove and re-download
rm ~/.ollama/models/blobs/<problematic-hash>
ollama pull <model-name>

Symptom: MLX process killed with no error message

Diagnosis: The OS killed the process for running out of memory. In that case MLX never raises a Python exception: the kernel sends SIGKILL, so the shell shows at most a terse "killed" notice and an exit status of 137.

Fix: Use a smaller model or reduce batch size. Check log show --predicate 'eventMessage contains "Killed"' --last 5m for kernel OOM logs.
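Because there is no traceback, the exit status is the quickest tell: a process ended by SIGKILL shows up in the shell as status 137 (128 + 9). The hypothetical bash wrapper below flags that case. Note that 137 means SIGKILL from any source, so treat it as a strong hint of an OOM kill rather than proof.

```shell
# Hypothetical wrapper: run a command and flag a SIGKILL exit (status 137).
# Usage: run_watch_oom <command> [args...]
run_watch_oom() {
  "$@"
  local rc=$?                 # "rc", not "status": status is read-only in zsh
  if [ "$rc" -eq 137 ]; then
    echo "warning: '$1' was killed with SIGKILL (exit 137); likely an out-of-memory kill" >&2
  fi
  return "$rc"                # preserve the original exit status
}
```

For example, run_watch_oom python3 your_script.py, where your_script.py stands in for your own MLX script.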

Symptom: High CPU usage but model is not responding

Diagnosis: The system is swapping. Memory pressure has pushed part of the model's working memory out to disk, so generation stalls on disk I/O.

Fix: Reduce the model size or context length. Check Swap Used at the bottom of Activity Monitor's Memory tab, or run sysctl vm.swapusage in Terminal.
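To watch swap from Terminal while the model generates, parse the output of sysctl vm.swapusage. The hypothetical bash helper below assumes the macOS output format shown in its comment, with values suffixed by M for megabytes.

```shell
# Hypothetical helper: extract the "used" swap figure (MB) from sysctl output.
# Expects a line on stdin such as:
#   vm.swapusage: total = 2048.00M  used = 1234.50M  free = 813.50M  (encrypted)
swap_used_mb() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "used") {      # value sits two fields later, after "="
        v = $(i + 2)
        sub(/M$/, "", v)       # drop the megabyte suffix
        print v
      }
  }'
}
```

Run sysctl vm.swapusage | swap_used_mb a few times during generation; a figure that keeps climbing means the model and its context do not fit in memory.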