03. Ollama on macOS
Ollama is the fastest path from zero to running local AI on macOS: you can install it and have a model answering queries in under 10 minutes. The trade-off is that it abstracts away most of the hardware details, which makes debugging harder when something goes wrong.
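Install via Homebrew or by downloading the macOS app:
brew install ollama
# or download the app from https://ollama.com/download and move it to Applications
# (the https://ollama.com/install.sh script is intended for Linux, not macOS)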
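The macOS app starts the Ollama server in the background when you launch it. A Homebrew install does not start it automatically: run brew services start ollama, or ollama serve in a terminal. The CLI communicates with that server: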
ollama pull llama3.2:3b   # downloads roughly 2 GB, returns to the prompt when done
ollama run llama3.2:3b    # starts an interactive session (pulls the model first if needed)
ollama serve              # starts the API server in the foreground; only needed if it is not already running
Pull a model, run it, done. But here is where the abstraction leaks:
# Check if ollama is actually running
curl http://localhost:11434/api/tags
# Should return JSON list of available models
If you see a connection refused error, nothing is listening on port 11434, which usually means Ollama is not running. Start it with ollama serve in a separate terminal tab and leave that tab open.
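The same check can be scripted. The sketch below is a minimal Python version using only the standard library. It assumes the default address http://localhost:11434 from the curl command above, and it assumes the /api/tags response is a JSON object with a models list whose entries carry a name field. The helper name list_models is invented for this example and is not part of Ollama.

```python
import json
import urllib.request


def list_models(base_url="http://localhost:11434", timeout=5.0):
    """Return the names of locally available models, or None if the server is unreachable.

    list_models is a hypothetical helper for this chapter, not an Ollama API.
    """
    # The server is local, so skip any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError:
        # Covers connection refused, timeouts, and DNS failures
        # (urllib.error.URLError is a subclass of OSError).
        return None
    # Assumed response shape: {"models": [{"name": "llama3.2:3b", ...}, ...]}
    return [entry.get("name") for entry in payload.get("models", [])]


if __name__ == "__main__":
    models = list_models()
    if models is None:
        print("Ollama is not reachable; start it with `ollama serve`.")
    else:
        print("Ollama is running. Models:", models or "(none pulled yet)")
```

Returning None for "unreachable" and an empty list for "running but no models pulled" keeps the two situations distinguishable, which matters when deciding whether to start the server or run ollama pull.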
The API server is what makes Ollama useful beyond the CLI:
# Send a completion request
curl -X POST http://localhost:11434/api/generate \
-d '{"model":"llama3.2:3b","prompt":"Why is unified memory faster for AI?","stream":false}'
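For use from a program rather than a terminal, the same request can be sent from Python. This is a sketch under stated assumptions, not an official client: it uses only the standard library, targets the default address used in the curl example, and assumes that a non-streaming reply is a single JSON object with the generated text under the response key. The function name generate is chosen for this example.

```python
import json
import urllib.request


def generate(prompt, model="llama3.2:3b", base_url="http://localhost:11434", timeout=120.0):
    """POST a non-streaming request to /api/generate and return the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The server is local, so skip any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as resp:
        reply = json.load(resp)
    # Assumption: with "stream": false the text is under the "response" key.
    return reply["response"]


if __name__ == "__main__":
    try:
        print(generate("Why is unified memory faster for AI?"))
    except OSError as exc:
        print(f"Request failed (is the server running?): {exc}")
```

The generous timeout is deliberate: the first request after a pull has to load the model into memory, which can take noticeably longer than later requests.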
Real failure mode: After a macOS update, Ollama may fail to start because the update invalidated the helper binary signature. You get Error: abnormal exit in the logs. Fix: for a Homebrew install, run brew services restart ollama. If that does not help, uninstall and reinstall.
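Another common failure: ollama run llama3.2:3b works but curl to the API fails with connection refused. The CLI and the server are separate processes, but ollama run is itself a client of the API server, so a working interactive session means the server is up. The likelier cause is that curl is hitting a different address than the one the server is listening on. Check whether the OLLAMA_HOST environment variable is set to a non-default host or port, and try http://127.0.0.1:11434 in place of localhost, which can resolve to an IPv6 address first. If neither applies, stop any running instance and run ollama serve explicitly so you can watch its startup output.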
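When curl reports connection refused, a quick way to narrow the problem down is to test whether anything accepts TCP connections on the Ollama port at all. The sketch below does this with Python's socket module. It assumes the default port 11434 used throughout this chapter, and port_is_open is a name invented for this example.

```python
import socket


def port_is_open(host, port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unresolvable host, or no IPv6 support.
        return False


if __name__ == "__main__":
    # "localhost" may resolve to IPv4 or IPv6, so probe both literals as well
    # to see which address, if any, is accepting connections.
    for host in ("127.0.0.1", "::1", "localhost"):
        state = "open" if port_is_open(host) else "closed"
        print(f"{host}:11434 is {state}")
```

If every address comes back closed, nothing is listening and the server needs to be started. If only one of them is open, point curl at that address.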
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Install Ollama, pull llama3.2:1b, run it via CLI, then make one API call using curl. Verify the server responds with valid JSON.
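One way to capture the checkpoint details in a repeatable form is a small script that gathers them into a single record. The sketch below is an illustration under assumptions: it assumes the server exposes its version at /api/version as a JSON object with a version field, it uses the default address from this chapter, and checkpoint_record and its field names are made up for this exercise. The observed model output still has to be noted by hand or added to the record.

```python
import json
import platform
import urllib.request


def checkpoint_record(model, base_url="http://localhost:11434", timeout=5.0):
    """Collect version and runtime facts for the checkpoint into one dictionary."""
    # The server is local, so skip any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/api/version", timeout=timeout) as resp:
        # Assumed response shape: {"version": "<ollama version>"}
        ollama_version = json.load(resp).get("version")
    return {
        "ollama_version": ollama_version,
        "model": model,
        "python": platform.python_version(),
        "machine": platform.machine(),  # typically arm64 on Apple silicon
        "os": platform.platform(),
    }


if __name__ == "__main__":
    try:
        print(json.dumps(checkpoint_record("llama3.2:1b"), indent=2))
    except OSError as exc:
        print(f"Could not reach the Ollama server: {exc}")
```

Saving the printed JSON next to the exercise notes makes it easier to explain later why the same prompt behaved differently on another machine or Ollama version.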