15. macOS AI Workflows
This chapter chains the tools from previous chapters into complete workflows, each built around a concrete use case.
Workflow 1: Development API with MLX acceleration
Purpose: Serve a local model as an API for a development project, maximizing throughput on Apple Silicon.
# 1. Start Ollama on the host
ollama serve &
# 2. Use LM Studio or a Python server for different model formats
# MLX model via Python server
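pip install mlx-lm fastapi uvicorn
python3 << 'EOF'
from fastapi import FastAPI
from mlx_lm import load, generate

app = FastAPI()
model, tokenizer = None, None

@app.on_event("startup")
async def startup():
    # Load the model once at startup so every request reuses it
    global model, tokenizer
    model, tokenizer = load("mlx-community/Qwen2.5-3B-Instruct-4bit")

@app.post("/generate")
async def generate_text(req: dict):
    # mlx_lm.generate is a module-level function that takes the model and tokenizer;
    # the handler is named generate_text so it does not shadow that import
    response = generate(
        model,
        tokenizer,
        prompt=req["prompt"],
        max_tokens=req.get("max_tokens", 256),
    )
    return {"response": response}

if __name__ == "__main__":
    import uvicorn
    # 0.0.0.0 listens on all interfaces; use 127.0.0.1 to keep the API local
    uvicorn.run(app, host="0.0.0.0", port=8000)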
EOF
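Once the server is running, a client can exercise the endpoint. The sketch below is one possible way to call it from Python with only the standard library. It assumes the server above is reachable on port 8000 of the local machine; the helper names `build_generate_request` and `call_generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: the FastAPI server from this workflow is listening here.
BASE_URL = "http://localhost:8000"


def build_generate_request(prompt, max_tokens=256, base_url=BASE_URL):
    """Build a POST request for the /generate endpoint defined above."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(prompt, max_tokens=256, timeout=120):
    """Send the prompt and return the text stored under the "response" key."""
    request = build_generate_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

With the server up, `call_generate("Explain unified memory in one sentence.", max_tokens=64)` should return the generated text. Keeping request construction separate from the network call makes the payload easy to inspect without a running server.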
Workflow 2: Team model serving with Open WebUI
Purpose: Give a team a self-hosted web interface to multiple models.
# 1. Start Ollama (host)
ollama serve
# 2. In another terminal, start Open WebUI (Python)
open-webui serve
# 3. Team members access http://<host-ip>:8080 (http://localhost:8080 works only on the host itself)
# Multiple users can chat simultaneously with different models
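Before inviting the team, it can help to confirm which models Ollama is ready to serve. The following sketch queries Ollama's `/api/tags` endpoint and prints the model names. It assumes Open WebUI is pointed at the same Ollama instance on the default port 11434; the function names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running on its default port on this machine.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract sorted model names from a parsed /api/tags response."""
    return sorted(entry["name"] for entry in tags_payload.get("models", []))


def list_local_models(base_url=OLLAMA_URL, timeout=10):
    """Ask Ollama which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as reply:
        return model_names(json.loads(reply.read().decode("utf-8")))
```

Calling `list_local_models()` while `ollama serve` is running returns a list such as the models pulled in earlier chapters. An empty list suggests that no model has been pulled yet.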
Workflow 3: Batch processing with CLI scripts
Purpose: Run inference over a dataset without a web interface.
#!/bin/bash
# batch_inference.sh
MODEL="llama3.2:3b"
INPUT_FILE="prompts.txt"
OUTPUT_FILE="results.txt"
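while IFS= read -r prompt; do
  # Build the request body with jq so quotes and backslashes in the prompt are escaped
  payload=$(jq -n --arg model "$MODEL" --arg prompt "$prompt" \
    '{model: $model, prompt: $prompt, stream: false}')
  result=$(curl -s -X POST http://localhost:11434/api/generate \
    -d "$payload" \
    | jq -r '.response')
  echo "$result" >> "$OUTPUT_FILE"
  echo "Processed: ${prompt:0:50}..." >&2
done < "$INPUT_FILE"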
# Run the batch script
chmod +x batch_inference.sh
./batch_inference.sh
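The same loop can be written in Python when the results need more structure than a flat text file. This is a minimal sketch under a few assumptions: one prompt per line in the input file, Ollama on its default port, and a hypothetical JSON Lines output file so that multi-line responses stay paired with their prompts. The function names are illustrative.

```python
import json
import urllib.request

# Same endpoint the shell script calls.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt):
    """Serialize the request body; json.dumps escapes quotes and backslashes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def parse_prompts(text):
    """Return one prompt per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_batch(input_path, output_path, model="llama3.2:3b", timeout=300):
    """Send each prompt to Ollama and append one JSON record per prompt."""
    with open(input_path, encoding="utf-8") as handle:
        prompts = parse_prompts(handle.read())
    with open(output_path, "a", encoding="utf-8") as out:
        for prompt in prompts:
            request = urllib.request.Request(
                GENERATE_URL,
                data=build_payload(model, prompt),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            with urllib.request.urlopen(request, timeout=timeout) as reply:
                answer = json.loads(reply.read().decode("utf-8"))["response"]
            out.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

A call such as `run_batch("prompts.txt", "results.jsonl")` mirrors the shell script. Writing one JSON object per line is a design choice rather than a requirement; it avoids ambiguity when a response spans several lines.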
Workflow 4: Claude Code agent with local model fallback
Purpose: Have a coding assistant use a local model as a fallback when cloud models are unavailable.
Configure in your AI tool's settings:
{
  "model": "claude-3-5-sonnet",
  "fallback_model": "ollama/llama3.2:3b",
  "ollama_endpoint": "http://localhost:11434"
}
This pattern is common in dev tools: define the local endpoint as a fallback, and the tool routes to it automatically when the primary endpoint is unreachable. The exact setting names vary by tool, so treat the keys above as illustrative.
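The routing logic behind such a setting can be sketched in a few lines. The example below is not taken from any particular tool. It assumes each backend is a plain function that takes a prompt and returns text, and it treats connection-level errors as the signal that the primary endpoint is unreachable. The names `generate_with_fallback` and `ollama_generate` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_generate(prompt, model="llama3.2:3b", endpoint="http://localhost:11434"):
    """Local backend: call Ollama's /api/generate without streaming."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{endpoint}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


def generate_with_fallback(prompt, primary, fallback):
    """Try the primary backend; use the fallback only if it is unreachable.

    Returns a (text, source) pair so the caller can see which backend answered.
    """
    try:
        return primary(prompt), "primary"
    except (urllib.error.URLError, ConnectionError, TimeoutError):
        return fallback(prompt), "fallback"
```

In use, `primary` would wrap the cloud client of your choice and `fallback` would be `ollama_generate`. Only network-level failures trigger the switch here; other errors from the primary backend still propagate, which keeps genuine bugs visible.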
Build one workflow from this chapter end-to-end. Start Ollama, launch Open WebUI, make an API call, and capture the results with a batch script. The full chain should take under 20 minutes to assemble and validate.