03. Ollama Integration
Add a client module for Ollama. Create app/ollama_client.py:
import httpx
OLLAMA_BASE = "http://localhost:11434"
def list_models() -> list[dict]:
response = httpx.get(f"{OLLAMA_BASE}/api/tags", timeout=5.0)
response.raise_for_status()
data = response.json()
return [m["name"] for m in data.get("models", [])]
def stream_chat(model: str, messages: list[dict]):
"""Yield raw SSE lines from Ollama."""
payload = {
"model": model,
"messages": messages,
"stream": True,
}
with httpx.stream("POST", f"{OLLAMA_BASE}/api/chat", json=payload, timeout=60.0) as resp:
resp.raise_for_status()
for line in resp.iter_lines():
if line:
yield line + "\n"
Add the route to app/main.py:
from app.ollama_client import list_models
@app.get("/models")
def get_models():
return {"models": list_models()}
Test it. If list_models() raises httpx.ConnectError, Ollama is not running or the URL is wrong. Verify with curl http://localhost:11434/api/tags. If the error says model not found, that means Ollama is running but the requested model is not pulled—run ollama pull llama3 first.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Write a test script test_ollama.py that calls list_models() and prints the model names. Run it with python test_ollama.py.