Ollama Integration — First Local Chatbot (Chapter 3)

Add a client module for Ollama. Create app/ollama_client.py:

import httpx

OLLAMA_BASE = "http://localhost:11434"

def list_models() -> list[dict]:
    response = httpx.get(f"{OLLAMA_BASE}/api/tags", timeout=5.0)
    response.raise_for_status()
    data = response.json()
    return [m["name"] for m in data.get("models", [])]

def stream_chat(model: str, messages: list[dict]):
    """Yield raw SSE lines from Ollama."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
    }
    with httpx.stream("POST", f"{OLLAMA_BASE}/api/chat", json=payload, timeout=60.0) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                yield line + "\n"

Add the route to app/main.py:

from app.ollama_client import list_models

@app.get("/models")
def get_models():
    return {"models": list_models()}

Test it. If list_models() raises httpx.ConnectError, Ollama is not running or the URL is wrong. Verify with curl http://localhost:11434/api/tags. If the error says model not found, that means Ollama is running but the requested model is not pulled—run ollama pull llama3 first.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.