04. Streaming Responses
To stream from FastAPI to the browser, use StreamingResponse with an async generator. Add to app/main.py:
from fastapi.responses import StreamingResponse
@app.post("/chat")
async def chat(model: str, messages: list[dict]):
from app.ollama_client import stream_chat
return StreamingResponse(
stream_chat(model, messages),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache"}
)
The media_type="text/event-stream" header tells the browser this is SSE. Each yielded line must be prefixed with data: for the browser's EventSource to parse it. Fix stream_chat in app/ollama_client.py:
def stream_chat(model: str, messages: list[dict]):
payload = {
"model": model,
"messages": messages,
"stream": True,
}
with httpx.stream("POST", f"{OLLAMA_BASE}/api/chat", json=payload, timeout=120.0) as resp:
resp.raise_for_status()
for line in resp.iter_lines():
if line:
yield f"data: {line}\n\n"
The double newline \n\n is the SSE message delimiter. Missing it causes the browser to buffer indefinitely.
A failure mode: if the client disconnects (user closes the tab) while FastAPI is streaming, the async generator raises CancelledError. Catch it in the route with a try/except or let the framework handle it—FastAPI handles CancelledError silently by default, which is fine.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Use curl -N http://localhost:8000/chat with a POST body to test the stream manually. Watch the chunks arrive.