Chatbot Architecture — First Local Chatbot (Chapter 1)

A local chatbot has three moving parts: the frontend (HTML + JavaScript in a browser), the backend (FastAPI on a server), and the LLM engine (Ollama running a model). The browser never talks to Ollama directly. Every request follows the same path:

Browser → FastAPI → Ollama → FastAPI → Browser

This separation lets you swap models, add auth, and control rate limiting in one place without touching the frontend.
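Because only the backend talks to Ollama, the model name can live in a single server-side setting. The following is a minimal standard-library sketch that only builds the request. It assumes Ollama listens on its default address `http://localhost:11434`. `OLLAMA_URL`, `MODEL`, and `build_chat_request` are hypothetical names, and `llama3.2` stands in for whichever model you have pulled.

```python
import json
import urllib.request

# Hypothetical backend settings: change them here and the frontend never notices.
OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default port
MODEL = "llama3.2"  # assumption: any model already pulled with `ollama pull`


def build_chat_request(messages, model=MODEL, url=OLLAMA_URL):
    """Build (but do not send) a streaming POST request for Ollama's /api/chat."""
    body = {"model": model, "messages": messages, "stream": True}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request([{"role": "user", "content": "Hello"}])
print(req.get_method(), req.full_url)  # POST http://localhost:11434/api/chat
```

Passing the request to `urllib.request.urlopen(req)` sends it. The response object can then be iterated line by line, one JSON chunk per line.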

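The frontend is stateless except for session identity. It sends a session ID with every request so FastAPI knows which conversation history to retrieve. FastAPI keeps sessions in memory as a Python dict that maps each session ID to a list of messages, for example sessions[session_id] = [{"role": "user", "content": "..."}]. Because that dict lives inside a single server process, it is lost on restart and is not shared between multiple workers. In production you would move it to a shared store such as Redis, but for a first chatbot the dict is fine.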
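The in-memory session store can be sketched as a module-level dict keyed by session ID. The helper names `new_session` and `append_message` are hypothetical. Using `uuid4` to mint IDs is an assumption; any unguessable string would do.

```python
import uuid

# session_id -> list of chat messages in Ollama's {"role", "content"} format
sessions: dict[str, list[dict[str, str]]] = {}


def new_session() -> str:
    """Create an empty history and return its ID (uuid4 is an assumed choice)."""
    session_id = str(uuid.uuid4())
    sessions[session_id] = []
    return session_id


def append_message(session_id: str, role: str, content: str) -> list[dict[str, str]]:
    """Add one message (creating the history if the ID is unknown) and return it all."""
    history = sessions.setdefault(session_id, [])
    history.append({"role": role, "content": content})
    return history


sid = new_session()
append_message(sid, "user", "Hi there")
history = append_message(sid, "assistant", "Hello! How can I help?")
print(len(history))  # 2
```

In this sketch, the backend would append the user's message before calling Ollama and the assistant's reply after the stream finishes. The full history is then resent on the next turn.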

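The Ollama API accepts a POST to /api/chat with a JSON body containing the model name and the message list. With "stream": true (the default for this endpoint), Ollama returns newline-delimited JSON chunks. Each chunk has the shape {"model": "...", "message": {"role": "assistant", "content": "..."}, "done": false}, and the final chunk has "done": true.

FastAPI proxies this stream to the browser as Server-Sent Events (SSE), a one-way, server-to-browser format that the browser reads with the built-in EventSource API. Both SSE and WebSocket are built into browsers. SSE was chosen because the tokens only flow in one direction, an SSE stream is an ordinary HTTP response, and EventSource reconnects automatically. One constraint to plan for is that EventSource can only issue GET requests. The session ID and the user's message must therefore travel in the URL, or the frontend must read the stream with fetch instead.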
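The proxy step can be sketched as a generator that reads Ollama's newline-delimited JSON and re-emits each token as an SSE `data:` frame. It assumes the chunk shape described above. The function name `ndjson_to_sse` and the named `done` event are hypothetical conventions for this example, not part of Ollama or the SSE format. In FastAPI, a generator like this could be passed to `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator, Union


def ndjson_to_sse(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Turn Ollama's streamed NDJSON chunks into Server-Sent Events frames."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        token = chunk.get("message", {}).get("content", "")
        if token:
            # JSON-encode the token so embedded newlines cannot split the frame;
            # the browser then JSON.parse()s each event's data.
            yield f"data: {json.dumps(token)}\n\n"
        if chunk.get("done"):
            # Hypothetical named event telling the browser the reply is complete.
            yield "event: done\ndata: end\n\n"
            break


sample = [
    b'{"model":"llama3.2","message":{"role":"assistant","content":"Hel"},"done":false}\n',
    b'{"model":"llama3.2","message":{"role":"assistant","content":"lo"},"done":false}\n',
    b'{"model":"llama3.2","message":{"role":"assistant","content":""},"done":true}\n',
]
for frame in ndjson_to_sse(sample):
    print(repr(frame))
```

On the browser side, the `done` event is the signal to call `close()` on the EventSource. Otherwise it reconnects once the server ends the response. In this design, that reconnection would re-send the same GET and start a new generation.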

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package versions, Python runtime, model name, and observed output. If the result depends on model size, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
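One way to capture that record is a small script that gathers versions automatically. This is a sketch under assumptions. It queries Ollama's `GET /api/version` endpoint on the default port, and `package_version`, `ollama_version`, and `environment_record` are hypothetical helpers. Add further fields (for example GPU backend or available memory) as your exercise needs.

```python
import http.client
import json
import platform
import sys
import urllib.request
from importlib import metadata
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Return an installed package's version, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def ollama_version(base_url: str = "http://localhost:11434") -> Optional[str]:
    """Ask a running Ollama server for its version; None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=2) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError, http.client.HTTPException):
        return None  # server not running, timed out, or replied with non-JSON


def environment_record(model: str, output: str,
                       base_url: str = "http://localhost:11434") -> dict:
    """Collect the facts worth writing next to the exercise."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "fastapi": package_version("fastapi"),
        "ollama_server": ollama_version(base_url),
        "model": model,
        "observed_output": output,
    }


print(json.dumps(environment_record("llama3.2", "Hello! How can I help?"), indent=2))
```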