Session Management — First Local Chatbot (Chapter 8)

Sessions let multiple users share the same server without seeing each other's conversations. FastAPI stores sessions in memory:

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
import uuid

app = FastAPI()
sessions: dict[str, list[dict]] = {}

@app.get("/", response_class=HTMLResponse)
async def root():
    with open("app/templates/index.html") as f:
        return f.read()

@app.get("/session/{session_id}")
def get_session(session_id: str):
    return {"history": sessions.get(session_id, [])}

@app.post("/chat")
async def chat(session_id: str, model: str, messages: list[dict]):
    # Persist incoming user message
    if session_id not in sessions:
        sessions[session_id] = []
    sessions[session_id].extend(messages)

    def stream():
        from app.ollama_client import stream_chat
        for chunk in stream_chat(model, sessions[session_id]):
            yield chunk

    return StreamingResponse(stream(), media_type="text/event-stream")

@app.post("/sessions/{session_id}/clear")
def clear_session(session_id: str):
    sessions[session_id] = []
    return {"ok": True}

On the frontend, generate a session ID once and store it in localStorage:

let sessionId = localStorage.getItem("sessionId");
if (!sessionId) {
  sessionId = crypto.randomUUID();
  localStorage.setItem("sessionId", sessionId);
}

The session ID is sent as a URL path parameter or query string with every request.

A failure mode: in-memory sessions are lost on server restart. For a first chatbot this is acceptable. A production version would use Redis or a database.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.