08. Session Management
Sessions let multiple users share the same server without seeing each other's conversations. FastAPI stores sessions in memory:
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
import uuid
app = FastAPI()
sessions: dict[str, list[dict]] = {}
@app.get("/", response_class=HTMLResponse)
async def root():
with open("app/templates/index.html") as f:
return f.read()
@app.get("/session/{session_id}")
def get_session(session_id: str):
return {"history": sessions.get(session_id, [])}
@app.post("/chat")
async def chat(session_id: str, model: str, messages: list[dict]):
# Persist incoming user message
if session_id not in sessions:
sessions[session_id] = []
sessions[session_id].extend(messages)
def stream():
from app.ollama_client import stream_chat
for chunk in stream_chat(model, sessions[session_id]):
yield chunk
return StreamingResponse(stream(), media_type="text/event-stream")
@app.post("/sessions/{session_id}/clear")
def clear_session(session_id: str):
sessions[session_id] = []
return {"ok": True}
On the frontend, generate a session ID once and store it in localStorage:
let sessionId = localStorage.getItem("sessionId");
if (!sessionId) {
sessionId = crypto.randomUUID();
localStorage.setItem("sessionId", sessionId);
}
The session ID is sent as a URL path parameter or query string with every request.
A failure mode: in-memory sessions are lost on server restart. For a first chatbot this is acceptable. A production version would use Redis or a database.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Add a "New Chat" button that generates a new session ID, clears localStorage, and reloads the page.