Chatbot Project Wrap-up — First Local Chatbot (Chapter 15)

The complete project structure:

chatbot/
├── Dockerfile
├── docker-compose.yml
├── app/
│   ├── main.py           # FastAPI app, routes
│   ├── ollama_client.py  # Ollama HTTP client
│   └── templates/
│       └── index.html    # Single-file frontend
└── venv/                 # Python virtual environment

What works: Streaming responses, multi-turn conversation memory, session management, personality configuration, multi-model selection, error handling, UI polish, LAN deployment, Docker deployment.

What is not production-ready: Session storage is in-memory (lost on restart), no authentication, no rate limiting, no message size limits, no prompt injection protection.
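The first gap, sessions lost on restart, can be narrowed with the standard library alone. The following is a minimal sketch, not the chapter's implementation: the class name SQLiteSessionStore, the sessions.db path, and the table layout are all hypothetical, and it assumes each session is a list of role/content message dictionaries keyed by session_id.

```python
import json
import sqlite3


class SQLiteSessionStore:
    """Hypothetical store: one row per session, messages kept as a JSON string."""

    def __init__(self, path: str = "sessions.db"):
        # Assumption: a single connection used from one thread. A threaded
        # server would need a connection per request or explicit locking.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "session_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
        )
        self.conn.commit()

    def load(self, session_id: str) -> list:
        """Return the stored message list, or an empty list for a new session."""
        row = self.conn.execute(
            "SELECT messages FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []

    def save(self, session_id: str, messages: list) -> None:
        """Insert the session, or overwrite it if the session_id already exists."""
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (session_id, messages) VALUES (?, ?)",
            (session_id, json.dumps(messages)),
        )
        self.conn.commit()
```

Passing ":memory:" as the path gives a throwaway database for experiments; a file path makes the history survive a restart, which is the point of the swap.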

Next steps:

Swap in-memory sessions for Redis or SQLite with session_id as the key
Add HTTP Basic Auth with a middleware
Add message size validation (reject prompts longer than 8192 tokens; a character count can serve as a rough proxy when no tokenizer is available)
Add prompt injection detection (check for repeated system-prompt override patterns)
Add streaming metrics: tokens per second display
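Two of the steps above, input validation and streaming metrics, are small enough to sketch as pure functions. Everything named here is an assumption for illustration: the four-characters-per-token estimate is only a rule of thumb, the pattern list is deliberately tiny and will miss many real attacks, and validate_message and tokens_per_second are hypothetical helpers. The metric relies on the eval_count and eval_duration (nanoseconds) fields that Ollama reports in the final chunk of a response.

```python
import re

# Assumption: approximate the 8192-token limit with a character budget,
# because exact token counts depend on the model's tokenizer.
MAX_TOKENS = 8192
APPROX_CHARS_PER_TOKEN = 4
MAX_CHARS = MAX_TOKENS * APPROX_CHARS_PER_TOKEN

# Hypothetical, intentionally short list of override phrasings.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?system prompt", re.IGNORECASE),
]


def validate_message(text: str) -> list:
    """Return a list of problems; an empty list means the message passed."""
    problems = []
    if not text.strip():
        problems.append("empty message")
    if len(text) > MAX_CHARS:
        problems.append(f"message too long: {len(text)} > {MAX_CHARS} characters")
    if any(pattern.search(text) for pattern in OVERRIDE_PATTERNS):
        problems.append("possible system-prompt override attempt")
    return problems


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval_count and eval_duration (ns) into tokens per second."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)
```

Returning a list of problems rather than raising lets the route decide whether to reject the request or only log it. Pattern matching of this kind is a tripwire, not protection.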

To verify the complete system works end-to-end, run:

# Terminal 1
ollama serve

# Terminal 2
cd chatbot
source venv/bin/activate
uvicorn app.main:app --reload

# Browser
# Navigate to http://localhost:8000
# Select a model, type a message, verify streaming response
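Before opening the browser, a short script can confirm that both processes are listening. This is a sketch under stated assumptions: Ollama is on its default port 11434 (its /api/tags endpoint lists local models), the app serves its page at the root path on port 8000, and is_up and run_checks are hypothetical helper names. It does not test streaming; that still needs the manual check above.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a successful HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


# Assumed default addresses; adjust if either service runs elsewhere.
CHECKS = {
    "ollama": "http://localhost:11434/api/tags",
    "chatbot": "http://localhost:8000/",
}


def run_checks(checks: dict) -> dict:
    """Map each service name to whether its URL responded."""
    return {name: is_up(url) for name, url in checks.items()}


if __name__ == "__main__":
    for name, ok in run_checks(CHECKS).items():
        print(f"{name}: {'ok' if ok else 'NOT reachable'}")
```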

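The chatbot is complete. Every file is a plain text file you can read, edit, and understand. The architecture is simple enough to explain in two minutes and reliable enough to run on a trusted local network; close the gaps listed under "What is not production-ready" before exposing it to untrusted users.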

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
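One way to capture the runtime and package versions for this checkpoint is a short script whose output is pasted next to the exercise notes. This is a sketch only: environment_record and package_version are hypothetical helpers, and the package list assumes the FastAPI and uvicorn installation used in this chapter. The model name, data path, and observed output still have to be recorded by hand.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_record(packages: list) -> dict:
    """Collect interpreter, platform, and package versions into one dictionary."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    # Assumed package list for this project; extend it as needed.
    print(json.dumps(environment_record(["fastapi", "uvicorn"]), indent=2))
```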