15. Chatbot Project Wrap-up
The complete project structure:
chatbot/
├── Dockerfile
├── docker-compose.yml
├── app/
│ ├── main.py # FastAPI app, routes
│ ├── ollama_client.py # Ollama HTTP client
│ └── templates/
│ └── index.html # Single-file frontend
└── venv/ # Python virtual environment
What works: Streaming responses, multi-turn conversation memory, session management, personality configuration, multi-model selection, error handling, UI polish, LAN deployment, Docker deployment.
What is not production-ready: Session storage is in-memory (lost on restart), no authentication, no rate limiting, no message size limits, no prompt injection protection.
Next steps:
- Swap in-memory sessions for Redis or SQLite with
session_idas the key - Add HTTP Basic Auth with a middleware
- Add message size validation (reject prompts > 8192 tokens)
- Add prompt injection detection (check for repeated system-prompt override patterns)
- Add streaming metrics: tokens per second display
To verify the complete system works end-to-end, run:
# Terminal 1
ollama serve
# Terminal 2
cd chatbot
source venv/bin/activate
uvicorn app.main:app --reload
# Browser
# Navigate to http://localhost:8000
# Select a model, type a message, verify streaming response
The chatbot is complete. Every file is a plain text file you can read, edit, and understand. The architecture is simple enough to explain in two minutes and reliable enough to ship.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Delete the conversation history from sessions dict in main.py (restart the server). Confirm the UI clears. Then write a sessions.json persistence layer that saves and loads sessions from disk so they survive restarts.