COURSE · FND · B010

First Local Chatbot

Learn first local chatbot through RunLocalAI's practical lens: chatbot, fastapi, html and javascript, hardware fit, runtime settings, verification habits and local-vs-cloud tradeoffs.

15 chapters6hFoundations trackBy Fredoline Eruo
PREREQUISITES
  • B001
  • B003
  • B002

Course B010: First Local Chatbot

Why this course exists

Most chatbot tutorials skip the hard parts. They show you a demo that works on the author's machine and breaks the moment you change a port. This course builds a real, production-ready local chatbot from scratch: a FastAPI backend that talks to Ollama, an HTML/JS frontend with real streaming, conversation memory, session management, personality configuration, and two deployment paths (LAN and Docker). Every failure mode you will hit is documented because you will hit it.

What you will know after

  • Build a FastAPI backend that streams Ollama responses in real time via Server-Sent Events
  • Write an HTML/JS frontend that handles streaming output, conversation history, and session switching
  • Manage multi-turn conversation context by storing and retrieving message history
  • Configure personality parameters (system prompt, temperature, top_p) and switch between Ollama models at runtime
  • Deploy the chatbot on a LAN or inside a Docker container with working networking
  • Handle the failure modes: Ollama timeout, invalid JSON from the frontend, CORS errors, SSE disconnects, and session storage exhaustion
CHAPTERS
  1. 01Chatbot ArchitectureEvery component (browser, FastAPI, Ollama) speaks a different protocol; the backend is the translator between all of them.15 min
  2. 02FastAPI Backend SetupFastAPI's dependency injection and auto-generated OpenAPI docs make backend prototyping fast, but the actual streaming logic requires bypassing FastAPI's response model.15 min
  3. 03Ollama IntegrationOllama's API is just HTTP JSON over SSE. You do not need an SDK; `httpx` is sufficient for all operations.15 min
  4. 04Streaming ResponsesStreaming requires both the Ollama request and the FastAPI response to use streaming mode simultaneously; half-streaming (buffering one side) causes a dead end.15 min
  5. 05HTML Frontend BasicsThe frontend is just HTML served by FastAPI. No CDN, no JS bundler, no build pipeline—the browser loads it directly.15 min
  6. 06JavaScript EventSource`fetch` + `ReadableStream` gives you full control over streaming with POST requests, which `EventSource` cannot do.15 min
  7. 07Conversation MemorySending full history is the only way to maintain context, but you must truncate or the model crashes with a context overflow error.15 min
  8. 08Session ManagementSession storage decouples the browser (stateless UI) from the conversation state (server-side history).15 min
  9. 09Personality ConfigurationThe system prompt is the single most effective lever for controlling chatbot behavior—more than temperature or model selection.20 min
  10. 10Multiple Model SelectionModel switching requires zero backend changes—Ollama handles multiple models from one endpoint; the backend just passes the model name through.15 min
  11. 11Error HandlingHandle errors at every layer: Ollama API errors, HTTP errors, SSE parsing errors, and session errors. Silent failures are the hardest to debug.20 min
  12. 12UI PolishA chatbot that is technically correct but visually broken feels broken to users. The typing indicator and auto-scroll take 15 minutes and double the perceived quality.20 min
  13. 13LAN DeploymentBinding to `0.0.0.0` exposes the service on all network interfaces; `127.0.0.1` limits it to the local machine only.15 min
  14. 14Docker DeploymentDocker containers cannot access host services by default; `host.docker.internal` bridges the gap but requires platform-specific configuration.20 min
  15. 15Chatbot Project Wrap-upThe whole chatbot is under 300 lines of code. Complexity is a choice, not a requirement—start simple and add only what you need.15 min