Docker Deployment — First Local Chatbot (Chapter 14)

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn httpx python-multipart

COPY app/ ./app/

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Create docker-compose.yml:

version: "3.8"
services:
  chatbot:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./app:/app/app
    environment:
      - OLLAMA_BASE=http://host.docker.internal:11434
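    extra_hosts:
      - "host.docker.internal:host-gateway"  # defines the name on Linux (Docker 20.10+)

The key trick is host.docker.internal, a hostname that resolves to the host machine so the container can reach the host's Ollama. On macOS and Windows, Docker Desktop defines it automatically. On Linux with Docker Engine it is not defined by default, so the extra_hosts entry maps it to the host's gateway address. Do not use network_mode: host for this. Host networking makes the container share the host's network stack, so the ports mapping no longer applies (published ports are discarded or rejected, depending on the Docker version), and Ollama would be reached at localhost:11434 rather than through host.docker.internal.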
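As a minimal sketch of how the application side might consume the OLLAMA_BASE setting, the snippet below reads it from the environment and builds a request URL. The function name ollama_url and the localhost fallback are assumptions for illustration, not the chapter's app code; /api/tags is Ollama's endpoint for listing local models.

```python
import os

# Assumed fallback for running the app directly on the host, outside Docker.
DEFAULT_OLLAMA_BASE = "http://localhost:11434"


def ollama_url(path: str, env=None) -> str:
    """Join the configured Ollama base URL with an API path."""
    env = os.environ if env is None else env
    base = env.get("OLLAMA_BASE", DEFAULT_OLLAMA_BASE).rstrip("/")
    return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Prints the URL the app would call to list models.
    print(ollama_url("/api/tags"))
```

Keeping the base URL in one environment variable means the same image works with Docker Desktop, with Linux Docker Engine, and outside a container, with only the configuration changing.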

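Build and run with Compose, which applies the settings in docker-compose.yml:

docker compose up --build

Plain docker run does not read docker-compose.yml, so if you use it instead, pass the environment variable and the host mapping as flags:

docker build -t chatbot .
docker run -p 8000:8000 -e OLLAMA_BASE=http://host.docker.internal:11434 --add-host=host.docker.internal:host-gateway chatbot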

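A failure mode: on Linux, host.docker.internal is not in the container's /etc/hosts by default. Appending it in the Dockerfile with RUN echo ... >> /etc/hosts does not work, because Docker generates /etc/hosts each time a container starts, so a build-time change does not carry into the running container. Supply the mapping when the container starts instead, through the --add-host flag of docker run or an extra_hosts entry in docker-compose.yml. The value host-gateway (Docker 20.10 or later) resolves to the host automatically. On older versions, give the bridge gateway IP explicitly:

docker run --add-host=host.docker.internal:172.17.0.1 -p 8000:8000 chatbot

172.17.0.1 is the default address of the docker0 bridge. Find the correct gateway IP with ip route | grep docker on the host.

Even when the name resolves, the connection can still be refused: Ollama listens on 127.0.0.1 by default, so on Linux it must be started with OLLAMA_HOST=0.0.0.0 (or the bridge address) to accept connections from the container.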
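To tell a name-resolution failure apart from a refused connection, a small probe run inside the container can help. The sketch below uses only the standard library. The function name probe and its return strings are hypothetical, chosen for this illustration.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt as 'ok', 'dns-failure', or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The hostname did not resolve (gaierror is a subclass of OSError,
        # so it must be caught first).
        return "dns-failure"
    except OSError:
        # The name resolved, but the connection was refused, timed out,
        # or otherwise failed.
        return "unreachable"
```

Calling probe("host.docker.internal", 11434) from inside the container narrows the problem: "dns-failure" points at a missing host mapping, while "unreachable" suggests that nothing is listening on an address the container can reach.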

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
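One way to capture the runtime and package versions for that record is a short script like the sketch below. The function name environment_record and the choice of fields are assumptions. The package list simply mirrors the pip install line in the Dockerfile.

```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages):
    """Collect runtime details and installed versions for the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Package names taken from the Dockerfile's pip install line.
    print(json.dumps(environment_record(["fastapi", "uvicorn", "httpx"]), indent=2))
```

Running it both on the host and inside the container makes differences between the two environments visible next to the recorded output.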