14. Docker Deployment
Create Dockerfile:
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn httpx python-multipart
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Create docker-compose.yml:
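version: "3.8"
services:
  chatbot:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./app:/app/app
    environment:
      - OLLAMA_BASE=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"  # maps the name to the host on Linux
The key trick is host.docker.internal, which lets the container reach Ollama running on the host. On macOS and Windows, Docker Desktop resolves the name automatically. On Linux with Docker Engine 20.10 or later, the extra_hosts entry maps it to the host's gateway address. Do not use network_mode: host for this: host networking ignores the ports mapping and does not define host.docker.internal. Under that mode the container would reach Ollama at localhost:11434 instead.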
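To confirm that the container can actually reach Ollama, a small connectivity check helps. The following is a minimal sketch, not part of the chatbot itself: the function names and the localhost fallback are assumptions made for illustration, and it uses only the standard library so it runs in the slim image as-is. It queries Ollama's /api/tags endpoint, which lists the locally installed models.

```python
import json
import os
import urllib.request

# Assumed fallback for illustration; the real app may use a different default.
DEFAULT_BASE = "http://localhost:11434"


def resolve_ollama_base(env=None):
    """Return the Ollama base URL from OLLAMA_BASE, without a trailing slash."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_BASE", DEFAULT_BASE).rstrip("/")


def check_ollama(base, timeout=3.0):
    """Return (ok, detail) after requesting the model list from Ollama."""
    url = f"{base}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers DNS failures, refused connections, and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return False, f"{url}: {exc}"
    models = data.get("models", []) if isinstance(data, dict) else []
    return True, f"{url}: {len(models)} model(s) available"
```

Inside the running container, calling check_ollama(resolve_ollama_base()) returns a pair such as (False, "...") when the name does not resolve or the connection is refused, which separates networking problems from application bugs.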
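Build and run with plain Docker, passing the same OLLAMA_BASE value that the Compose file sets:
docker build -t chatbot .
docker run -p 8000:8000 -e OLLAMA_BASE=http://host.docker.internal:11434 chatbot
Or build and start through Compose:
docker compose up --build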
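A failure mode: on Linux, host.docker.internal does not resolve inside a container by default. Appending a line to /etc/hosts in the Dockerfile does not fix this, because Docker generates that file when the container starts and discards build-time changes. Map the name at run time instead (Docker Engine 20.10 or later):
docker run -p 8000:8000 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE=http://host.docker.internal:11434 chatbot
The host-gateway value resolves to the host's address on the default bridge, typically 172.17.0.1. Confirm the address with ip route | grep docker on the host. If the name resolves but the connection is refused, check the address Ollama listens on: it binds to 127.0.0.1 by default, and setting OLLAMA_HOST=0.0.0.0 on the host makes it reachable from the bridge network.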
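The gateway address can also be cross-checked from inside the container, where ip may not be installed. On the default bridge network the container's default gateway is normally the host's docker0 address. The sketch below parses the text of /proc/net/route to extract it. The function name is hypothetical, and the byte-order handling assumes a little-endian machine (x86 or ARM).

```python
import socket
import struct


def default_gateway(route_table):
    """Return the default gateway IPv4 address from /proc/net/route text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        destination, gateway = fields[1], fields[2]
        # The default route has destination 0.0.0.0 and a non-zero gateway.
        if destination == "00000000" and gateway != "00000000":
            # Addresses are hex in host byte order (assumed little-endian here).
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None
```

Inside the container, pass it the contents of /proc/net/route, for example default_gateway(open("/proc/net/route").read()), and compare the result with the address reported on the host.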
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Build the Docker image, start the container, and access the chatbot at http://localhost:8000. Verify streaming works from inside the container.
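One way to check streaming is to record when each line of the response arrives: a buffered response delivers everything at once, while a streamed one is spread out over time. The following is a rough sketch under stated assumptions. The helper names are hypothetical, the 0.05 second threshold is arbitrary, and the endpoint path and request body mentioned afterwards must be replaced with whatever the chatbot actually exposes.

```python
import json
import time
import urllib.request


def timed_chunks(lines, clock=time.monotonic):
    """Yield (seconds_since_start, text) for each non-empty line as it arrives."""
    start = clock()
    for raw in lines:
        text = raw.decode("utf-8", errors="replace").strip()
        if text:
            yield clock() - start, text


def looks_streamed(timings, min_spread=0.05):
    """True when at least two chunks arrived and they were spread out in time."""
    if len(timings) < 2:
        return False
    return timings[-1] - timings[0] >= min_spread


def probe(url, payload, timeout=60.0):
    """POST a JSON payload and return the list of (offset, text) pairs received."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # Iterating an HTTP response yields lines as they are read from the socket.
        return list(timed_chunks(resp))
```

For example, with a hypothetical /chat route, chunks = probe("http://localhost:8000/chat", {"message": "hello"}) followed by looks_streamed([t for t, _ in chunks]) gives a quick yes or no. Running the same probe from the host and from a shell inside the container shows whether anything between them is buffering the response.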