Docker Deployment — Ollama — Installation to Mastery (Chapter 13)

Running Ollama in Docker isolates it from the host system and simplifies deployment. NVIDIA GPU passthrough requires the NVIDIA Container Toolkit (nvidia-container-toolkit) on Linux, or GPU support enabled in Docker Desktop on Windows.

Basic Docker Setup

# Pull and run the official image
docker run -d --name ollama -p 11434:11434 ollama/ollama

This starts Ollama in detached mode. Access the API at http://localhost:11434.
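The container may need a moment before the API accepts connections. The following is a minimal sketch of a readiness check, assuming curl is installed on the host; wait_for_ollama is a hypothetical helper name, not something shipped with Ollama or Docker.

```shell
# Poll the Ollama API until it answers or the attempts run out.
# wait_for_ollama is a hypothetical helper, not part of Ollama or Docker.
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"   # base URL of the API
  local attempts="${2:-30}"                  # maximum number of tries
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not reachable at $url" >&2
  return 1
}

# Usage:
#   wait_for_ollama                             # default URL, up to 30 tries
#   wait_for_ollama http://localhost:11435 10   # custom port, 10 tries
```

A script can call wait_for_ollama before pulling models so that later commands do not fail against a container that is still starting.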

Persisting Models

Models downloaded inside the container are lost when the container is removed. Mount a volume to persist them:

docker run -d --name ollama \
    -p 11434:11434 \
    -v ollama-data:/root/.ollama \
    ollama/ollama

The -v ollama-data:/root/.ollama flag mounts the named Docker volume ollama-data at /root/.ollama, where the container stores its models. The data then survives container restarts, removal, and image upgrades.
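Because the models live in the volume, an upgrade amounts to replacing the container while reusing the same volume. The sketch below shows one way to do that; upgrade_ollama and the DOCKER override variable are hypothetical conveniences introduced here for illustration, and the container and volume names are the ones used above.

```shell
# Set DOCKER=echo to print the commands instead of running them (dry run).
DOCKER="${DOCKER:-docker}"

# upgrade_ollama is a hypothetical helper: it pulls the newest image,
# removes the old container, and starts a new one on the same volume.
upgrade_ollama() {
  $DOCKER pull ollama/ollama &&
  $DOCKER stop ollama &&
  $DOCKER rm ollama &&
  $DOCKER run -d --name ollama -p 11434:11434 \
    -v ollama-data:/root/.ollama ollama/ollama
}

# Usage:
#   upgrade_ollama                      # perform the upgrade
#   docker exec ollama ollama list      # previously pulled models should still be listed
```

The steps are chained with && so that a failed pull leaves the running container untouched.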

Running Specific Models

# Pull a model inside the container
docker exec ollama ollama pull llama3.2:1b

# Run the model
docker exec -it ollama ollama run llama3.2:1b

Or use the API directly:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Hello"
}'
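By default the generate endpoint streams its answer as a series of JSON objects. For scripting, a single JSON reply is often easier to handle, which the "stream": false field requests. The sketch below builds such a request body; build_generate_payload is a hypothetical helper, and it assumes the model name and prompt contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# build_generate_payload is a hypothetical helper, not part of Ollama.
# $1 = model name, $2 = prompt (assumed free of quotes and backslashes).
build_generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Usage (requires the container to be running and the model to be pulled):
#   curl -s http://localhost:11434/api/generate \
#     -d "$(build_generate_payload llama3.2:1b Hello)"
```

For prompts with arbitrary characters, a JSON-aware tool is a safer way to build the body than printf.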

GPU Passthrough (NVIDIA)

Requires nvidia-container-toolkit installed on the host:

docker run -d --name ollama \
    --gpus all \
    -p 11434:11434 \
    -v ollama-data:/root/.ollama \
    ollama/ollama

The --gpus all flag passes through all NVIDIA GPUs. Verify with:

docker exec ollama nvidia-smi

GPU Passthrough (AMD)

AMD GPUs require ROCm-enabled images. Use the ollama/ollama:rocm tag:

docker run -d --name ollama \
    --device /dev/kfd \
    --device /dev/dri \
    -p 11434:11434 \
    -v ollama-data:/root/.ollama \
    ollama/ollama:rocm

Resource Limits

Constrain CPU and memory usage:

docker run -d --name ollama \
    --cpus="2" \
    --memory="4g" \
    -p 11434:11434 \
    ollama/ollama

This limits the container to the equivalent of 2 CPU cores and 4 GB of RAM, which is useful in shared environments. Choose a memory limit large enough to hold the model you intend to run.
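To confirm that the limits took effect, docker inspect can report them, but it does so in raw units: memory in bytes and CPUs in billionths of a core. The sketch below converts a size such as 4g into bytes for comparison; mem_to_bytes is a hypothetical helper that handles only the lowercase single-letter suffixes shown.

```shell
# mem_to_bytes is a hypothetical helper: it converts a Docker-style
# memory size (e.g. 4g, 512m, 64k) to bytes using binary multiples.
mem_to_bytes() {
  case "$1" in
    *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
    *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
    *k) echo $(( ${1%k} * 1024 )) ;;
    *)  echo "$1" ;;   # no recognized suffix: treat the value as bytes
  esac
}

# Usage (requires the container started with the limits above):
#   docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' ollama
#   # should print 4294967296 2000000000 for --memory="4g" --cpus="2"
#   mem_to_bytes 4g
```

For live usage rather than configured limits, docker stats --no-stream ollama shows current memory consumption against the limit.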

Failure Modes

GPU not detected in container - Verify that nvidia-smi works on the host, then check that the container was started with --gpus all. On older setups that predate the --gpus flag, select the NVIDIA runtime explicitly (docker run --runtime=nvidia ...).
Port conflict - Another service on the host is already using port 11434. Use -p 11435:11434 to publish the API on host port 11435 instead, then access it at http://localhost:11435.
Volume permission issues - When a host directory is bind-mounted, the process inside the container may not have permission to write to it. Use a named volume (like ollama-data) instead of a host path.
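The checks above can be gathered into a small preflight script. This is a rough sketch under several assumptions: have_cmd and ollama_preflight are hypothetical helper names, the port check relies on the ss utility being available on a Linux host, and the volume name is the one used earlier in this chapter.

```shell
# have_cmd: succeed if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# ollama_preflight is a hypothetical helper that reports likely causes
# of the failure modes listed above. $1 = host port (default 11434).
ollama_preflight() {
  local port="${1:-11434}"
  have_cmd docker || { echo "docker: not found on PATH"; return 1; }

  # GPU: nvidia-smi must work on the host before it can work in a container.
  if have_cmd nvidia-smi; then
    nvidia-smi >/dev/null 2>&1 || echo "nvidia-smi: present but failing on the host"
  else
    echo "nvidia-smi: not found (NVIDIA passthrough unavailable)"
  fi

  # Port: look for an existing listener on the chosen host port.
  if have_cmd ss && ss -ltn | grep -q ":${port} "; then
    echo "port ${port}: already in use; consider -p 11435:11434"
  fi

  # Volume: a named volume avoids host-path permission problems.
  docker volume inspect ollama-data >/dev/null 2>&1 ||
    echo "volume ollama-data: not found (or the Docker daemon is not reachable)"
}

# Usage:
#   ollama_preflight          # check the default port 11434
#   ollama_preflight 11435    # check an alternative host port
```

The script only reports; it changes nothing on the host, so it is safe to run before starting or recreating the container.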