How to run Ollama in Docker
Prerequisites: Docker installed, NVIDIA Container Toolkit (for GPU)
What this does
Runs the Ollama server inside a Docker container, enabling isolated model execution with optional GPU passthrough. Models persist in a named volume and survive container restarts.
Steps
1. Create a named volume for model persistence.

    docker volume create ollama-models

   Expected output: ollama-models, which is then listed in docker volume ls.

2. Start the Ollama container with GPU access and port mapping. On a host without an NVIDIA GPU, omit the --gpus all line to run on CPU.

    docker run -d \
      --gpus all \
      -v ollama-models:/root/.ollama \
      -p 11434:11434 \
      --name ollama \
      ollama/ollama:latest

   Expected output: Container ID returned, visible in docker ps with status "Up".

3. Pull a model into the container.

    docker exec ollama ollama pull llama3.2

   Expected output: Progress indicator followed by "success" confirmation.

4. Send a completion request to the running server.

    curl -X POST http://localhost:11434/api/generate \
      -d '{"model":"llama3.2","prompt":"Why is the sky blue?","stream":false}'

   Expected output: JSON object containing the generated text in its "response" field.
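When the steps above are scripted, the pull can run before the server inside the container is listening. The following is a minimal sketch of a readiness check, assuming the default address used in this guide; wait_for_ollama is a hypothetical helper name, not part of Docker or Ollama.

```shell
#!/bin/sh
# wait_for_ollama: hypothetical helper, not part of Docker or Ollama.
# Polls the /api/tags endpoint until it answers or the retry budget runs out.
# Usage: wait_for_ollama [base_url] [max_tries]
wait_for_ollama() {
    base_url="${1:-http://localhost:11434}"
    max_tries="${2:-30}"
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        # -s silences progress output, -f turns HTTP errors into a non-zero exit,
        # --max-time bounds each attempt to 2 seconds.
        if curl -sf --max-time 2 "$base_url/api/tags" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    # Server never answered within max_tries attempts.
    return 1
}
```

One possible use is to gate step 3 on it: wait_for_ollama && docker exec ollama ollama pull llama3.2. The retry count and one-second delay are arbitrary choices for illustration.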
Verification
curl -s http://localhost:11434/api/tags
# Expected: a JSON object whose "models" array lists the models available in the container (for example llama3.2:latest)
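The verification can be turned into a pass/fail check. The sketch below extracts model names from the api/tags response using only grep and sed; it assumes each model entry carries a "name" field, and the function names are hypothetical. A real JSON parser such as jq would be more robust if it is available.

```shell
#!/bin/sh
# list_model_names: hypothetical helper. Reads the /api/tags JSON on stdin
# and prints one model name per line. Assumes each entry has a "name" field.
list_model_names() {
    grep -o '"name":[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# has_model: hypothetical helper. Reads the same JSON on stdin and succeeds
# if the wanted model is listed, either verbatim or with the :latest tag.
has_model() {
    wanted="$1"
    # -F treats the name literally (the dot in llama3.2 is not a wildcard),
    # -x requires the whole line to match, -q suppresses output.
    list_model_names | grep -qxF -e "$wanted" -e "$wanted:latest"
}
```

For example, curl -s http://localhost:11434/api/tags | has_model llama3.2 && echo "model present" prints the message only when the model from the steps above has been pulled.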
Common failures
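- docker run fails with exit code 125: the error comes from the Docker daemon rather than from Ollama, here typically because Docker cannot access the GPU. Install or reconfigure the NVIDIA Container Toolkit and restart Docker, or remove --gpus all to run on CPU.
- Port 11434 already bound: a local Ollama service occupies the port. Stop it with systemctl stop ollama or map to a different host port (for example -p 11435:11434).
- Model pull fails with network error: check outbound connectivity from inside the container with docker exec ollama ping -c 1 8.8.8.8, if ping is present in the image. Pinging an IP address tests routing only; if it succeeds and pulls still fail, check the container's DNS configuration.
- "model not found" on generation request: use the exact name from the api/tags list, including the tag if it is not latest.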
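The GPU and port failures are both resolved by changing flags on the docker run command from the steps. The following sketch prints the adjusted command rather than running it, so it can be reviewed first; ollama_run_cmd and its two arguments are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# ollama_run_cmd: hypothetical helper, not part of Docker or Ollama.
# Prints the docker run command from this guide with optional changes.
# Usage: ollama_run_cmd [gpu|cpu] [host_port]
ollama_run_cmd() {
    mode="${1:-gpu}"
    host_port="${2:-11434}"
    gpu_flag=""
    if [ "$mode" = "gpu" ]; then
        gpu_flag="--gpus all "
    fi
    # The container side of -p stays 11434; only the host port changes.
    printf 'docker run -d %s-v ollama-models:/root/.ollama -p %s:11434 --name ollama ollama/ollama:latest\n' \
        "$gpu_flag" "$host_port"
}
```

For instance, ollama_run_cmd cpu 11435 prints a CPU-only command that publishes the server on host port 11435. If an earlier failed attempt left a container named ollama behind, docker rm ollama may be needed before retrying, since container names must be unique.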
Operator checkpoint
Before treating this as solved, write down the local runtime, the model or package version, the hardware or backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision record rather than a one-off command transcript.
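One way to capture those details in a single pass is sketched below. It assumes the container name ollama and the default port from this guide; the function names and output labels are hypothetical. Any tool that is missing or fails is reported as "unavailable" instead of aborting the report.

```shell
#!/bin/sh
# first_line_or_unavailable: hypothetical helper. Runs a command and prints
# the first line of its output, or "unavailable" if it produced none.
first_line_or_unavailable() {
    out=$("$@" 2>/dev/null | head -n 1)
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
    else
        printf 'unavailable\n'
    fi
}

# checkpoint_report: hypothetical helper. Prints one labelled line per item
# named in the operator checkpoint.
checkpoint_report() {
    printf 'date: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'docker: %s\n' "$(first_line_or_unavailable docker --version)"
    # Image name plus image ID, since the latest tag does not pin a version.
    printf 'image: %s\n' "$(first_line_or_unavailable docker inspect --format '{{.Config.Image}} {{.Image}}' ollama)"
    printf 'ollama: %s\n' "$(first_line_or_unavailable docker exec ollama ollama --version)"
    printf 'gpu: %s\n' "$(first_line_or_unavailable nvidia-smi --query-gpu=name,driver_version --format=csv,noheader)"
    printf 'tags: %s\n' "$(first_line_or_unavailable curl -s --max-time 5 http://localhost:11434/api/tags)"
}
```

Running checkpoint_report | tee ollama-checkpoint.txt keeps a copy next to the guide; the file name is only a suggestion.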