What this does

Runs the Ollama server inside a Docker container, enabling isolated model execution with optional GPU passthrough. Models are stored in a named volume, so they survive container restarts and removal of the container itself.

Steps

Create a named volume for model persistence.
```
docker volume create ollama-models
```
Expected output: the volume name ollama-models is printed, and the volume appears in docker volume ls.
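Start the Ollama container with GPU access and port mapping. On a host without an NVIDIA GPU and the NVIDIA Container Toolkit, omit the --gpus all line to run on CPU.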
```
docker run -d \
  --gpus all \
  -v ollama-models:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:latest
```
Expected output: Container ID returned, visible in docker ps with status "Up".
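docker run returns as soon as the container starts, which can be before the server inside it is listening, so a pull or request issued immediately afterwards can fail. A minimal readiness-wait sketch, assuming bash and curl on the host and using the same /api/tags endpoint as the Verification step; the function name wait_for_ollama and its default timeout are hypothetical, not part of Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the Ollama HTTP API until it answers or a timeout expires.
# Usage: wait_for_ollama [base_url] [timeout_seconds]
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"
  local timeout="${2:-30}"
  local waited=0
  # curl -sf exits non-zero on connection errors and on HTTP status >= 400.
  until curl -sf --max-time 2 "$url/api/tags" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Ollama did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ollama is up at $url"
}
```

Chaining it as wait_for_ollama && docker exec ollama ollama pull llama3.2 makes the pull step wait for the server instead of failing on a cold start.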
Pull a model into the container.
```
docker exec ollama ollama pull llama3.2
```
Expected output: Progress indicator followed by "success" confirmation.

Send a completion request to the running server.

```
curl -X POST http://localhost:11434/api/generate \
  -d '{"model":"llama3.2","prompt":"Why is the sky blue?","stream":false}'
```

Expected output: a single JSON object (because "stream" is false) whose "response" field holds the generated text.
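
A hand-written JSON body breaks as soon as a prompt contains quotes or newlines. A minimal sketch that builds the request body with proper escaping and prints only the "response" field of the non-streaming reply, assuming bash, curl, and python3 on the host (python3 is used only for JSON encoding and decoding; jq would work equally well). The function names build_payload, extract_response, and ollama_generate are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around POST /api/generate with "stream": false.
# Assumes python3 is installed on the host for JSON handling.

# Print a JSON request body; json.dumps escapes quotes, backslashes and newlines.
build_payload() {
  python3 -c 'import json, sys; print(json.dumps({"model": sys.argv[1], "prompt": sys.argv[2], "stream": False}))' "$1" "$2"
}

# Read one JSON reply from stdin and print its "response" field.
extract_response() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
}

# Send a prompt and print only the generated text.
# Usage: ollama_generate MODEL PROMPT [base_url]
ollama_generate() {
  local url="${3:-http://localhost:11434}"
  build_payload "$1" "$2" \
    | curl -sS --fail -X POST "$url/api/generate" -H 'Content-Type: application/json' -d @- \
    | extract_response
}
```

For example, ollama_generate llama3.2 'Why is the sky blue?' prints the answer as plain text instead of the full JSON object.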

Verification

```
curl -s http://localhost:11434/api/tags
# Expected: JSON whose "models" array lists each pulled model, e.g. "name":"llama3.2:latest"
```
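
To turn this check into a pass/fail step (for example in a provisioning script), a minimal sketch that looks for one model in the /api/tags reply. It assumes bash and python3 on the host, and that the reply lists models under a "models" array with a "name" field such as llama3.2:latest; an untagged name is compared as name:latest, mirroring how Ollama names a pull without a tag. The helper name model_listed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if MODEL appears in /api/tags JSON read from stdin.
# An untagged name such as "llama3.2" is compared as "llama3.2:latest".
model_listed() {
  python3 -c '
import json, sys
want = sys.argv[1] if ":" in sys.argv[1] else sys.argv[1] + ":latest"
names = {m.get("name") for m in json.load(sys.stdin).get("models", [])}
sys.exit(0 if want in names else 1)
' "$1"
}

# Usage against the running container:
#   curl -s http://localhost:11434/api/tags | model_listed llama3.2 && echo "llama3.2 is available"
```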

Common failures

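docker run fails with exit code 125 and a "could not select device driver" error: Docker cannot access the GPU. Install the NVIDIA Container Toolkit, run sudo nvidia-ctk runtime configure --runtime=docker, restart Docker with sudo systemctl restart docker, then remove the failed container with docker rm ollama and run it again. To run on CPU instead, drop --gpus all.
Port 11434 already in use: a host-installed Ollama service is usually holding it. Stop it with sudo systemctl stop ollama, or publish a different host port (for example -p 11435:11434) and send requests to that port.
Model pull fails with a network error: the container cannot reach the model registry. Test DNS from a throwaway container, for example docker run --rm busybox nslookup registry.ollama.ai, because the ollama image may not include ping or nslookup. If name resolution fails, pass --dns 8.8.8.8 to docker run or set "dns" in the Docker daemon configuration.
"model not found" on a generation request: the model name must match an entry from /api/tags (an untagged name such as llama3.2 resolves to llama3.2:latest). Pull the model first if it is not listed.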
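
The port conflict can be caught before docker run. A minimal preflight sketch, assuming bash, since it relies on bash's /dev/tcp redirection to test whether anything already accepts connections on the host port; the function name port_in_use is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (bash-only): exit 0 if something on localhost
# already accepts TCP connections on PORT, e.g. a host-installed Ollama service.
port_in_use() {
  local port="${1:-11434}"
  # Opening /dev/tcp/HOST/PORT succeeds only if a listener accepts the connection;
  # the subshell keeps the failed redirection from ending the calling shell.
  (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null
}

# Usage before docker run:
#   if port_in_use 11434; then
#     echo "11434 is busy: stop the host service or use -p 11435:11434" >&2
#   fi
```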

Operator checkpoint

Before treating this as solved, write down the local runtime (Docker and Ollama versions), the model tag, the hardware and backend if relevant (GPU model and driver, or CPU-only), and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
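
The checkpoint can be captured in one pass. A minimal sketch, assuming bash and curl on the host and the server's /api/version endpoint; nvidia-smi exists only on NVIDIA hosts, so every probe falls back to a placeholder instead of aborting. The function name record_checkpoint and the file layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write runtime, version, hardware and verification output to a file.
# Usage: record_checkpoint [output_file] [base_url]
record_checkpoint() {
  local out="${1:-ollama-checkpoint.txt}"
  local url="${2:-http://localhost:11434}"
  {
    echo "## date"
    date -u +%Y-%m-%dT%H:%M:%SZ
    echo "## docker"
    docker version --format '{{.Server.Version}}' 2>/dev/null || echo "unavailable"
    echo "## ollama version"
    curl -s --max-time 5 "$url/api/version" || echo "unavailable"
    echo
    echo "## gpu"
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null \
      || echo "none detected (CPU or non-NVIDIA backend)"
    echo "## models (api/tags)"
    curl -s --max-time 5 "$url/api/tags" || echo "unavailable"
    echo
  } > "$out"
  echo "Checkpoint written to $out"
}
```

Running record_checkpoint after the Verification step leaves a small text file that can be pasted next to the decision record.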

How to run Ollama in Docker