11. Running Docker on Mac

Chapter 11 of 15 · 15 min

Docker Desktop for Mac does not pass the GPU through to containers. This is not a bug; it is a fundamental limitation of the virtualization layer Docker Desktop uses on macOS. If you need inference inside a Docker container on a Mac, accept that it will run on CPU only. On Apple Silicon, avoid --platform linux/amd64 unless an image requires it, because it adds x86 emulation on top of CPU-only inference. The alternative is to run the inference process on the host, outside the VM, and keep only the rest of your stack in containers.

Docker with Ollama:

# Pull and run Ollama in a container (CPU only; Metal is not available)
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama

# Run a model interactively inside the running container
docker exec -it ollama ollama run llama3.2:3b

This works, but inference runs on the CPU cores allocated to the Docker VM. Metal is not passed through, so there is no GPU acceleration. For a 3B model this is tolerable; for a 7B model the slowdown is immediately noticeable.
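One way to confirm where a loaded model is running is `ollama ps`, which lists loaded models along with a PROCESSOR column. The sketch below is a minimal check, assuming the container is named `ollama` as above and a model is currently loaded; `report_processor` is a hypothetical helper name, not part of Ollama.

```shell
# Hypothetical helper: reads `ollama ps` output on stdin and reports whether
# any loaded model is listed as running (fully or partly) on a GPU.
report_processor() {
  # Skip the header row (NR > 1), then look for "GPU" in the remaining rows.
  awk 'NR > 1 && /GPU/ { found = 1 } END { print (found ? "gpu" : "cpu-only") }'
}

# Usage (needs the running container from above with a model loaded):
#   docker exec ollama ollama ps | report_processor
```

Inside the Docker Desktop VM this is expected to print `cpu-only`; the same check against a host-side Ollama on a Mac should print `gpu`.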

There is no native Metal passthrough path for Docker on macOS. The Docker VM runs Linux under a hypervisor, and Metal is a macOS framework, so the Linux guest has no way to call it.

For GPU-accelerated Docker workloads on Mac, the practical options are:

  1. Run Ollama or MLX directly on the host (Chapter 3), not in a container
  2. Use a separate Linux machine or VM with GPU passthrough (this requires compatible hardware and is outside what macOS supports)
  3. Build and prepare the containerized workload on a GPU machine, then deploy the resulting artifacts to the Mac and run inference on the host
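Option 1 does not rule out containers for the rest of your stack. A minimal sketch, assuming Ollama is running on the macOS host on its default port and that Docker Desktop's `host.docker.internal` name resolves to the host from inside containers; `ollama_host_url` is a hypothetical helper, and `curlimages/curl` is used only as a convenient client image.

```shell
# Hypothetical helper: prints the base URL a container would use to reach an
# Ollama server running on the macOS host. Optional argument: the host port
# (defaults to 11434, Ollama's default).
ollama_host_url() {
  printf 'http://host.docker.internal:%s\n' "${1:-11434}"
}

# Usage: list the host's models from inside a throwaway container.
#   docker run --rm curlimages/curl -s "$(ollama_host_url)/api/tags"
```

If the request is refused, the host server may be listening on an address the VM cannot reach; that depends on your setup. The point of the design is that inference stays on the host, where Metal is available, while the client code stays containerized.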

The most common failure is running a Docker container and getting GPU errors when the user expected Metal to be available. It is not. This is documented but easy to miss.

# Check if Docker sees any GPU devices
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# On macOS this fails with an error from the Docker daemon, typically:
#   could not select device driver "" with capabilities: [[gpu]]
# (exact wording varies by Docker version) because GPU passthrough is not supported

EXERCISE

Run Ollama both directly on the host and inside a Docker container. Compare tokens per second for the same model on the same hardware. Expect the host run to be roughly 3–10× faster, because only the host has access to Metal.
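One possible way to take the measurement is through Ollama's HTTP API: a non-streaming `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below assumes `curl` and `jq` are installed; `tokens_per_second` and `bench_ollama` are hypothetical helper names, and the prompt is arbitrary. It also assumes the container was started with a different host port (for example `-p 11435:11434`), since a host-side Ollama already occupies 11434.

```shell
# Hypothetical helper: converts eval_count (tokens) and eval_duration
# (nanoseconds) into tokens per second, printed with two decimals.
tokens_per_second() {
  awk -v count="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", count * 1e9 / ns }'
}

# Hypothetical helper: sends one non-streaming request to an Ollama server
# and prints its generation speed. Arguments: base URL, model name.
bench_ollama() {
  base_url="$1"
  model="$2"
  curl -s "$base_url/api/generate" \
    -d "{\"model\": \"$model\", \"prompt\": \"Explain what a container is.\", \"stream\": false}" |
    jq -r '"\(.eval_count) \(.eval_duration)"' |
    { read -r count ns; tokens_per_second "$count" "$ns"; }
}

# Usage (host on 11434; container assumed to be remapped to 11435):
#   bench_ollama http://localhost:11434 llama3.2:3b
#   bench_ollama http://localhost:11435 llama3.2:3b
```

Run each measurement a few times and compare typical values, since a single run can be noisy. For a quick interactive check, `ollama run --verbose` also prints an eval rate after each response.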