Docker can't see GPU — wire up the NVIDIA Container Toolkit
Docker doesn't expose the host GPU by default; the NVIDIA Container Toolkit is the bridge. Here's the install, the runtime config, and the five common symptoms that mean it's misconfigured.
Diagnostic order — most likely first
NVIDIA Container Toolkit not installed
`docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi` returns `could not select device driver "" with capabilities: [[gpu]]`.
Install (assumes NVIDIA's apt repository is already configured on the host): `sudo apt install nvidia-container-toolkit`, then `sudo nvidia-ctk runtime configure --runtime=docker`, then `sudo systemctl restart docker`. Re-run the test command.
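As a minimal sketch of the full sequence, assuming the apt repo is already set up and using the 12.4.0 base image as the test image:

```bash
# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify: this should print the host's GPU table,
# not the "could not select device driver" error
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```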
Toolkit installed on the host, but no `--gpus` flag passed
`docker exec` into the container and run `nvidia-smi`: it fails, while the same command on the host sees the GPU.
Pass `--gpus all` to `docker run`. For Docker Compose, declare a device reservation under `deploy.resources.reservations.devices`, as in the snippet below.
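A minimal compose sketch (the service name `inference` and the image are placeholders):

```yaml
services:
  inference:                 # placeholder service name
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    command: nvidia-smi      # quick check: should print the GPU table
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all     # or an integer to reserve that many GPUs
              capabilities: [gpu]
```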
Wrong base image (CUDA version mismatch)
Container starts, but PyTorch or TensorFlow inside fails with `CUDA driver version is insufficient for CUDA runtime version` or kernel-module errors.
Match the base image's CUDA version to your host driver: a CUDA 12.4 image needs driver ≥ 550. Use NVIDIA's official `nvidia/cuda:<version>-runtime-ubuntu22.04` images; the CUDA version in the tag maps to a known minimum driver.
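To check both sides of that match (the `env | grep` trick works because the official `nvidia/cuda` images export a `CUDA_VERSION` environment variable):

```bash
# Host side: driver version, and the max CUDA version that driver supports
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | grep "CUDA Version"

# Image side: what CUDA userland the image ships
docker run --rm nvidia/cuda:12.4.0-runtime-ubuntu22.04 env | grep CUDA_VERSION
```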
Docker Desktop on Windows without WSL Integration enabled
WSL terminal sees the GPU. Docker Desktop containers don't.
Docker Desktop → Settings → Resources → WSL Integration → enable for your distro. Restart Docker Desktop. Also ensure WSL2 GPU passthrough works first (see /troubleshooting/wsl-gpu-not-detected).
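To confirm the layering is right, check in this order; both commands should succeed before you debug anything else:

```bash
# 1. Inside the WSL2 distro: the Windows host driver should already be visible
nvidia-smi

# 2. After enabling WSL Integration and restarting Docker Desktop
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```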
Kubernetes / k3s missing the device plugin
Single-host Docker works fine. Kubernetes pods can't request GPU.
Install the `nvidia-device-plugin` DaemonSet: `kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/...`. Then request `nvidia.com/gpu: 1` under `resources.limits` in your pod specs, as in the example below.
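A minimal smoke-test pod, assuming the device plugin DaemonSet is running (the pod name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test        # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # GPU requests go under limits
```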
Frequently asked questions
Do containers add inference overhead vs running on the host?
Negligible. The NVIDIA Container Toolkit mounts the host's driver libraries and GPU device nodes straight into the container; there's no virtualization layer. Inference performance is within 1% of host-native.
Can I share a GPU across multiple containers?
Yes, by default: multiple containers started with `--gpus all` see the same card. For isolation, use NVIDIA MPS (Multi-Process Service) or MIG (Multi-Instance GPU, on A100/H100-class cards) to partition it. Most local-AI workflows don't need either.
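A quick way to see the default sharing in action (container names are placeholders; `nvidia-smi -L` prints each GPU with its UUID):

```bash
# Two containers, no partitioning: both see the same physical card
docker run -d --rm --gpus all --name job-a nvidia/cuda:12.4.0-base-ubuntu22.04 sleep 300
docker run -d --rm --gpus all --name job-b nvidia/cuda:12.4.0-base-ubuntu22.04 sleep 300

# Same GPU UUID printed by both containers
docker exec job-a nvidia-smi -L
docker exec job-b nvidia-smi -L
```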
Why use Docker for local AI at all?
Reproducible runtime environments, isolation from host driver chaos, and easy switching between CUDA versions for different projects. Trade-off: complexity. For solo workflows, a host-native conda env is often simpler.
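The CUDA-switching point in concrete terms: one host driver can serve containers with different CUDA userlands side by side, for example:

```bash
# Same host driver, two different CUDA userlands
docker run --rm --gpus all nvidia/cuda:11.8.0-runtime-ubuntu22.04 nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.0-runtime-ubuntu22.04 nvidia-smi
```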
Related troubleshooting
WSL2 doesn't pass the GPU through unless the host driver is right and the kernel is current. Here's the install order that actually works in 2026, and how to confirm passthrough is live before you waste an afternoon.
Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).
PyTorch reporting that CUDA is unavailable is the most common Python ML setup failure. The cause is almost always a wrong PyTorch wheel for your CUDA version, or an accidentally installed CPU-only build.
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to a card that simply doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: