09. Docker on Linux for AI
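Docker on Linux runs GPU containers directly on the host kernel. Unlike Docker Desktop on macOS or Windows, which runs containers inside a virtual machine, there is no virtualization layer between the container and the device, so GPU workloads get native performance and direct device access. GPU support comes from the nvidia-container-toolkit package, which replaces the older nvidia-docker2 package.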
Install Docker:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
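Before moving on, it can help to confirm that the current shell picked up the docker group and that the daemon answers without sudo. The following is a minimal sketch; in_group and check_docker_access are hypothetical helper names used only for illustration, not part of Docker.

```shell
# in_group GROUP "LIST": succeed if GROUP appears in a space-separated group list.
# (Hypothetical helper for illustration.)
in_group() {
  local want="$1"
  local g
  for g in $2; do
    [ "$g" = "$want" ] && return 0
  done
  return 1
}

# check_docker_access: report whether this shell can talk to Docker without sudo.
check_docker_access() {
  if ! in_group docker "$(id -nG)"; then
    echo "this shell is not in the docker group; log out and back in, or run: newgrp docker"
    return 1
  fi
  # Prints client and server versions; fails if the daemon is unreachable.
  docker version --format 'client {{.Client.Version}} / server {{.Server.Version}}'
}
```

Call check_docker_access after newgrp docker; a non-zero exit status means either the group change has not taken effect or the daemon is not running.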
Install the NVIDIA container toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
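One way to check the result of the toolkit install is to ask Docker which runtimes it knows about. This sketch assumes that a registered nvidia entry in docker info is a reasonable sign the toolkit is configured; has_nvidia_runtime and check_toolkit are hypothetical helper names.

```shell
# has_nvidia_runtime JSON: succeed if the runtimes JSON lists an "nvidia" entry.
# (Hypothetical helper; a plain text match, not a full JSON parse.)
has_nvidia_runtime() {
  printf '%s' "$1" | grep -q '"nvidia"'
}

# check_toolkit: confirm the toolkit CLI exists and Docker has the nvidia runtime registered.
check_toolkit() {
  if ! nvidia-ctk --version; then
    echo "nvidia-ctk not found; is nvidia-container-toolkit installed?"
    return 1
  fi
  if has_nvidia_runtime "$(docker info --format '{{json .Runtimes}}')"; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime missing; run: sudo nvidia-ctk runtime configure --runtime=docker"
    echo "then: sudo systemctl restart docker"
    return 1
  fi
}
```

Run check_toolkit after restarting Docker. The text match is deliberately crude; for anything beyond a quick check, parse the JSON with a real tool.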
Test GPU passthrough:
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 \
nvidia-smi
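The same passthrough test can produce a compact, recordable summary instead of the full nvidia-smi table. The sketch below uses the nvidia-smi query flags to print one CSV line per GPU; gpu_summary and count_gpus are hypothetical helper names, and the optional argument is whatever value you would pass to --gpus.

```shell
# gpu_summary [GPUS]: print one CSV line per GPU visible inside a CUDA base container.
# GPUS defaults to "all"; a single device can be selected with e.g. device=0.
gpu_summary() {
  docker run --rm --gpus "${1:-all}" nvidia/cuda:12.4.0-base-ubuntu22.04 \
    nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv,noheader
}

# count_gpus: count the non-empty lines on stdin (one line per GPU).
count_gpus() {
  grep -c .
}
```

For example, gpu_summary | count_gpus should match the number of GPUs the host reports, and gpu_summary device=0 restricts the container to the first GPU.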
Test a full inference container:
docker run --rm --gpus all \
-p 8080:8080 \
-v /path/to/models:/models \
ghcr.io/ggerganov/llama.cpp:server-cuda \
-m /models/mistral-7b-q4_k_m.gguf -ngl 99 --host 0.0.0.0 --port 8080
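Once the container is up, the server can be probed over HTTP. This is a minimal sketch that assumes the server's port 8080 is published to the host (for example with -p 8080:8080) and relies on the /health and /completion endpoints of the llama.cpp server; wait_for_server and completion_payload are hypothetical helper names.

```shell
# wait_for_server URL [TRIES]: poll URL/health once per second until it answers.
wait_for_server() {
  local url="$1"
  local tries="${2:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url/health" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# completion_payload PROMPT N: build a JSON body for /completion.
# The prompt must not contain double quotes or backslashes (no escaping is done here).
completion_payload() {
  printf '{"prompt": "%s", "n_predict": %d}' "$1" "$2"
}
```

A typical use is wait_for_server http://localhost:8080 && curl -s http://localhost:8080/completion -H 'Content-Type: application/json' -d "$(completion_payload 'Hello' 16)". Loading a 7B model can take a while, so raise the TRIES argument on slow disks.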
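Failure mode: docker run --gpus all fails with docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]. Docker cannot find the NVIDIA container runtime hook, usually because nvidia-container-toolkit is not installed or Docker was not restarted after installing it. Confirm the package is installed, run sudo nvidia-ctk runtime configure --runtime=docker, and restart Docker with sudo systemctl restart docker.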
Failure mode: GPU not found inside one specific container. If docker run --rm --gpus all ubuntu nvidia-smi works but your own image fails, the host setup is fine and the image is the problem: it was most likely built without the CUDA user-space libraries. Rebuild it with FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 as the base.
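Failure mode: Docker fails to start after installing the container toolkit, and docker ps returns Cannot connect to the Docker daemon. Read the daemon's own error first with sudo journalctl -u docker. A malformed /etc/docker/daemon.json is a common cause. The nvidia-container-runtime package may also have installed conflicting dependencies; in that case, check apt list --installed | grep nvidia and remove the conflicting packages.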
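The failure modes above can be folded into a small triage helper that maps an error message to the first thing to try. This is only a sketch: triage is a hypothetical function, and it matches on message fragments rather than on any stable Docker error code.

```shell
# triage MESSAGE: print a short hint for a Docker error message.
# (Hypothetical helper; matches on substrings of the error text.)
triage() {
  case "$1" in
    *"could not select"*)
      echo "toolkit not registered: run 'sudo nvidia-ctk runtime configure --runtime=docker' and restart Docker" ;;
    *"Cannot connect to the Docker daemon"*)
      echo "daemon down: inspect 'sudo journalctl -u docker' and 'apt list --installed | grep nvidia'" ;;
    *)
      echo "unrecognized: compare against 'docker run --rm --gpus all ubuntu nvidia-smi'" ;;
  esac
}
```

One way to use it is out=$(docker run --rm --gpus all ubuntu nvidia-smi 2>&1) || triage "$out", so the hint is printed only when the command fails.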
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Install Docker and the NVIDIA container toolkit, verify GPU access inside a container with nvidia-smi, and run a llama.cpp server container with GPU offloading.
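To record the versions the checkpoint asks for, a short script can collect them in one place. This is a sketch under the assumption that docker, nvidia-ctk, and nvidia-smi are the tools worth recording; version_or_missing and record_env are hypothetical helper names, and the model path is whatever you pass in.

```shell
# version_or_missing CMD [ARGS...]: print the first line of the command's output,
# or "not found" if the command is missing or fails.
version_or_missing() {
  local out
  if out=$("$@" 2>/dev/null); then
    printf '%s\n' "$out" | head -n 1
  else
    echo "not found"
  fi
}

# record_env [MODEL_PATH]: print the versions and data path used for this exercise.
record_env() {
  echo "docker: $(version_or_missing docker --version)"
  echo "nvidia-ctk: $(version_or_missing nvidia-ctk --version)"
  echo "driver: $(version_or_missing nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
  echo "model path: ${1:-unset}"
}
```

For example, record_env /path/to/models/mistral-7b-q4_k_m.gguf | tee checkpoint.txt keeps the record next to the observed output.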