05. GPU Access in Docker
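GPU access in Docker containers requires the NVIDIA Container Toolkit, which superseded the older nvidia-docker wrapper. When a container requests GPU access, the toolkit's runtime injects the GPU device nodes and the host's driver libraries (such as libcuda.so and the NVML library used by nvidia-smi) into the container. The CUDA toolkit libraries, such as the CUDA runtime and cuDNN, come from the container image itself.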
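Installation involves adding NVIDIA's package repository, installing the nvidia-container-toolkit package, registering the nvidia runtime with Docker, and restarting the Docker daemon. Runtime configuration lives in /etc/docker/daemon.json, and the toolkit's nvidia-ctk runtime configure --runtime=docker command can write the entry for you. Making nvidia the default runtime is optional. It matters mainly when containers should receive GPUs without the --gpus flag.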
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
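Writing the JSON above over an existing daemon.json would discard any settings already there, such as log drivers or registry mirrors. The following is a minimal Python sketch that merges the nvidia runtime entry into an existing configuration instead. The function names add_nvidia_runtime and update_daemon_json are hypothetical. Writing the real file requires root, and Docker must be restarted afterwards.

```python
import json
from pathlib import Path


def add_nvidia_runtime(config: dict, make_default: bool = False) -> dict:
    """Return a copy of a daemon.json mapping with the nvidia runtime registered.

    Existing keys and other runtimes are preserved; the input is not mutated.
    """
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    updated["runtimes"] = runtimes
    if make_default:
        # Optional: only needed when containers should get GPUs without --gpus.
        updated["default-runtime"] = "nvidia"
    return updated


def update_daemon_json(path: Path, make_default: bool = False) -> None:
    """Read, merge, and rewrite daemon.json (hypothetical helper; needs root)."""
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = add_nvidia_runtime(config, make_default)
    path.write_text(json.dumps(merged, indent=2) + "\n")


if __name__ == "__main__":
    # Demonstrate the merge on an in-memory example instead of the real file.
    existing = {"log-driver": "json-file"}
    print(json.dumps(add_nvidia_runtime(existing, make_default=True), indent=2))
```

Keeping the merge logic in a pure function that returns a new dict lets you inspect the result before touching /etc/docker/daemon.json.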
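In Docker Compose, GPU requirements are declared in YAML under deploy.resources.reservations.devices, using driver: nvidia, either a count or a list of device_ids, and capabilities: [gpu]. Kubernetes does not read this block. Pods request GPUs through resources.limits (for example nvidia.com/gpu: 1), and the NVIDIA device plugin satisfies that request on nodes running containerd or cri-dockerd with the NVIDIA runtime configured.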
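To make the shape of a Compose GPU reservation concrete, here is a sketch that builds a service definition as a Python dict and prints it as JSON. Because JSON is, for practical purposes, a subset of YAML, the output can be saved as a compose file. The gpu_service helper is hypothetical. The rule that count and device_ids are mutually exclusive follows Docker's Compose GPU documentation.

```python
import json


def gpu_service(image, count=None, device_ids=None, capabilities=("gpu",)):
    """Build a Compose service dict that reserves NVIDIA GPUs.

    Exactly one of `count` (an int or "all") or `device_ids` must be given.
    """
    if (count is None) == (device_ids is None):
        raise ValueError("specify exactly one of count or device_ids")
    device = {"driver": "nvidia", "capabilities": list(capabilities)}
    if count is not None:
        device["count"] = count
    else:
        device["device_ids"] = [str(d) for d in device_ids]
    return {
        "image": image,
        "deploy": {"resources": {"reservations": {"devices": [device]}}},
    }


if __name__ == "__main__":
    service = gpu_service("nvidia/cuda:12.1.0-base-ubuntu22.04", count=1)
    compose = {"services": {"inference": service}}
    print(json.dumps(compose, indent=2))
```

Pinning specific GPUs instead, for example gpu_service(image, device_ids=[0, 3]), produces the device_ids form. The IDs are stored as strings, which is how Compose documents them.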
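Docker does not partition or cap GPU memory. Containers that share a GPU compete for the same device memory under the host driver, so isolation has to come from MIG partitioning or from limits set inside the application. The NVIDIA runtime is controlled by environment variables that select the visible devices, the exposed driver capabilities, and the required CUDA version. NVIDIA's CUDA base images set these variables by default.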
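# Verify GPU access from within a container
nvidia-smi    # lists visible GPUs, memory usage, utilization, and the driver's CUDA version

# Environment variables read by the NVIDIA runtime (set by CUDA base images)
echo "$NVIDIA_VISIBLE_DEVICES"       # GPU indices, UUIDs, "all", or "none"
echo "$NVIDIA_DRIVER_CAPABILITIES"   # driver features to expose, e.g. compute,utility
echo "$NVIDIA_REQUIRE_CUDA"          # CUDA version constraint checked against the host driver
echo "$CUDA_VISIBLE_DEVICES"         # CUDA runtime filter applied inside the process (often unset)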
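NVIDIA_VISIBLE_DEVICES accepts more than plain indices. It also takes the keywords all, none, and void, GPU UUIDs, and MIG identifiers. Below is a small parser sketch for a startup script that needs to know which devices it was given. The parse_visible_devices name is hypothetical. The keyword semantics in the comments follow the NVIDIA Container Toolkit documentation as I understand it.

```python
import os


def parse_visible_devices(value):
    """Interpret an NVIDIA_VISIBLE_DEVICES value.

    Returns the string "all", an empty list when no GPUs are exposed
    ("none", "void", empty, or unset), or a list of device identifiers
    (indices, GPU-/MIG- UUIDs, or <gpu>:<mig> pairs such as "0:1").
    """
    if value is None:
        return []
    value = value.strip()
    if value == "all":
        return "all"
    if value in ("", "none", "void"):
        # "none" still injects driver libraries; "void"/empty behave like plain runc.
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(parse_visible_devices(os.environ.get("NVIDIA_VISIBLE_DEVICES")))
```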
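CUDA version compatibility depends on the host driver, the container base image, and the application. The host driver does not need to match the container's CUDA version exactly, but it must be new enough. Each driver supports the CUDA version it shipped with and older ones, and minor version compatibility extends this across releases within one CUDA major version, with some limitations. A container built for CUDA 12.1 therefore cannot run on a host whose driver only supports CUDA 11.8, regardless of GPU hardware, unless NVIDIA's forward-compatibility package is installed on a supported data center GPU. NVIDIA's CUDA compatibility documentation lists the supported combinations.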
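The "new enough driver" rule can be checked mechanically. Here is a sketch that compares a host driver version against the minimum Linux driver for a container's CUDA major version under minor version compatibility. The two table entries (450.80.02 for CUDA 11.x and 525.60.13 for CUDA 12.x) reflect NVIDIA's compatibility documentation at the time of writing. Treat them as an assumption and re-check the current matrix. This check also ignores minor-version-compatibility caveats such as PTX JIT limitations. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a version string like '535.104.05' or '12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Assumed minimum Linux driver versions for CUDA minor version compatibility,
# taken from NVIDIA's compatibility docs at time of writing; verify before use.
MIN_DRIVER_BY_CUDA_MAJOR = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def driver_supports(driver_version, cuda_version):
    """True if the host driver meets the minimum for the container's CUDA major version."""
    major = parse_version(cuda_version)[0]
    minimum = MIN_DRIVER_BY_CUDA_MAJOR.get(major)
    if minimum is None:
        raise ValueError(f"no compatibility entry for CUDA {major}.x")
    # Tuples compare element by element, matching dotted-version ordering.
    return parse_version(driver_version) >= minimum


if __name__ == "__main__":
    print(driver_supports("535.104.05", "12.1"))
```

The driver version to compare comes from nvidia-smi on the host. The container's CUDA version is usually visible in the image tag or in NVIDIA_REQUIRE_CUDA.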
Inference frameworks need specific driver capabilities to be exposed. PyTorch and TensorFlow both need compute to run CUDA kernels, and utility is needed for nvidia-smi and NVML-based monitoring. Hardware video encode/decode requires video, and rendering requires graphics. The NVIDIA_DRIVER_CAPABILITIES variable in the container must list every capability the workload needs, or be set to all.
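When an image sets NVIDIA_DRIVER_CAPABILITIES but a workload needs one more capability, the value has to be extended rather than replaced. A sketch of that merge follows. The capability names (compute, compat32, graphics, utility, video, display, and all) are taken from the NVIDIA Container Toolkit documentation as an assumption. The merge_capabilities helper is hypothetical.

```python
# Capability names assumed from the NVIDIA Container Toolkit documentation.
KNOWN_CAPABILITIES = {"compute", "compat32", "graphics", "utility", "video", "display"}


def merge_capabilities(current, required):
    """Return an NVIDIA_DRIVER_CAPABILITIES value containing every required capability.

    `current` is the existing comma-separated value (or None); `required`
    is an iterable of capability names. "all" short-circuits the merge.
    """
    caps = {c.strip() for c in (current or "").split(",") if c.strip()}
    caps |= set(required)
    if "all" in caps:
        return "all"
    unknown = caps - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    # Sorted output keeps the value stable across runs.
    return ",".join(sorted(caps))


if __name__ == "__main__":
    # e.g. add hardware video decode to an image that only exposes compute,utility
    print(merge_capabilities("compute,utility", ["video"]))
```

The result can then be passed to a container with docker run -e NVIDIA_DRIVER_CAPABILITIES=...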
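MIG (Multi-Instance GPU) partitioning splits a supported physical GPU, such as an A100 or H100, into smaller logical instances. Each MIG instance behaves as an independent device with dedicated memory and compute slices. Kubernetes exposes MIG instances through the NVIDIA device plugin's MIG strategy configuration. The device plugin is a Kubernetes component, so plain Docker and Docker Compose do not use it. Instead, they select MIG instances directly by MIG UUID or by a <GPU index>:<MIG index> pair (for example 0:1) in the --gpus device list or in NVIDIA_VISIBLE_DEVICES.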
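Selecting MIG instances in plain Docker comes down to building the right --gpus value. Docker parses that flag as CSV, so a list of several devices has to be wrapped in an extra pair of double quotes. Here is a sketch that builds the arguments and prints a shell-ready command. The gpus_flag helper is hypothetical. The "0:0,0:1" index form assumes MIG instances have already been created with nvidia-smi.

```python
import shlex


def gpus_flag(device_ids):
    """Build docker run arguments that expose specific GPUs or MIG instances.

    Each ID may be a GPU index ("0"), a GPU or MIG UUID, or a
    "<gpu index>:<MIG index>" pair such as "0:1". The inner double quotes
    are part of the value so Docker's CSV parser keeps the list together.
    """
    if not device_ids:
        raise ValueError("at least one device ID is required")
    return ["--gpus", '"device=' + ",".join(device_ids) + '"']


if __name__ == "__main__":
    argv = ["docker", "run", "--rm", *gpus_flag(["0:0", "0:1"]),
            "nvidia/cuda:12.1.0-base-ubuntu22.04", "nvidia-smi", "-L"]
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing argv directly to subprocess.run skips the shell, so the inner quotes reach Docker unchanged. This is the same value the shell produces from --gpus '"device=0:0,0:1"'.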
Verify GPU access in a Docker environment. Install nvidia-container-toolkit if not present, configure Docker with the nvidia runtime, and run a test container that executes nvidia-smi and a GPU-accelerated inference sample. Document the CUDA version, driver version, and available GPU memory.
# Test container with GPU access
docker run --rm --gpus all \
nvidia/cuda:12.1.0-base-ubuntu22.04 \
nvidia-smi
# Test PyTorch GPU access
docker run --rm --gpus all \
pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime \
python -c "
import torch
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'Device count: {torch.cuda.device_count()}')
if torch.cuda.is_available():
    # mem_get_info returns (free, total) device memory in bytes
    free, total = torch.cuda.mem_get_info(0)
    print(f'Device name: {torch.cuda.get_device_name(0)}')
    print(f'Memory free/total: {free / 1e9:.2f} / {total / 1e9:.2f} GB')
"
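To record the driver version and GPU memory for the exercise in a machine-readable form, nvidia-smi's query mode can be parsed. Below is a sketch that runs nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free --format=csv,noheader,nounits and turns each row into a dict. With nounits, memory is reported in MiB. The function names are hypothetical. The parser assumes GPU names contain no commas. The CUDA version is not part of this query; the header of plain nvidia-smi output shows the highest CUDA version the driver supports.

```python
import subprocess

QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.free"]


def parse_gpu_query(csv_text):
    """Parse nvidia-smi --query-gpu CSV output (noheader, nounits) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        record = dict(zip(QUERY_FIELDS, values))
        # With nounits, memory fields are plain integers in MiB.
        record["memory.total"] = int(record["memory.total"])
        record["memory.free"] = int(record["memory.free"])
        gpus.append(record)
    return gpus


def query_gpus():
    """Run nvidia-smi (on the host or inside a GPU-enabled container) and parse it."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_query(result.stdout)
```

Inside a container started with --gpus all, calling query_gpus() returns one dict per visible GPU. Separating parsing from the subprocess call keeps the parser testable on machines without a GPU.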