GPU Access in Docker — Production Local AI Deployment (Chapter 5)

GPU access in Docker containers requires the NVIDIA Container Toolkit, the successor to nvidia-docker. The toolkit provides a Docker runtime that injects GPU device nodes and the host's driver libraries into a container when the container requests GPU access. The CUDA toolkit libraries themselves come from the container image.

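Installation involves adding the NVIDIA package repository, installing the nvidia-container-toolkit package, and registering the nvidia runtime with Docker in /etc/docker/daemon.json. Making it the default runtime is optional: without that setting, each container must request GPUs explicitly, for example with --gpus. Restart the Docker daemon after editing the file.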

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
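One way to catch a misconfigured file before restarting Docker is to parse it and check the runtime entries. The following is a minimal sketch: the function name check_daemon_config is hypothetical, and it inspects only the keys shown in the listing above, not the full daemon.json schema.

```python
import json


def check_daemon_config(text: str) -> list[str]:
    """Return a list of problems found in a daemon.json document (empty if none)."""
    problems = []
    config = json.loads(text)

    # The nvidia runtime must be registered under "runtimes" with a path.
    nvidia = config.get("runtimes", {}).get("nvidia")
    if nvidia is None:
        problems.append("no 'nvidia' entry under 'runtimes'")
    elif not nvidia.get("path"):
        problems.append("'runtimes.nvidia.path' is missing or empty")

    # "default-runtime" is optional; without it, containers must ask for GPUs explicitly.
    if config.get("default-runtime") != "nvidia":
        problems.append("'default-runtime' is not 'nvidia'")

    return problems


sample = """
{
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  },
  "default-runtime": "nvidia"
}
"""
print(check_daemon_config(sample))  # prints []
```

To check a real host, pass the contents of /etc/docker/daemon.json instead of the sample string.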

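In Docker Compose, the deploy.resources.reservations.devices key declares GPU requirements in YAML form. Each entry names the driver (nvidia), either a count or a list of device_ids, and the gpu capability. This syntax is specific to Compose. Kubernetes does not read it; there, pods request GPUs through the nvidia.com/gpu resource exposed by the NVIDIA device plugin, with containerd or cri-dockerd as the container runtime.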
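The shape of a Compose GPU reservation is easiest to see as nested data. The sketch below builds it as a Python dictionary whose keys map one-to-one onto the YAML form; the helper name gpu_service and the service name inference are hypothetical, chosen only for illustration.

```python
import json


def gpu_service(image, count="all", device_ids=None):
    """Build a Compose service definition that reserves NVIDIA GPUs.

    Mirrors the YAML path deploy.resources.reservations.devices.
    """
    device = {"driver": "nvidia", "capabilities": ["gpu"]}
    if device_ids is not None:
        # Compose treats count and device_ids as mutually exclusive.
        device["device_ids"] = [str(i) for i in device_ids]
    else:
        device["count"] = count
    return {
        "image": image,
        "deploy": {"resources": {"reservations": {"devices": [device]}}},
    }


compose = {
    "services": {
        "inference": gpu_service("nvidia/cuda:12.1.0-base-ubuntu22.04", count=1)
    }
}
print(json.dumps(compose, indent=2))
```

Use count when any GPU will do and device_ids when a service must be pinned to specific devices.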

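GPU memory allocation requires careful management. Docker places no limit on GPU memory: containers share each visible GPU's memory under the host driver, so one container can exhaust memory that another needs. The NVIDIA runtime is controlled through environment variables that select which devices and which driver capabilities are exposed to the container, and the CUDA base images additionally record the CUDA version in the environment.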

# Verify GPU access from within a container
nvidia-smi
# Lists visible GPUs, memory usage, utilization

# Environment variables available to containers
echo $NVIDIA_VISIBLE_DEVICES      # GPU indices or UUIDs, or "all"
echo $NVIDIA_DRIVER_CAPABILITIES  # Driver features exposed, e.g. compute,utility
echo $CUDA_VISIBLE_DEVICES        # CUDA-level filter read by the application, not by the NVIDIA runtime
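NVIDIA_VISIBLE_DEVICES accepts a few special values besides a device list, which makes it easy to misread. The following sketch classifies a value according to the behavior described in the NVIDIA Container Toolkit documentation; the function name describe_visible_devices and the returned labels are hypothetical.

```python
import os


def describe_visible_devices(value):
    """Classify an NVIDIA_VISIBLE_DEVICES value into (kind, devices)."""
    if value is None or value == "" or value == "void":
        # Unset, empty, or "void": the runtime injects nothing.
        return ("disabled", [])
    if value == "all":
        return ("all", [])
    if value == "none":
        # No GPUs are visible, but driver capabilities are still applied.
        return ("none", [])
    # Otherwise: a comma-separated list of GPU indices or UUIDs.
    devices = [d.strip() for d in value.split(",") if d.strip()]
    return ("list", devices)


print(describe_visible_devices(os.environ.get("NVIDIA_VISIBLE_DEVICES")))
```

Run inside a container, this reports what the runtime was asked to expose, which can then be compared with the devices nvidia-smi actually lists.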

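CUDA version compatibility requires matching between the host driver, container base image, and application requirements. The host driver sets the ceiling: each CUDA release needs a minimum driver version, while newer drivers remain backward compatible with older CUDA toolkits. A container built for CUDA 12.1 will therefore fail on a host whose driver shipped with CUDA 11.8, regardless of hardware capability, unless NVIDIA's forward-compatibility package is available for that GPU. The CUDA compatibility matrix documents supported combinations.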
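The driver check can be sketched as a simple version comparison. The minimum driver versions in the table below are assumptions recalled from NVIDIA's compatibility documentation for Linux and should be verified against the current matrix; the names MIN_DRIVER and driver_supports are hypothetical.

```python
# Minimum Linux driver version per CUDA major release.
# ASSUMPTION: values recalled from NVIDIA's compatibility table (minor-version
# compatibility); confirm against the published matrix before relying on them.
MIN_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(text):
    """Turn '525.60.13' into (525, 60, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the host driver meets the minimum for the CUDA release."""
    cuda_major = parse_version(cuda_version)[0]
    minimum = MIN_DRIVER.get(cuda_major)
    if minimum is None:
        raise ValueError(f"no minimum driver recorded for CUDA {cuda_major}.x")
    return parse_version(driver_version) >= minimum


# 520.61.05 is the driver that shipped alongside CUDA 11.8.
print(driver_supports("520.61.05", "12.1"))  # prints False
print(driver_supports("520.61.05", "11.8"))  # prints True
```

On a real host, the driver version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed in as the first argument.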

Some inference frameworks require specific NVIDIA driver capabilities. PyTorch and TensorFlow with CUDA both need the compute capability, and utility adds nvidia-smi and NVML. Workloads that go beyond compute need more: video for hardware encoding and decoding, graphics for OpenGL or Vulkan rendering. The NVIDIA_DRIVER_CAPABILITIES environment variable in the container should include all required capabilities.
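A startup check can confirm that the configured capabilities cover what a workload needs. This is a minimal sketch; the function name missing_capabilities is hypothetical, and the default of compute and utility for an unset variable is an assumption based on the NVIDIA Container Toolkit documentation.

```python
import os

# ASSUMPTION: when NVIDIA_DRIVER_CAPABILITIES is unset or empty, the runtime
# falls back to these defaults, per the NVIDIA Container Toolkit documentation.
DEFAULT_CAPABILITIES = {"compute", "utility"}


def missing_capabilities(configured, required):
    """Return the required capabilities absent from an NVIDIA_DRIVER_CAPABILITIES value."""
    granted = {c.strip() for c in (configured or "").split(",") if c.strip()}
    if not granted:
        granted = set(DEFAULT_CAPABILITIES)
    if "all" in granted:
        return set()
    return set(required) - granted


configured = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
print(sorted(missing_capabilities(configured, ["compute", "utility"])))
```

An empty result means every required capability is granted; anything else names what to add to the variable.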

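MIG (Multi-Instance GPU) partitioning splits a supported physical GPU into smaller logical instances. Each MIG instance operates as an independent device with dedicated memory and compute slices. Kubernetes supports MIG through the NVIDIA device plugin's configuration. Docker and Docker Compose do not use the device plugin, which is a Kubernetes component; instead, a MIG instance is selected by passing its identifier (as listed by nvidia-smi -L) through NVIDIA_VISIBLE_DEVICES or the device option of --gpus.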
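MIG instances can be enumerated by reading their UUIDs from the output of nvidia-smi -L. The sketch below extracts them with a regular expression; the sample listing is illustrative with placeholder UUIDs, its layout is an assumption based on recent driver output, and the function name mig_uuids is hypothetical.

```python
import re

# Matches "(UUID: MIG-...)" entries; the exact listing layout is an assumption.
MIG_UUID = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def mig_uuids(listing):
    """Return the MIG instance UUIDs found in `nvidia-smi -L` style output."""
    return MIG_UUID.findall(listing)


# Illustrative sample with placeholder UUIDs, not captured from a real host.
sample = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 3g.20gb Device 0: (UUID: MIG-11111111-1111-1111-1111-111111111111)
  MIG 3g.20gb Device 1: (UUID: MIG-22222222-2222-2222-2222-222222222222)
"""

# A comma-separated list of UUIDs is the form NVIDIA_VISIBLE_DEVICES accepts.
print(",".join(mig_uuids(sample)))
```

On a real host, feed the function the captured output of nvidia-smi -L and pass one of the resulting UUIDs to the container.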

# Test container with GPU access
docker run --rm --gpus all \
  nvidia/cuda:12.1.0-base-ubuntu22.04 \
  nvidia-smi

# Test PyTorch GPU access
docker run --rm --gpus all \
  pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime \
  python -c "
import torch
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'Device count: {torch.cuda.device_count()}')
if torch.cuda.is_available():
    print(f'Device name: {torch.cuda.get_device_name(0)}')
    print(f'Total memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')
"