RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Production Local AI Deployment

05. GPU Access in Docker

Chapter 5 of 24 · 20 min
KEY INSIGHT

GPU access requires a host driver new enough for the CUDA version inside the container, plus a Docker runtime configuration and a resource request that expose the GPU to that container.

GPU access in Docker containers requires the NVIDIA Container Toolkit, formerly known as nvidia-docker. The toolkit provides a Docker runtime that injects GPU device nodes and the host driver's user-space libraries (including libcuda) into any container that requests GPU access. The CUDA toolkit itself still comes from the container image.

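Installation involves adding the NVIDIA package repository, installing the nvidia-container-toolkit package, and registering the nvidia runtime with Docker. Runtime configuration lives in /etc/docker/daemon.json; running nvidia-ctk runtime configure --runtime=docker writes the entry for you. Setting nvidia as the default runtime is optional: without it, containers opt in per run with --gpus or --runtime=nvidia.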

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
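Editing daemon.json by hand risks overwriting unrelated settings such as log drivers or registry mirrors. The following is a minimal Python sketch of a safer merge, assuming the file is plain JSON; the helper name `with_nvidia_runtime` and the sample `log-driver` value are hypothetical and chosen only for illustration.

```python
import copy
import json


def with_nvidia_runtime(config: dict, make_default: bool = False) -> dict:
    """Return a copy of a daemon.json dict with the nvidia runtime registered.

    Hypothetical helper: existing keys and other runtimes are preserved.
    """
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    if make_default:
        merged["default-runtime"] = "nvidia"
    return merged


# Example: an existing config with an unrelated setting (illustrative value).
existing = {"log-driver": "json-file"}
print(json.dumps(with_nvidia_runtime(existing, make_default=True), indent=2))
```

The daemon only reads this file at startup, so Docker needs a restart (for example, sudo systemctl restart docker) before the runtime becomes available.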

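In Docker Compose, the deploy.resources.reservations.devices key declares GPU requirements in YAML: each entry names the driver (nvidia), either a count or explicit device_ids, and a capabilities list that must include gpu. Kubernetes takes the same declarative approach with different syntax: pods request nvidia.com/gpu under resources.limits, and the NVIDIA device plugin handles allocation on nodes running containerd or cri-dockerd.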
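To make the shape of that Compose key concrete, here is a small Python sketch that builds a service definition with a GPU reservation and prints it as JSON. The helper `gpu_service` and the service name `inference` are hypothetical. The output is JSON rather than YAML to avoid a third-party dependency; since JSON is a subset of YAML, Compose should accept it, but treat that as an assumption and convert by hand if it does not.

```python
import json


def gpu_service(image: str, count="all", device_ids=None) -> dict:
    """Build a Compose service dict that reserves NVIDIA GPUs.

    Hypothetical helper. `count` and `device_ids` are mutually exclusive
    in the Compose specification, so device_ids takes precedence if given.
    """
    device = {"driver": "nvidia", "capabilities": ["gpu"]}
    if device_ids:
        device["device_ids"] = [str(d) for d in device_ids]
    else:
        device["count"] = count
    return {
        "image": image,
        "deploy": {"resources": {"reservations": {"devices": [device]}}},
    }


compose = {
    "services": {
        "inference": gpu_service("nvidia/cuda:12.1.0-base-ubuntu22.04", count=1)
    }
}
print(json.dumps(compose, indent=2))
```

Reserving by count lets the runtime pick any free GPU, while device_ids pins the service to specific cards, which matters on hosts where GPUs differ in memory size.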

GPU memory needs careful management because Docker does not meter it: the memory and CPU limits Docker enforces do not apply to VRAM, so every container sharing a GPU draws from the same pool, governed by the host driver. The NVIDIA runtime reads environment variables that select the visible devices and driver capabilities, and the official CUDA base images also set CUDA_VERSION.

# Verify GPU access from within a container
nvidia-smi
# Lists visible GPUs, memory usage, utilization

# Environment variables read by the NVIDIA runtime
echo "$NVIDIA_VISIBLE_DEVICES"       # GPU indices or UUIDs, or "all"
echo "$NVIDIA_DRIVER_CAPABILITIES"   # Driver features, e.g. compute,utility
echo "$CUDA_VISIBLE_DEVICES"         # CUDA-level device filter inside the process; usually unset
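The value of NVIDIA_VISIBLE_DEVICES has a few special cases that are easy to misread in logs. The sketch below interprets them the way the NVIDIA Container Toolkit documentation describes; the function name `describe_visible_devices` is hypothetical, and the returned strings are my own wording rather than anything the runtime prints.

```python
import os


def describe_visible_devices(value):
    """Interpret an NVIDIA_VISIBLE_DEVICES value (hypothetical helper)."""
    if value is None or value.strip() in ("", "void"):
        # Unset, empty, or "void": the runtime behaves like plain runc.
        return "no GPU injection: runtime behaves like plain runc"
    value = value.strip()
    if value == "all":
        return "all host GPUs"
    if value == "none":
        return "no GPUs, driver capabilities still enabled"
    # Otherwise: a comma-separated list of GPU indices or UUIDs.
    devices = [d.strip() for d in value.split(",") if d.strip()]
    return "selected devices: " + ", ".join(devices)


print(describe_visible_devices(os.environ.get("NVIDIA_VISIBLE_DEVICES")))
```

Running this inside and outside a GPU container is a quick way to confirm whether the variable was set by the image, by the docker run command, or not at all.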

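CUDA compatibility spans three layers: the host driver, the container base image, and the application. The driver sets the ceiling: each driver release supports CUDA up to a maximum version, which nvidia-smi reports in its header. A container built for CUDA 12.1 will not run on a host whose driver tops out at CUDA 11.8, regardless of hardware capability, unless NVIDIA's forward-compatibility package applies. The reverse direction works: a newer driver runs containers built for older CUDA versions. NVIDIA's CUDA compatibility matrix documents the supported combinations.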
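The rule above reduces to a version comparison. The following sketch is deliberately conservative and is not NVIDIA's actual check: it assumes you have read the driver's maximum CUDA version from the nvidia-smi header yourself, and it ignores forward-compatibility packages and minor-version compatibility. Both function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '12.1' or '12.1.0' into a comparable (major, minor) tuple."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def driver_supports(driver_max_cuda: str, required_cuda: str) -> bool:
    """True when the driver's maximum CUDA version covers the requirement."""
    return parse_version(driver_max_cuda) >= parse_version(required_cuda)


print(driver_supports("11.8", "12.1"))  # False: driver too old for the image
print(driver_supports("12.4", "12.1"))  # True
print(driver_supports("12.4", "11.8"))  # True: newer drivers run older CUDA
```

Comparing tuples rather than strings avoids the classic mistake where "12.10" sorts below "12.2".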

Inference frameworks need specific NVIDIA driver capabilities. CUDA compute workloads such as PyTorch and TensorFlow need compute, plus utility for nvidia-smi and NVML monitoring. Pipelines that use hardware video encode or decode additionally need video. The NVIDIA_DRIVER_CAPABILITIES environment variable in the container should list every required capability, or be set to all.
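A quick way to catch a missing capability before a framework fails at import time is to compare the configured string against what the workload needs. This is a minimal sketch with a hypothetical helper name; which capabilities a given framework needs is an input you supply, not something the code knows.

```python
def missing_capabilities(configured: str, required) -> list:
    """List required driver capabilities absent from a configured value.

    Hypothetical helper. `configured` is an NVIDIA_DRIVER_CAPABILITIES
    string such as "compute,utility"; "all" grants everything.
    """
    granted = {c.strip() for c in configured.split(",") if c.strip()}
    if "all" in granted:
        return []
    return sorted(set(required) - granted)


print(missing_capabilities("utility", ["compute", "utility"]))        # ['compute']
print(missing_capabilities("compute,utility", ["compute", "video"]))  # ['video']
print(missing_capabilities("all", ["compute", "video"]))              # []
```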

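MIG (Multi-Instance GPU) partitioning splits a supported physical GPU into smaller logical instances. Each MIG instance operates as an independent device with dedicated memory and compute slices. Kubernetes exposes MIG instances through the NVIDIA device plugin's MIG configuration. Docker and Docker Compose do not use the device plugin, which is a Kubernetes component; they address a MIG instance like any other device, by its MIG UUID or GPU:instance index, through --gpus or NVIDIA_VISIBLE_DEVICES.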
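Selecting specific devices from a script has one quoting trap: Docker parses the --gpus value as CSV, so a list of devices must be wrapped in literal double quotes. The sketch below builds the argument pair for use with subprocess. It assumes the NVIDIA Container Toolkit's documented `<gpu>:<mig>` index form (such as `0:0`) for MIG instances; the helper name `gpus_flag` is hypothetical.

```python
def gpus_flag(devices) -> list:
    """Build the `--gpus` argument pair for `docker run` (hypothetical helper).

    Docker parses the value as CSV, so the device list is wrapped in
    literal double quotes to keep its commas inside the `device=` field.
    """
    devices = [str(d) for d in devices]
    if not devices:
        return ["--gpus", "all"]
    return ["--gpus", '"device=' + ",".join(devices) + '"']


# Assumed MIG addressing: GPU 0, MIG instances 0 and 1.
command = [
    "docker", "run", "--rm", *gpus_flag(["0:0", "0:1"]),
    "nvidia/cuda:12.1.0-base-ubuntu22.04", "nvidia-smi", "-L",
]
print(command)
```

Passing the list to subprocess.run without a shell keeps the inner double quotes intact; typed at a shell prompt, the same value needs an extra layer of single quotes around it.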

EXERCISE

Verify GPU access in a Docker environment. Install nvidia-container-toolkit if not present, configure Docker with the nvidia runtime, and run a test container that executes nvidia-smi and a GPU-accelerated inference sample. Document the CUDA version, driver version, and available GPU memory.

# Test container with GPU access
docker run --rm --gpus all \
  nvidia/cuda:12.1.0-base-ubuntu22.04 \
  nvidia-smi

# Test PyTorch GPU access
docker run --rm --gpus all \
  pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime \
  python -c "
import torch
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA build version: {torch.version.cuda}')
print(f'Device count: {torch.cuda.device_count()}')
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f'Device name: {props.name}')
    # total_memory is the card's capacity, not the amount currently free
    print(f'Total memory: {props.total_memory / 1e9:.2f} GB')
"
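For the documentation part of the exercise, nvidia-smi's query mode gives machine-readable output that is easier to record than the default table. The sketch below is one possible way to collect it, with hypothetical function names; it returns an empty list when nvidia-smi is missing or fails, so it is safe to run on a machine without a GPU. The CUDA version is not among these query fields, so take it from the nvidia-smi header or from torch.version.cuda.

```python
import csv
import io
import shutil
import subprocess

QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.used"]


def parse_gpu_rows(csv_text: str) -> list:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if row:
            rows.append(dict(zip(QUERY_FIELDS, (cell.strip() for cell in row))))
    return rows


def query_gpus() -> list:
    """Run nvidia-smi if present; return [] when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS),
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_rows(result.stdout)


for gpu in query_gpus():
    print(gpu)
```

Run it once on the host and once inside the test container: matching driver versions confirm that the container is seeing the host driver rather than a stale library baked into the image.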