RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI on Linux

09. Docker on Linux for AI

Chapter 9 of 15 · 20 min
KEY INSIGHT

Docker with `nvidia-container-toolkit` exposes GPUs inside containers by mounting the host's `/dev/nvidia*` device nodes and injecting the NVIDIA driver libraries through an OCI runtime hook. There is no emulation layer, so performance is effectively native.

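On a Linux host, Docker containers share the host kernel, so GPU containers get direct device access at native speed with no virtual machine in between. Docker Desktop on other platforms runs containers inside a Linux VM, which puts an extra layer between the container and the GPU. The `nvidia-container-toolkit` package replaces the older, deprecated `nvidia-docker2` package.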
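To make the device-mounting idea concrete, the sketch below lists the NVIDIA device nodes visible in the current environment. The function name `list_nvidia_devices` is a hypothetical helper, not part of any NVIDIA or Docker tooling. The exact node names you see (for example `nvidia0` or `nvidiactl`) depend on your driver and hardware.

```shell
#!/usr/bin/env bash
# List NVIDIA device nodes under a directory (default: /dev).
# Hypothetical helper for illustration; run it on the host, then compare
# with what a --gpus container sees.
list_nvidia_devices() {
  local dir="${1:-/dev}"
  local found=0
  local node
  for node in "$dir"/nvidia*; do
    [ -e "$node" ] || continue   # the glob matched nothing
    echo "$node"
    found=1
  done
  [ "$found" -eq 1 ] || echo "no nvidia device nodes under $dir"
}

list_nvidia_devices /dev
```

Once the toolkit is installed, running `docker run --rm --gpus all ubuntu sh -c 'ls /dev/nvidia*'` should show a similar set of nodes from inside a container.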

Install Docker:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
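Before moving on, it may help to confirm that the daemon is reachable without `sudo`. The following is a minimal sketch; `check_docker_access` and `in_group` are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Report whether the current user can talk to the Docker daemon without sudo.
check_docker_access() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed"
    return 1
  fi
  if docker info >/dev/null 2>&1; then
    echo "docker: daemon reachable as $(id -un)"
  else
    echo "docker: installed, but daemon not reachable (group membership or service?)"
    return 1
  fi
}

# True if the named group is in the current session's group list.
in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

check_docker_access || true
if in_group docker; then
  echo "user is in the docker group"
else
  echo "user is not in the docker group in this session"
fi
```

If the group check fails right after `usermod`, the new membership usually has not reached the current session yet. Logging out and back in, or running `newgrp docker`, typically resolves it.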

Install the NVIDIA container toolkit:

# Add NVIDIA's signing key and apt repository (Debian/Ubuntu)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime in /etc/docker/daemon.json, then reload Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Test GPU passthrough:

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 \
  nvidia-smi

Test a full inference container:

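# The image's entrypoint is already the server binary, so pass only its flags.
# A CUDA-enabled tag is needed for -ngl to offload layers to the GPU.
docker run --rm --gpus all \
  -v /path/to/models:/models \
  -p 8080:8080 \
  ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/mistral-7b-q4_k_m.gguf -ngl 99 --host 0.0.0.0 --port 8080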
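Once the server container is up, a quick way to confirm it is serving is to poll its `/health` endpoint. This is a minimal sketch that assumes the server's port is published to the host on 8080 (for example with `-p 8080:8080` on `docker run`). `wait_for_health` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Poll a URL once per second until it answers successfully or the
# attempt budget (default: 60) is used up. Returns 0 on success, 1 on timeout.
wait_for_health() {
  local url="$1"
  local attempts="${2:-60}"
  local tried=0
  until curl -fsS --max-time 2 "$url" >/dev/null 2>&1; do
    tried=$((tried + 1))
    [ "$tried" -ge "$attempts" ] && return 1
    sleep 1
  done
  return 0
}

# Example (uncomment once the container is running; port 8080 is an assumption):
# wait_for_health "http://localhost:8080/health" 60 && echo "server is ready"
```

After the health check passes, a short completion request such as `curl -s http://localhost:8080/completion -H 'Content-Type: application/json' -d '{"prompt": "Hello", "n_predict": 16}'` should return JSON containing generated text. Large models can take a while to load, so a generous attempt budget is reasonable.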

Failure mode: `docker run --gpus all` fails with `docker: Error response from daemon: could not select runtime: nvidia-container-runtime not found`. The NVIDIA runtime was not registered with Docker. Run `sudo nvidia-ctk runtime configure --runtime=docker`, which adds the runtime to `/etc/docker/daemon.json`, then restart Docker with `sudo systemctl restart docker`.
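One way to check whether the runtime is registered is to look for a `"nvidia"` entry in the daemon configuration. The sketch below is a heuristic, and it assumes the file resembles what `nvidia-ctk runtime configure` writes. `has_nvidia_runtime` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Heuristic: does this daemon.json mention a runtime named "nvidia"?
# Assumption: the file has the shape written by `nvidia-ctk runtime configure`.
has_nvidia_runtime() {
  local cfg="${1:-/etc/docker/daemon.json}"
  [ -r "$cfg" ] && grep -q '"nvidia"' "$cfg"
}

if has_nvidia_runtime; then
  echo "nvidia runtime appears in daemon.json"
else
  echo "nvidia runtime not found; try: sudo nvidia-ctk runtime configure --runtime=docker"
fi
```

On a running daemon, `docker info --format '{{json .Runtimes}}'` gives a second view of the same information.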

Failure mode: the GPU is not found inside one particular container. If `docker run --rm --gpus all ubuntu nvidia-smi` works but a specific image fails, the toolkit itself is fine and the failing image was most likely built without the CUDA user-space libraries. Rebuild it with `FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04`.
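To tell a toolkit problem from an image problem, it can help to see which CUDA libraries the dynamic linker knows about inside the container. The driver library (`libcuda`) is typically injected by the toolkit at run time, while the CUDA runtime (`libcudart`) has to come from the image. The sketch below is a rough heuristic that assumes the image ships `ldconfig` and links CUDA dynamically. `classify_cuda_libs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `ldconfig -p` output on stdin and report which CUDA libraries appear.
classify_cuda_libs() {
  local listing
  listing=$(cat)
  if echo "$listing" | grep -q 'libcuda\.so'; then
    echo "libcuda: present"
  else
    echo "libcuda: missing"
  fi
  if echo "$listing" | grep -q 'libcudart\.so'; then
    echo "libcudart: present"
  else
    echo "libcudart: missing"
  fi
}

# Check the current environment if ldconfig is available.
if command -v ldconfig >/dev/null 2>&1; then
  ldconfig -p | classify_cuda_libs
fi
```

To inspect an image, pipe its linker cache into the function, for example `docker run --rm --gpus all --entrypoint ldconfig IMAGE -p | classify_cuda_libs`, where `IMAGE` is a placeholder for the failing image. A result of `libcuda: present` with `libcudart: missing` points at the image rather than the toolkit.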

Failure mode: Docker fails to start after installing the container toolkit, and `docker ps` returns `Cannot connect to the Docker daemon`. Check `sudo systemctl status docker` and `sudo journalctl -u docker` for the actual startup error first. The `nvidia-container-runtime` package may also have pulled in conflicting dependencies: run `apt list --installed | grep nvidia` and remove any conflicting packages.
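The package check above can be narrowed to the container stack only, which keeps driver packages out of the listing. This is a minimal sketch for Debian-based systems. `filter_container_pkgs` is a hypothetical helper name, and the prefix list is an assumption about which package names are relevant.

```shell
#!/usr/bin/env bash
# Keep only packages that belong to the NVIDIA container stack.
# Input: lines of "package version" on stdin.
filter_container_pkgs() {
  grep -E '^(nvidia-container|nvidia-docker|libnvidia-container)' || true
}

# On Debian/Ubuntu, list installed packages and filter them.
if command -v dpkg-query >/dev/null 2>&1; then
  dpkg-query -W -f='${Package} ${Version}\n' | filter_container_pkgs
fi
```

Seeing the legacy `nvidia-docker2` package next to `nvidia-container-toolkit`, or mismatched versions across the `libnvidia-container` packages, is a plausible sign of the conflict described above.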

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
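A small script can capture most of what the checkpoint asks for. The sketch below prints one line per component and falls back to `unavailable` when a tool is missing. `probe` and `record_env` are hypothetical helper names, and the set of probes is only a suggestion.

```shell
#!/usr/bin/env bash
# Print "label: <first line of command output>", or "label: unavailable"
# if the command is missing or prints nothing.
probe() {
  local label="$1"
  shift
  local out
  out=$("$@" 2>/dev/null | head -n 1) || true
  echo "$label: ${out:-unavailable}"
}

# Collect the versions and hardware details worth recording for this chapter.
record_env() {
  probe "kernel" uname -r
  probe "docker" docker --version
  probe "nvidia-ctk" nvidia-ctk --version
  probe "driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
  probe "gpu" nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
}

record_env
```

Redirecting the output to a file, for example `record_env > checkpoint.txt`, keeps the record next to your notes on the model file and the observed server output.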

EXERCISE

Install Docker and the NVIDIA container toolkit, verify GPU access inside a container with nvidia-smi, and run a llama.cpp server container with GPU offloading.
