GPU Not Detected — Troubleshooting Local AI (Chapter 3)

The Diagnostic Sequence

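GPU detection failures cascade upward from hardware through the driver and runtime to the application, so a fault at one layer hides everything above it. Work through the steps in order, and do not conclude the GPU is working until every step passes.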
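The layered order can be captured as a small runner that stops at the first failing check. This is only a sketch: the function name and the layer labels are illustrative, and each check is assumed to be a zero-argument callable that returns True on success.

```python
from typing import Callable, Optional


def first_failing_layer(checks: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails.

    Returns None when every check passes. Later checks are skipped once one
    fails, because their results are not meaningful above a broken layer.
    """
    for name, check in checks:
        if not check():
            return name
    return None
```

Passing the checks in the order hardware, driver, CUDA runtime, container mirrors the sequence in this chapter.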

Step 1: Hardware Check

lspci | grep -i nvidia

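If this returns nothing, the kernel cannot see the GPU on the PCIe bus. That usually points to a physical or firmware problem (card not seated, power cables not connected, PCIe slot disabled in BIOS/UEFI) rather than a software problem. Reseat the card, check its power connectors, and review the PCIe settings in BIOS/UEFI.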
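The same check can be scripted so its result feeds later steps. The sketch below assumes lspci (from pciutils) is installed. The function names are illustrative and not part of any library. It keeps only display-class devices, because a plain search for "nvidia" also matches the card's HDMI audio function.

```python
import shutil
import subprocess


def find_nvidia_gpus(lspci_output: str) -> list[str]:
    """Return the lspci lines that describe an NVIDIA display device."""
    gpu_classes = ("vga compatible controller", "3d controller", "display controller")
    hits = []
    for line in lspci_output.splitlines():
        lowered = line.lower()
        if "nvidia" in lowered and any(cls in lowered for cls in gpu_classes):
            hits.append(line.strip())
    return hits


def run_lspci() -> str:
    """Run lspci and return its output, or an empty string if it is not installed."""
    if shutil.which("lspci") is None:
        return ""
    result = subprocess.run(["lspci"], capture_output=True, text=True, timeout=10)
    return result.stdout
```

Calling find_nvidia_gpus(run_lspci()) returns an empty list when the kernel sees no NVIDIA GPU, which is the case described above.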

Step 2: Driver Check

nvidia-smi

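If this fails with "command not found", the NVIDIA driver is most likely not installed, since nvidia-smi ships with the driver package. If it fails with "No devices were found", the driver loaded but did not detect a GPU it supports, typically because the installed driver version does not support the card. If it reports that it could not communicate with the NVIDIA driver, the kernel module failed to load.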

# Check loaded kernel modules
lsmod | grep nvidia
# Check the kernel log for GPU-related errors (the NVIDIA module logs with an NVRM prefix)
sudo dmesg | grep -iE 'nvidia|nvrm'
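As a rough sketch, the interpretation of nvidia-smi results can be encoded in a helper. The function names and the returned strings are illustrative. The only command output matched is the "No devices were found" message quoted above; anything else is reported as a generic driver problem.

```python
import shutil
import subprocess


def classify_nvidia_smi(found_on_path: bool, returncode: int, output: str) -> str:
    """Map an nvidia-smi outcome to the layer that most likely failed."""
    if not found_on_path:
        return "driver not installed (nvidia-smi missing from PATH)"
    if returncode == 0:
        return "driver loaded and GPU detected"
    if "no devices were found" in output.lower():
        return "driver loaded but no GPU detected"
    return "driver problem: inspect lsmod and dmesg"


def check_driver() -> str:
    """Run nvidia-smi if it exists and classify the result."""
    if shutil.which("nvidia-smi") is None:
        return classify_nvidia_smi(False, 127, "")
    proc = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=30)
    return classify_nvidia_smi(True, proc.returncode, proc.stdout + proc.stderr)
```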

Step 3: CUDA Runtime Check

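nvidia-smi
# Should show GPU model, driver version, temperature, and memory usage.
# The "CUDA Version" in the header is the newest CUDA release the driver supports, not an installed toolkit.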

Then verify from Python:

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device count: {torch.cuda.device_count()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')"
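The one-liner is easier to extend as a short script. The sketch below gathers the same facts and tolerates a missing torch install. The function name and dictionary keys are illustrative. It also reports torch.version.cuda, which is None on CPU-only PyTorch builds, a common reason for CUDA being unavailable even though the driver is healthy.

```python
def cuda_report() -> dict:
    """Collect CUDA visibility facts from PyTorch without raising if torch is absent."""
    try:
        import torch
    except ImportError:
        return {
            "torch_installed": False,
            "torch_cuda_build": None,
            "cuda_available": False,
            "device_count": 0,
            "device_names": [],
        }
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "torch_cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": available,
        "device_count": count,
        "device_names": [torch.cuda.get_device_name(i) for i in range(count)],
    }
```

Printing cuda_report() gives one dictionary to paste into a bug report or compare across machines.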

Step 4: Container Check

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this fails but nvidia-smi works on the host, the NVIDIA Container Toolkit is probably missing or not configured for Docker.
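A quick way to tell a missing toolkit from a misconfigured one is to look for its executables on the host. The sketch below assumes the toolkit installs nvidia-container-cli, nvidia-container-runtime, and nvidia-ctk, which may differ on older toolkit versions. The helper name is illustrative.

```python
import shutil
from typing import Callable, Optional

# Executables assumed to be installed by the NVIDIA Container Toolkit packages.
TOOLKIT_BINARIES = ("nvidia-container-cli", "nvidia-container-runtime", "nvidia-ctk")


def missing_toolkit_binaries(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the toolkit executables that cannot be found on PATH."""
    return [name for name in TOOLKIT_BINARIES if which(name) is None]
```

An empty result suggests the toolkit is installed and the problem lies in Docker's configuration. A non-empty result suggests the install itself is incomplete.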

Common Causes and Fixes

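Driver version too old: New GPUs require recent drivers. RTX 40-series support, for example, first appeared in the 520 driver branch, and later cards in the series need newer branches still. Check the supported-products list in the driver release notes for your exact card.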
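To compare the installed driver against a minimum, the version string can be read with nvidia-smi's query mode and reduced to its branch number. This is a sketch: the function names are illustrative, and minimum_major is a placeholder for whatever minimum the release notes give for your card.

```python
import subprocess


def driver_major(version: str) -> int:
    """Extract the branch number from a driver version such as '535.129.03'."""
    return int(version.strip().split(".")[0])


def driver_is_at_least(version: str, minimum_major: int) -> bool:
    """Return True if the driver branch meets the required minimum."""
    return driver_major(version) >= minimum_major


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return proc.stdout.splitlines()[0].strip()
```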

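Secure Boot blocking driver: With Secure Boot enabled, the kernel refuses to load kernel modules that are not signed with a trusted key, so an unsigned NVIDIA module never loads. Disable Secure Boot in UEFI, or sign the module and enroll the signing key as a Machine Owner Key (MOK).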
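Whether Secure Boot is active can be read from mokutil. The sketch below assumes mokutil is installed and that `mokutil --sb-state` prints a line containing "SecureBoot enabled" when it is on. The function names are illustrative.

```python
import shutil
import subprocess


def secure_boot_enabled(sb_state_output: str) -> bool:
    """Interpret the output of `mokutil --sb-state`."""
    return "secureboot enabled" in sb_state_output.lower()


def read_sb_state() -> str:
    """Run mokutil and return its output, or an empty string if unavailable."""
    if shutil.which("mokutil") is None:
        return ""
    proc = subprocess.run(["mokutil", "--sb-state"], capture_output=True, text=True, timeout=10)
    return proc.stdout + proc.stderr
```

If secure_boot_enabled(read_sb_state()) is True and lsmod shows no nvidia module, module signing is the first thing to check.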

Docker without NVIDIA runtime: Add "default-runtime": "nvidia" to /etc/docker/daemon.json (alongside a "runtimes" entry that defines the nvidia runtime) and restart Docker, or pass --gpus all on every docker run command.
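The daemon.json change can be sketched as a merge that preserves existing settings. The shape of the "runtimes" entry here is an assumption modeled on what the NVIDIA Container Toolkit typically writes. In practice, running `nvidia-ctk runtime configure --runtime=docker` generates the entry for you. The function name is illustrative.

```python
import copy
import json


def with_nvidia_default_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json config with nvidia as the default runtime."""
    updated = copy.deepcopy(config)
    runtimes = updated.setdefault("runtimes", {})
    # Assumed entry shape; keep any nvidia runtime definition that already exists.
    runtimes.setdefault("nvidia", {"path": "nvidia-container-runtime", "args": []})
    updated["default-runtime"] = "nvidia"
    return updated


def render(config: dict) -> str:
    """Serialize the config in the indented form used for daemon.json."""
    return json.dumps(config, indent=4)
```

Write the rendered result back to /etc/docker/daemon.json only after reviewing it, then restart the Docker daemon so the change takes effect.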