03. CUDA Toolkit Setup
The CUDA toolkit version must support your GPU's compute capability, be supported by the installed driver, and match the version your AI software was built against. A mismatch can leave the GPU detected but unable to run kernels, or surface at runtime as errors such as CUDA_ERROR_INVALID_IMAGE or "no kernel image is available for execution on the device".
Check GPU compute capability:
nvidia-smi --query-gpu=name,compute_cap --format=csv
# Example output:
# name, compute_cap
# NVIDIA GeForce RTX 3090, 8.6
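CUDA 12.x supports compute capability 5.0 and above (Kepler-era 3.x GPUs were dropped in 12.0). In the 11.x line, CUDA 11.8 was the first release to support 8.9 (Ada) and 9.0 (Hopper); CUDA 11.7 and earlier top out at the Ampere generation (8.6 on desktop GPUs). Verify compatibility against the release notes before installing.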
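The range check can be scripted. The following is a minimal sketch, assuming GNU sort (for the -V version ordering) is available. The function names version_ge and cc_in_range are illustrative, and the bounds 5.0 and 9.0 are assumed values for CUDA 12.4 that should be confirmed against NVIDIA's release notes for the toolkit you install.

```shell
# version_ge A B: succeeds when dotted version A >= B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# cc_in_range CC MIN MAX: succeeds when MIN <= CC <= MAX.
cc_in_range() {
    version_ge "$1" "$2" && version_ge "$3" "$1"
}

# Query the first GPU only when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
    # 5.0 and 9.0 are assumed bounds for CUDA 12.4; adjust for your toolkit.
    if cc_in_range "$cc" 5.0 9.0; then
        echo "compute capability $cc is in range"
    else
        echo "compute capability $cc is out of range" >&2
    fi
fi
```

Passing the bounds as arguments keeps the sketch reusable for other toolkit versions; only the two numbers in the call need to change.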
Install CUDA toolkit from NVIDIA's apt repository:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-4
Add the CUDA binary and library paths to your shell profile, then verify the compiler:
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
nvcc --version
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2024 NVIDIA Corporation
# Built on Thu_Mar_28_02:18:24_PDT_2024
# Cuda compilation tools, release 12.4, V12.4.131
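Failure mode: You install cuda-toolkit-12-4 but the driver is only 525.x, which supports CUDA 12.0 at most. nvcc still runs, because compiling does not need the driver, but programs built with the 12.4 toolkit can fail at runtime, typically with "CUDA driver version is insufficient for CUDA runtime version" (cudaErrorInsufficientDriver) or a PTX version error. Check the driver-CUDA compatibility matrix in NVIDIA's docs. CUDA 12.4 needs a 550-series or newer driver, so the fix is sudo apt install nvidia-driver-550 (nvidia-driver-535 only reaches CUDA 12.2).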
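Failure mode: You install CUDA from the .run file instead of the apt repo, then run sudo apt update && sudo apt upgrade and the system installs a new kernel. The driver module that the .run installer built against the old kernel is not rebuilt for the new one unless it was registered with DKMS, so after the reboot the nvidia module fails to load and nvidia-smi cannot communicate with the driver. Mixing .run-installed files with apt-managed NVIDIA packages causes similar conflicts. Use the apt repository, whose driver packages are rebuilt through DKMS on kernel updates, to avoid this.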
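Failure mode: Multiple CUDA versions installed. nvcc points to 12.4 but the AI framework was compiled against 11.8 and its libcudart.so is in LD_LIBRARY_PATH before 12.4's path. Run ldconfig -p | grep libcudart to list the copies registered in the linker cache, but note that the cache does not reflect LD_LIBRARY_PATH, which is searched first. To see which copy a given binary or shared library actually loads, run ldd on it and look at the libcudart line.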
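To see the search order directly, the directories in LD_LIBRARY_PATH can be walked in sequence. This is a minimal sketch; list_cudart is an illustrative helper name, and it only inspects LD_LIBRARY_PATH, not RPATH/RUNPATH entries or the linker cache.

```shell
# list_cudart PATHLIST: print every libcudart.so* found in the colon-separated
# PATHLIST, in search order. The first line printed is the copy that takes
# precedence among the LD_LIBRARY_PATH entries.
# The body runs in a subshell so the IFS change does not leak out.
list_cudart() (
    IFS=:
    for dir in $1; do
        if [ -z "$dir" ]; then
            continue
        fi
        for lib in "$dir"/libcudart.so*; do
            # An unmatched glob stays literal, so confirm the file exists.
            if [ -e "$lib" ]; then
                echo "$lib"
            fi
        done
    done
)

list_cudart "${LD_LIBRARY_PATH:-}"
```

If an 11.x copy is printed before the 12.4 one, reorder the exports in your shell profile or remove the stale entry.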
To verify the setup: run nvidia-smi and note the CUDA version in the header, run nvcc --version, confirm that the toolkit version is less than or equal to the driver-supported CUDA version, and make sure both the PATH and LD_LIBRARY_PATH exports are in your shell profile.
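The comparison between the two version numbers can be automated. The sketch below assumes GNU sed and sort, and assumes the nvidia-smi header contains a "CUDA Version: X.Y" field and that nvcc prints a "release X.Y," line, as in the example output earlier in this section. The helper names are illustrative.

```shell
# Extract "12.4" from an nvidia-smi header line such as
# "| NVIDIA-SMI 550.54.15   Driver Version: 550.54.15   CUDA Version: 12.4 |".
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n1
}

# Extract "12.4" from nvcc's "Cuda compilation tools, release 12.4, V12.4.131".
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p' | head -n1
}

# toolkit_fits TOOLKIT DRIVER: succeeds when both are non-empty and
# TOOLKIT <= DRIVER under GNU sort -V ordering.
toolkit_fits() {
    [ -n "$1" ] && [ -n "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Run the live check only when both tools are installed.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    drv=$(nvidia-smi | driver_cuda_version)
    tk=$(nvcc --version | toolkit_cuda_version)
    if toolkit_fits "$tk" "$drv"; then
        echo "OK: toolkit $tk <= driver-supported CUDA $drv"
    else
        echo "MISMATCH: toolkit '$tk' vs driver-supported CUDA '$drv'" >&2
    fi
fi
```

Version ordering is done with sort -V rather than a numeric comparison so that a release such as 12.10 sorts after 12.4.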