
cuDNN

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library of deep learning primitives such as convolutions, pooling, normalization, and activation functions. Operators running local AI on NVIDIA GPUs encounter it as a dependency of frameworks like PyTorch and TensorFlow, and therefore of PyTorch-based servers such as vLLM. It supplies tuned low-level kernels that keep the GPU busy, which affects speed and VRAM usage for the operations it covers. How much it matters depends on the model: convolution-heavy models benefit most, while transformer LLMs spend most of their time in matrix multiplications handled by cuBLAS and in framework-specific attention kernels. cuDNN is distributed separately from the CUDA Toolkit (PyTorch's pip wheels ship their own copy), and each cuDNN release is built for specific CUDA major versions; mismatched versions can cause runtime errors.
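A quick way to see which versions are actually in play is to ask PyTorch. The sketch below assumes a pip-installed PyTorch; decode_cudnn_version and cudnn_report are hypothetical helper names, and the integer encoding it decodes (8902 for 8.9.2, 90100 for 9.1.0) is our reading of cuDNN's version header, so treat the decoded tuple as a convenience rather than an authoritative parse.

```python
"""Report which CUDA and cuDNN versions the installed PyTorch build uses."""
import importlib.util


def decode_cudnn_version(v: int) -> tuple:
    """Split PyTorch's integer cuDNN version into (major, minor, patch).

    Assumed encoding: major*1000 + minor*100 + patch up to cuDNN 8
    (8902 -> 8.9.2), major*10000 + minor*100 + patch from cuDNN 9 on
    (90100 -> 9.1.0).
    """
    if v >= 10000:  # cuDNN 9+ encoding
        return v // 10000, (v % 10000) // 100, v % 100
    return v // 1000, (v % 1000) // 100, v % 100


def cudnn_report() -> dict:
    """Collect version facts without crashing if PyTorch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    raw = torch.backends.cudnn.version()  # None if cuDNN cannot be loaded
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn": decode_cudnn_version(raw) if raw else None,
        "gpu_visible": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(cudnn_report())
```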

Deeper dive

cuDNN provides highly tuned implementations of standard neural network operations. It uses heuristics and, optionally, auto-tuning to select the fastest algorithm for a given layer shape, batch size, and GPU architecture. For operators, this means the same model can run at different speeds depending on cuDNN version and GPU generation. cuDNN also uses Tensor Cores, available on RTX 20-series and later consumer GPUs, for mixed-precision work: FP16 from the RTX 20-series (Turing) onward and BF16 from the RTX 30-series (Ampere) onward, which can substantially raise throughput compared with FP32. However, cuDNN is proprietary and only works on NVIDIA GPUs. AMD GPUs use ROCm's MIOpen, and Apple Silicon uses Metal Performance Shaders (MPS). llama.cpp and Ollama do not use cuDNN directly; they rely on custom CUDA kernels (e.g., llama.cpp's ggml CUDA backend). vLLM and Hugging Face Transformers run on PyTorch, which loads cuDNN for convolutions, batch normalization, and RNN layers and, in recent releases, offers it as one backend for scaled dot-product attention. Linear (feed-forward) layers go through cuBLAS instead, and vLLM ships its own attention kernels.
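Which reduced-precision format a card's Tensor Cores handle can be read from its CUDA compute capability. This is a minimal sketch, assuming the thresholds noted in the comments (7.x for FP16 Tensor Cores, 8.0 and up for BF16); pick_inference_dtype and current_gpu_dtype are hypothetical names, and the mapping is a heuristic, not an NVIDIA-published table.

```python
"""Pick an inference dtype from a GPU's CUDA compute capability."""
from typing import Optional, Tuple


def pick_inference_dtype(capability: Tuple[int, int]) -> str:
    """Heuristic mapping (assumption, not an official table).

    - 8.0 and up (Ampere, e.g. RTX 30-series; Ada, e.g. RTX 40-series): BF16
    - 7.0 to 7.5 (Volta, Turing, e.g. RTX 20-series): FP16 Tensor Cores only
    - older: no Tensor Cores, stay in FP32
    Caveat: GTX 16-series cards also report 7.5 but have no Tensor Cores.
    """
    if capability >= (8, 0):
        return "bfloat16"
    if capability >= (7, 0):
        return "float16"
    return "float32"


def current_gpu_dtype() -> Optional[str]:
    """Apply the heuristic to GPU 0, or return None without a usable GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return pick_inference_dtype(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    for cap in [(6, 1), (7, 5), (8, 6), (8, 9)]:
        print(cap, pick_inference_dtype(cap))
    print("this machine:", current_gpu_dtype())
```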

Practical example

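An operator running vLLM with Llama 3.1 8B in FP16 on an RTX 4090 (24 GB VRAM) depends mainly on cuBLAS matrix multiplications running on Tensor Cores and on vLLM's own attention kernels. Transformer LLMs contain few or no convolutions, so cuDNN contributes less to tokens/sec here than it does in convolution-heavy models such as image generators. It still has to be present and compatible, because PyTorch loads it; if it is disabled (torch.backends.cudnn.enabled = False), PyTorch falls back to its native kernels for the affected operations. Similarly, fine-tuning a LoRA adapter with Hugging Face Transformers uses cuDNN in the forward and backward passes only for the layers it covers, such as convolutions in vision or audio encoders. Different cuDNN versions (e.g., 8.9 vs 9.0) may choose different algorithms and produce small numerical differences, and loading a cuDNN version other than the one PyTorch was built against can cause crashes.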
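Where run-to-run repeatability matters more than peak speed (for example, when comparing fine-tuning runs), PyTorch exposes two flags that control how it uses cuDNN. The sketch below is a hedged example: configure_cudnn is a hypothetical helper, while torch.backends.cudnn.benchmark, torch.backends.cudnn.deterministic, and torch.manual_seed are standard PyTorch settings.

```python
"""Toggle between cuDNN auto-tuning and run-to-run reproducibility."""
import importlib.util


def configure_cudnn(reproducible: bool, seed: int = 0) -> bool:
    """Return False (and change nothing) if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if reproducible:
        torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
    # benchmark=True lets cuDNN time candidate algorithms for each new input
    # shape and cache the fastest; the winner can differ between runs.
    torch.backends.cudnn.benchmark = not reproducible
    # deterministic=True restricts cuDNN to deterministic convolution
    # algorithms, which may be slower. It does not make results match
    # across different cuDNN versions; pin the version for that.
    torch.backends.cudnn.deterministic = reproducible
    return True


if __name__ == "__main__":
    print("configured:", configure_cudnn(reproducible=True, seed=42))
```

Leaving benchmark on is usually the faster choice for inference with fixed input shapes; when shapes change constantly, the repeated tuning can cost more than it saves.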

Workflow example

When installing PyTorch via pip, cuDNN is included automatically (e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118). Operators can verify the cuDNN version with python -c "import torch; print(torch.backends.cudnn.version())". LM Studio's CUDA runtime is built on llama.cpp, so, like llama.cpp itself, it relies on custom CUDA kernels rather than cuDNN. If an operator encounters "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED", it usually indicates a driver or library version mismatch, or that the GPU ran out of memory while cuDNN was initializing (for example, because the batch size is too large).
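When that error appears, it helps to separate a version problem from a memory problem. The following sketch, meant to run in a fresh Python process before the model is loaded, reports the versions PyTorch sees, the free VRAM, and whether a tiny GPU convolution succeeds; diagnose_cudnn is a hypothetical helper, not a PyTorch API.

```python
"""Gather the facts that usually explain CUDNN_STATUS_NOT_INITIALIZED."""
import importlib.util


def diagnose_cudnn() -> dict:
    if importlib.util.find_spec("torch") is None:
        return {"ok": False, "reason": "PyTorch is not installed"}
    import torch

    info = {
        "cuda_build": torch.version.cuda,
        "cudnn_version": torch.backends.cudnn.version(),
        # PyTorch only routes convolutions to cuDNN when both are True.
        "cudnn_in_use": torch.backends.cudnn.is_available()
        and torch.backends.cudnn.enabled,
    }
    if not torch.cuda.is_available():
        info.update(ok=False, reason="no CUDA device visible: check the driver or a CPU-only build")
        return info

    free, total = torch.cuda.mem_get_info()  # bytes on the current device
    info["free_gib"] = round(free / 2**30, 2)
    info["total_gib"] = round(total / 2**30, 2)
    try:
        # A tiny convolution makes PyTorch create its cuDNN handle, which is
        # where CUDNN_STATUS_NOT_INITIALIZED typically surfaces.
        with torch.no_grad():
            x = torch.randn(1, 3, 32, 32, device="cuda")
            conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("cuda")
            conv(x)
        torch.cuda.synchronize()
        info.update(ok=True, reason="test convolution ran on the GPU")
    except RuntimeError as err:  # also catches torch.cuda.OutOfMemoryError
        info.update(ok=False, reason=str(err))
    return info


if __name__ == "__main__":
    for key, value in diagnose_cudnn().items():
        print(f"{key}: {value}")
```

A low free_gib figure before anything is loaded points to another process holding VRAM; an error despite plenty of free memory points back to a driver or version mismatch.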

Reviewed by Fredoline Eruo. See our editorial policy.


When it doesn't work