RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Hardware & infrastructure

cuDNN

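cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library of deep learning primitives such as convolutions, pooling, normalization, and activation functions. Operators running local AI on NVIDIA GPUs encounter cuDNN as a dependency of PyTorch and TensorFlow, and through them of PyTorch-based tools like vLLM. It tunes these low-level operations for the GPU in use, which affects inference speed and, through workspace allocations, VRAM usage. Without cuDNN, models that depend on these primitives run noticeably slower; the effect is largest for convolution-heavy workloads (vision, audio, diffusion), while transformer LLMs lean mostly on matrix-multiply and attention kernels. cuDNN is distributed separately from the CUDA Toolkit, although frameworks such as PyTorch bundle a matching copy in their pip packages. Each cuDNN build targets a specific CUDA major version, and mismatched versions can cause runtime errors.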

Deeper dive

cuDNN provides highly tuned implementations of standard neural network operations. It uses heuristics and optional auto-tuning to select the fastest algorithm for a given layer shape, batch size, and GPU architecture. For operators, this means that the same model can run at different speeds depending on cuDNN version and GPU generation. cuDNN also uses Tensor Cores where the hardware has them (RTX 20-series and later on consumer cards), enabling mixed-precision inference that can substantially raise throughput; FP16 works from the RTX 20-series, while BF16 requires the RTX 30-series or later. However, cuDNN is proprietary and only works on NVIDIA GPUs. AMD GPUs use ROCm's MIOpen, and Apple Silicon uses Metal Performance Shaders (MPS). llama.cpp and Ollama do not use cuDNN directly, because they rely on custom CUDA kernels (e.g., llama.cpp's ggml CUDA backend). vLLM and Hugging Face Transformers, by contrast, run on PyTorch and ship with cuDNN as part of that stack. In transformer models the feed-forward layers are matrix multiplies, which PyTorch dispatches to cuBLAS, and attention usually runs through FlashAttention-style fused kernels, with cuDNN as one of the attention backends available in recent PyTorch releases.
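The auto-tuning described above is exposed in PyTorch through the torch.backends.cudnn.benchmark flag. The sketch below is a minimal illustration, assuming PyTorch is the framework in use; the helper name enable_cudnn_autotune is hypothetical, and the function simply reports False when PyTorch is not installed.

```python
def enable_cudnn_autotune(enabled: bool = True) -> bool:
    """Toggle cuDNN auto-tuning in PyTorch; return True if the flag was set.

    enable_cudnn_autotune is a hypothetical helper name, not a PyTorch API.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to configure.
        return False

    # With benchmark=True, cuDNN times several candidate algorithms the first
    # time it sees a new input shape and reuses the fastest one afterwards.
    torch.backends.cudnn.benchmark = enabled
    return True


if __name__ == "__main__":
    print("cuDNN auto-tuning flag set:", enable_cudnn_autotune(True))
```

Auto-tuning tends to pay off when input shapes stay fixed, as with fixed-size image batches; when shapes change constantly, the repeated benchmarking can cost more than it saves.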

Practical example

An operator running vLLM with Llama 3.1 8B on an RTX 4090 (24 GB VRAM) will see roughly 80 tok/s with FP16; the exact figure depends on context length, batching, and software versions. This throughput relies on fused attention kernels and Tensor Core matrix multiplies. Llama-style models contain no convolutions, so cuDNN's convolution kernels are not involved. If the stack falls back to PyTorch's unfused native implementations, throughput can drop to around 30 tok/s. Similarly, training a LoRA adapter with Hugging Face Transformers runs on the same PyTorch stack; a cuDNN version mismatch (e.g., a PyTorch build expecting 8.9 loading a 9.0 library) can cause load failures, crashes, or small numerical differences between runs.
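Because a mismatch such as 8.9 vs 9.0 is a difference in major version, it helps to read the reported version number correctly. PyTorch's torch.backends.cudnn.version() returns a single integer; the sketch below decodes it, assuming NVIDIA's CUDNN_VERSION encoding (major*1000 + minor*100 + patch up to cuDNN 8, and major*10000 + minor*100 + patch from cuDNN 9). The function names are hypothetical, and the encoding is worth checking against the cudnn_version.h header of the installed release.

```python
from typing import Tuple


def decode_cudnn_version(raw: int) -> Tuple[int, int, int]:
    """Split a CUDNN_VERSION-style integer into (major, minor, patch).

    Assumption: cuDNN 9 and later use a five-digit encoding, earlier
    releases a four-digit one.
    """
    if raw >= 90000:
        major, rest = divmod(raw, 10000)
    else:
        major, rest = divmod(raw, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def same_major(raw_a: int, raw_b: int) -> bool:
    """Return True when two encoded versions share a major version."""
    return decode_cudnn_version(raw_a)[0] == decode_cudnn_version(raw_b)[0]


if __name__ == "__main__":
    print(decode_cudnn_version(8902))   # (8, 9, 2)
    print(decode_cudnn_version(90100))  # (9, 1, 0)
    print(same_major(8902, 90100))      # False
```

Comparing only the major version is a deliberate simplification: it flags the 8.x vs 9.x case from the example without claiming anything about compatibility between minor releases.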

Workflow example

When installing PyTorch via pip, a matching cuDNN is installed automatically (e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 for the CUDA 11.8 build). Operators can verify the cuDNN version with python -c "import torch; print(torch.backends.cudnn.version())", which prints an integer such as 8902 for cuDNN 8.9.2. LM Studio's "CUDA" backend option is built on llama.cpp, so it runs on ggml's CUDA kernels rather than cuDNN. If an operator encounters "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED", it usually indicates a driver mismatch or insufficient VRAM for the chosen batch size.
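For a fuller picture than the one-line version check, the same PyTorch attributes can be collected into a small report. This is a minimal sketch, assuming PyTorch is the framework in use; cudnn_report is a hypothetical helper name, and it returns a near-empty report when PyTorch is not installed.

```python
def cudnn_report() -> dict:
    """Collect basic CUDA/cuDNN facts from PyTorch, if it is installed.

    cudnn_report is a hypothetical helper name, not part of PyTorch.
    """
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None on CPU-only builds.
    report["cuda_build"] = torch.version.cuda
    # Whether a usable GPU and driver are visible right now.
    report["cuda_available"] = torch.cuda.is_available()
    # Whether this PyTorch build includes cuDNN support.
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    # Encoded cuDNN version as an integer, or None when cuDNN is unavailable.
    report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(f"{key}: {value}")
```

If cuda_build is set but cuda_available is False, the driver is the first thing to check, which matches the driver-mismatch cause named above for CUDNN_STATUS_NOT_INITIALIZED.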

Reviewed by Fredoline Eruo. See our editorial policy.
