RUNLOCALAI · v38
Driver issues
Verified by owner

NCCL error: peer to peer not supported (multi-GPU)

NCCL error: unhandled system error / peer to peer access not supported between GPU{0} and GPU{1}
By Fredoline Eruo · Last verified Jun 12, 2026

Cause

Environment: Multi-GPU NVIDIA hosts running tensor-parallel inference (vLLM, TGI, SGLang) on consumer chipsets or hosts with strict IOMMU/ACS.

Severity: high — tensor-parallel jobs won't start.

  • Consumer chipsets (X670, Z790, B650) often lack PCIe peer-to-peer between GPUs by default
  • IOMMU + ACS enabled in BIOS isolates each PCIe slot, blocking GPU-to-GPU DMA
  • Four consumer GPUs on a board that bifurcates its lanes (for example down to ×8/×8) can lose P2P between some GPU pairs
  • Older NCCL versions abort when they detect no P2P path instead of falling back to staging through host memory
  • Mixed slot topology (one GPU on CPU lanes, one on chipset lanes)
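
To see which of these topology cases applies on your host, nvidia-smi topo -m prints the link type between every GPU pair. The sketch below is a minimal parser for that matrix. It assumes the usual layout (a header row listing GPU0..GPUn in order, then one row per GPU), and the function name gpu_links is hypothetical:

```shell
# Print one line per GPU pair: "<gpu> <peer> <link type>".
# Reads `nvidia-smi topo -m` output on stdin.
# Assumption: GPU columns come first and are numbered 0..n-1 in order.
gpu_links() {
  awk '
    ngpu == 0 {
      # Header row: count the GPU columns (skips leading blank lines).
      for (i = 1; i <= NF; i++) if ($i ~ /^GPU[0-9]+$/) ngpu++
      next
    }
    $1 ~ /^GPU[0-9]+$/ {
      row = substr($1, 4) + 0
      for (c = 2; c <= ngpu + 1; c++) {
        peer = c - 2
        # Report each pair once (upper triangle) and skip the "X" diagonal.
        if (peer > row && $c != "X") print $1, "GPU" peer, $c
      }
    }
  '
}

# Usage on a GPU host:
#   nvidia-smi topo -m | gpu_links
```

In the nvidia-smi legend, PIX and PXB mean the pair stays below PCIe bridges, while PHB, NODE, and SYS mean traffic crosses a host bridge or the CPU interconnect. Pairs in the second group are the ones to suspect first.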

Solution

1. Disable P2P and force NCCL through host memory (works everywhere, ~10-30% slower than true P2P):

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1   # also disable InfiniBand if no fabric

Add to your launch script or systemd unit. Most consumer multi-GPU setups need this.
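
One way to keep these variables scoped to a single job is a small wrapper, sketched below. Both function names are hypothetical. NCCL_DEBUG=INFO is a standard NCCL variable, added here only so the startup log shows which transport was picked. The exact log wording varies between NCCL versions, so treat the "via <transport>" pattern as an assumption:

```shell
# Run any command with NCCL forced to stage through host memory.
run_with_host_staging() {
  env NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO "$@"
}

# Count the transports named in NCCL INFO log lines read from stdin.
# Assumption: channel lines end in "via P2P/...", "via SHM/...", or "via NET/...".
summarize_nccl_transports() {
  awk '
    match($0, /via [A-Za-z0-9]+/) {
      t = substr($0, RSTART + 4, RLENGTH - 4)   # text after "via "
      n[t]++
    }
    END { for (t in n) print t, n[t] }
  ' | sort
}

# Usage (the model argument is a placeholder):
#   run_with_host_staging vllm serve <model> --tensor-parallel-size 2 2>&1 \
#     | tee nccl.log
#   summarize_nccl_transports < nccl.log
```

If the summary lists only SHM, the fallback is active. Any remaining P2P lines mean the variable did not reach the worker processes.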

2. Verify P2P matrix to see which pairs are blocked:

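# p2pBandwidthLatencyTest comes from NVIDIA's cuda-samples; build it from there
# if your toolkit's demo_suite directory does not ship the binary: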
cd /usr/local/cuda/extras/demo_suite
./p2pBandwidthLatencyTest

Check the "P2P Connectivity Matrix" near the top of the output: a 1 marks a GPU pair that supports P2P, and a 0 marks a blocked link.
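
On hosts with more than two GPUs the matrix is easy to misread, so a small filter can list the blocked pairs. This is a sketch under an assumption about the test's output: a "P2P Connectivity Matrix" heading, a "D\D" header row, then one row per device with 1 or 0 per peer. The function name blocked_p2p_pairs is hypothetical:

```shell
# List GPU pairs the P2P connectivity matrix marks as unreachable.
# Reads p2pBandwidthLatencyTest output on stdin.
blocked_p2p_pairs() {
  awk '
    /P2P Connectivity Matrix/ { in_matrix = 1; next }
    in_matrix && /D\\D/       { next }            # column header row
    in_matrix && $1 ~ /^[0-9]+$/ {
      for (col = 2; col <= NF; col++) {
        peer = col - 2
        if ($col == "0" && peer != $1 + 0)
          print "GPU" $1 " -> GPU" peer " blocked"
      }
      next
    }
    in_matrix { in_matrix = 0 }                    # first other line ends the matrix
  '
}

# Usage on a GPU host:
#   ./p2pBandwidthLatencyTest | blocked_p2p_pairs
```

No output means every pair reported P2P support.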

3. BIOS settings (if you have datacenter-grade boards):

  • Disable IOMMU / VT-d
  • Disable ACS (PCIe Access Control Services)
  • Enable "Above 4G Decoding" + "Re-Size BAR"
  • Set PCIe slots to x16/x16 mode (not x8/x8)
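
After changing BIOS settings, you can confirm from Linux whether ACS is still active. The sketch below counts PCIe devices whose ACSCtl line in lspci -vvv output shows SrcValid+, which is the marker NVIDIA's NCCL troubleshooting notes point to. The function name count_acs_enabled is hypothetical:

```shell
# Count PCIe devices with ACS source validation switched on.
# Reads `lspci -vvv` output on stdin and prints a single number.
count_acs_enabled() {
  awk '
    index($0, "ACSCtl:") && index($0, "SrcValid+") { n++ }
    END { print n + 0 }
  '
}

# Usage (root is needed for lspci to show capability details):
#   sudo lspci -vvv | count_acs_enabled
# A non-empty /sys/kernel/iommu_groups directory means the IOMMU is still on:
#   ls /sys/kernel/iommu_groups | wc -l
```

A result of 0 together with an empty iommu_groups directory suggests the BIOS changes took effect.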

4. Use a server chipset if you genuinely need P2P at scale — Threadripper PRO + WRX80/WRX90 or EPYC + SP5 expose full PCIe lanes with native P2P. Consumer boards are a hard ceiling for tensor-parallel scaling.

5. For 2 GPUs only, skip tensor parallelism entirely and use pipeline parallelism. It splits the layers across the GPUs, so there is no per-layer all-reduce and no P2P requirement:

vllm serve Qwen/Qwen2.5-72B-Instruct --pipeline-parallel-size 2 --tensor-parallel-size 1

Alternative solutions

Platform-specific note: NCCL_P2P_DISABLE=1 only matters on multi-GPU hosts. On a single-GPU rig the variable does nothing, so don't set it cargo-cult style; it's noise. On AMD ROCm, the related settings are HSA_FORCE_FINE_GRAIN_PCIE=1 and disabling P2P in RCCL. RCCL is a fork of NCCL and generally reads the same NCCL_* variables, so if RCCL_P2P_DISABLE=1 has no effect, try NCCL_P2P_DISABLE=1.
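
A launch script shared between single-GPU and multi-GPU machines can apply that rule itself. A minimal sketch follows. The function name maybe_disable_p2p is hypothetical, and the GPU count is passed in so the logic does not depend on any one detection method:

```shell
# Export NCCL_P2P_DISABLE only when more than one GPU is present.
# $1: number of GPUs visible on this host.
maybe_disable_p2p() {
  if [ "$1" -gt 1 ]; then
    export NCCL_P2P_DISABLE=1
  fi
}

# Usage on an NVIDIA host:
#   maybe_disable_p2p "$(nvidia-smi --list-gpus | wc -l)"
```

On a single-GPU rig the function leaves the environment untouched.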

Related errors

  • CUDA driver version is insufficient for CUDA runtime version
  • PyTorch CUDA error: driver version is insufficient for CUDA runtime
  • WSL2: nvidia-smi works but PyTorch sees no CUDA / libcuda.so missing
  • WSL2 GPU not detected — nvidia-smi missing or empty
  • nvidia-smi: command not found
