NCCL error: peer to peer not supported (multi-GPU)
Cause
Environment: Multi-GPU NVIDIA hosts running tensor-parallel inference (vLLM, TGI, SGLang) on consumer chipsets or hosts with strict IOMMU/ACS.
Severity: high — tensor-parallel jobs won't start.
- Consumer chipsets (X670, Z790, B650) often lack PCIe peer-to-peer between GPUs by default
- IOMMU + ACS enabled in BIOS isolates each PCIe slot, blocking GPU-to-GPU DMA
- Four consumer GPUs on a board that bifurcates lanes to ×8/×8 can leave P2P broken for at least one GPU pair
- Older NCCL versions detect no P2P path and refuse to start instead of falling back to staging through host memory
- Mixed slot topology: one GPU on CPU-attached lanes, another on chipset lanes, so GPU-to-GPU traffic has to cross the chipset link
Solution
1. Disable P2P and force NCCL through host memory (works everywhere, ~10-30% slower than true P2P):
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1 # also disable InfiniBand if no fabric
Add to your launch script or systemd unit. Most consumer multi-GPU setups need this.
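A minimal sketch of the launch-script placement, assuming a bash wrapper; the function name `run_without_p2p` is hypothetical. It also sets `NCCL_DEBUG=INFO` (unless you already chose a level) so NCCL logs which transport it picks for each channel, letting you confirm P2P is no longer in use. In a systemd unit, the same variables go on `Environment=` lines instead.

```shell
# Hypothetical wrapper: runs any command with NCCL's P2P and InfiniBand
# transports disabled, so NCCL stages GPU-to-GPU traffic through host memory.
# NCCL_DEBUG defaults to INFO (transport logging) unless the caller already set it.
run_without_p2p() {
  NCCL_P2P_DISABLE=1 \
    NCCL_IB_DISABLE=1 \
    NCCL_DEBUG="${NCCL_DEBUG:-INFO}" \
    "$@"
}

# Usage (model name as in the pipeline-parallel example):
#   run_without_p2p vllm serve qwen2.5-72b --tensor-parallel-size 2
```

The variables are scoped to the wrapped command, so they don't leak into the rest of the shell session.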
2. Verify P2P matrix to see which pairs are blocked:
# Build p2pBandwidthLatencyTest from the CUDA samples, or run NVIDIA's prebuilt copy if your toolkit ships one:
cd /usr/local/cuda/extras/demo_suite
./p2pBandwidthLatencyTest
Check the "Device=N CAN/CANNOT Access Peer Device=M" lines and the "P2P Connectivity Matrix": every CANNOT line, or 0 in the matrix, is a blocked link.
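When you run the test on several hosts, a small filter can pull out just the blocked pairs. This sketch assumes the sample's `Device=N CANNOT Access Peer Device=M` line format; the function name is hypothetical, so compare it against your own output before relying on it.

```shell
# Hypothetical filter: reads p2pBandwidthLatencyTest output on stdin and prints
# one "src dst" pair per line for every direction where peer access is unavailable.
blocked_p2p_pairs() {
  grep -o 'Device=[0-9]* CANNOT Access Peer Device=[0-9]*' |
    sed 's/^Device=\([0-9]*\) CANNOT Access Peer Device=\([0-9]*\)$/\1 \2/'
}

# Usage:
#   ./p2pBandwidthLatencyTest | blocked_p2p_pairs
```

Empty output means every pair reported peer access.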
3. BIOS settings (if you have datacenter-grade boards):
- Disable IOMMU / VT-d
- Disable ACS (PCIe Access Control Services)
- Enable "Above 4G Decoding" + "Re-Size BAR"
- Set PCIe slots to x16/x16 mode (not x8/x8)
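After changing BIOS settings, you can check from Linux that they took effect. The NCCL troubleshooting guide suggests inspecting the `ACSCtl` lines of `sudo lspci -vvv`, where `SrcValid+` indicates ACS may still be enabled on a bridge; a non-empty `/sys/kernel/iommu_groups` typically means an IOMMU is active. The helper names below are hypothetical, and they take their input as arguments or stdin so they can be tried without root.

```shell
# Hypothetical helper: exits 0 if the directory holds any IOMMU groups, which
# typically means the kernel has an active IOMMU. Defaults to the sysfs path.
iommu_active() {
  local groups_dir="${1:-/sys/kernel/iommu_groups}"
  [ -d "$groups_dir" ] && [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]
}

# Hypothetical helper: reads `lspci -vvv` output on stdin and prints how many
# ACSCtl lines still have Source Validation enabled (SrcValid+).
acs_bridges_enabled() {
  grep -c 'ACSCtl:.*SrcValid+' || true
}

# Usage (root is needed for lspci to show ACS capability fields):
#   iommu_active && echo "IOMMU is on"
#   sudo lspci -vvv | acs_bridges_enabled
```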
4. Use a server chipset if you genuinely need P2P at scale — Threadripper PRO + WRX80/WRX90 or EPYC + SP5 expose full PCIe lanes with native P2P. Consumer boards are a hard ceiling for tensor-parallel scaling.
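5. With only 2 GPUs, skip tensor parallelism entirely and use pipeline parallelism instead. It splits the model's layers across GPUs, so there is no all-reduce; only activations cross between GPUs at stage boundaries, and no P2P link is needed: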
vllm serve qwen2.5-72b --pipeline-parallel-size 2 --tensor-parallel-size 1
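If one launch script has to serve hosts both with and without working P2P, the choice can live in one place. A sketch, assuming you already know whether every GPU pair supports P2P (for example from p2pBandwidthLatencyTest); the function name and its yes/no argument are hypothetical, and only the two vLLM flags from the command above are real.

```shell
# Hypothetical helper: prints vLLM parallelism flags for a given GPU count.
# With working P2P, shard each layer across GPUs (tensor parallelism);
# without it, give each GPU a contiguous block of layers (pipeline parallelism).
vllm_parallel_flags() {
  local gpus="$1"    # number of GPUs to use
  local p2p_ok="$2"  # "yes" if every GPU pair supports P2P
  if [ "$p2p_ok" = "yes" ]; then
    printf '%s\n' "--tensor-parallel-size $gpus --pipeline-parallel-size 1"
  else
    printf '%s\n' "--pipeline-parallel-size $gpus --tensor-parallel-size 1"
  fi
}

# Usage (the unquoted substitution is split into separate flags on purpose):
#   vllm serve qwen2.5-72b $(vllm_parallel_flags 2 no)
```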
Alternative solutions
Platform-specific note: NCCL_P2P_DISABLE=1 only matters on multi-GPU hosts. On a single-GPU rig the variable does nothing, so don't set it out of habit; it's just noise. On AMD ROCm, the equivalent is HSA_FORCE_FINE_GRAIN_PCIE=1 plus disabling RCCL P2P with RCCL_P2P_DISABLE=1.
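To keep the variable off single-GPU rigs automatically, a launch script can count GPUs first. This sketch assumes `nvidia-smi -L` prints one line per GPU beginning with `GPU `; the function names are hypothetical, and the count can be passed in explicitly for testing.

```shell
# Hypothetical helper: number of NVIDIA GPUs the driver reports
# (prints 0 if nvidia-smi is missing or lists nothing).
gpu_count() {
  nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}

# Hypothetical helper: export NCCL_P2P_DISABLE=1 only when more than one GPU
# is present, since the variable has no effect on a single-GPU host.
maybe_disable_p2p() {
  local n="${1:-$(gpu_count)}"
  if [ "$n" -gt 1 ]; then
    export NCCL_P2P_DISABLE=1
  fi
}

# Usage in a launch script:
#   maybe_disable_p2p
```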