Hardware & infrastructure
NVLink
NVLink is NVIDIA's proprietary GPU-to-GPU interconnect, used to bind multiple data-center GPUs into a coherent memory fabric. NVLink 4 (H100) delivers 900 GB/s of total bidirectional bandwidth per GPU, aggregated across 18 links of 50 GB/s each.
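The per-GPU totals across recent data-center generations come from the same per-link rate multiplied by a growing link count. A quick sketch of the arithmetic (link counts and per-link rates are NVIDIA's published figures):

```python
# Per-GPU NVLink bandwidth by generation: links x per-link rate.
# Per-link bidirectional bandwidth is 50 GB/s across these generations;
# only the link count per GPU changes.
PER_LINK_GBPS = 50
links_per_gpu = {
    "NVLink 2 (V100)": 6,    # 300 GB/s total
    "NVLink 3 (A100)": 12,   # 600 GB/s total
    "NVLink 4 (H100)": 18,   # 900 GB/s total
}

for gen, links in links_per_gpu.items():
    print(f"{gen}: {links} x {PER_LINK_GBPS} = {links * PER_LINK_GBPS} GB/s")
```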
For local AI, NVLink matters when running multi-GPU tensor parallelism: a quantized 70B model split across 2× RTX 3090s with NVLink (112.5 GB/s) hits significantly higher tok/s than the same setup over PCIe 4.0 x16 (~32 GB/s per direction), because tensor parallelism forces all-reduces between layers on every forward pass.
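The interconnect's impact can be estimated with a back-of-envelope model. A minimal sketch, assuming a Llama-70B-like shape (8192 hidden dim, 80 layers, fp16 activations) and the standard two all-reduces per transformer layer in tensor parallelism; these numbers are illustrative assumptions, and the calculation ignores per-call latency and protocol overhead:

```python
# Per-token all-reduce traffic for a 70B-class model split across
# 2 GPUs with tensor parallelism (assumed shape, not a measurement).
hidden = 8192              # hidden dimension
layers = 80                # transformer layers
bytes_per_elem = 2         # fp16 activations
allreduces_per_layer = 2   # one after attention, one after the MLP

per_token_bytes = layers * allreduces_per_layer * hidden * bytes_per_elem

def comm_time_us(link_gb_per_s):
    """Ideal transfer time in microseconds at the given link bandwidth,
    ignoring latency and collective overheads."""
    return per_token_bytes / (link_gb_per_s * 1e9) * 1e6

print(f"traffic per token: {per_token_bytes / 1e6:.2f} MB")
print(f"NVLink 3 (112.5 GB/s): {comm_time_us(112.5):.1f} us/token")
print(f"PCIe 4.0 x16 (32 GB/s): {comm_time_us(32):.1f} us/token")
```

Even this idealized model shows PCIe spending roughly 3.5× longer per token on communication; real gaps are larger once per-call latency is counted, since there are 160 all-reduces per token.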
Consumer NVLink ended with the RTX 30 series. RTX 40 and 50 series cards have no NVLink connector, so multi-GPU setups on consumer hardware rely on PCIe alone, making interconnect bandwidth the main bottleneck for tensor-parallel local inference.