NVIDIA GB200 NVL72
No editorial image yet — generic vendor mark shown. Credentials in spec table below.
72-GPU Blackwell rack with 36 Grace CPUs. Hyperscale-only — relevant context here for understanding 'what frontier training runs on'.
Sub-scores sum to 901 / 1000. Headline = 901 × 0.70 (Estimated-confidence discount) = 631. This is an algorithmic performance-tier score — distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet). How scoring works →
Extrapolated from 8000 GB/s bandwidth — 960.0 tok/s estimated. No measured benchmarks yet.
Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Hover any chip for the rationale. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it does well
The GB200 NVL72 is NVIDIA's rack-scale Blackwell-generation training and inference platform — 72× B200 SXM5 GPUs + 36× Grace ARM CPUs in a single liquid-cooled rack with full NVLink 5 mesh interconnect (130 TB/s aggregate fabric bandwidth). Total memory: 13.5 TB HBM3e at 576 TB/s aggregate bandwidth across the rack. This is the GPU compute platform NVIDIA built for trillion-parameter foundation model training and is what frontier AI labs (Anthropic, OpenAI, Google, Meta, xAI) deploy at scale in 2026. A single GB200 NVL72 rack handles full GPT-4-class training workloads or production inference for trillion-parameter MoE models with comfortable concurrency. The Grace-Hopper-style unified memory between Grace ARM CPU and Blackwell GPU dramatically reduces CPU↔GPU transfer overhead vs traditional PCIe-attached architectures. Liquid cooling at the rack level handles the ~120 kW power envelope. Cap-ex per rack lands at ~$3-3.5M list, with hyperscaler-tier discounts bringing volume orders below.
Where it breaks
- Cap-ex is hyperscaler-tier. $3M+ per rack. Out of scope for anyone but cloud providers, frontier AI labs, and very large enterprises with sovereign-AI mandates.
- Power and cooling infrastructure is non-trivial. ~120 kW per rack requires liquid cooling, datacenter-grade power distribution, and 30A+ 400V three-phase circuits. Not for any normal datacenter — this is the modern equivalent of mainframe procurement.
- Lead times measured in months. GB200 NVL72 production runs are sold out 6-12 months ahead. Adding capacity is not a quick decision.
- Site infrastructure cost dwarfs GPU cap-ex. Racks like this need dedicated cooling distribution, redundant power feeds, full DCIM integration. Enterprise deployments often spend more on facility upgrades than on the rack itself.
- Operational complexity. Operating GB200 NVL72 at peak utilization requires SRE/HPC engineering capacity that most enterprises don't have. Cloud rental (Lambda, CoreWeave, AWS Trainium-style) is almost always the right path.
- Architecture-current with rapid succession. GB300 / next-gen Blackwell-Ultra rumored for 2026-2027 — cap-ex risk on a 4-5 year deployment horizon is real.
Ideal model range
- Sweet spot: Trillion-parameter foundation model training (1T+ MoE, dense models 405B+). Single rack handles GPT-4-class training.
- Sweet spot: Production inference at hyperscale — millions of inference requests/sec across mixed model sizes via TensorRT-LLM + Triton.
- Sweet spot: Frontier-model fine-tuning (RLHF, instruction tuning) on 405B-class models with comfortable headroom.
- Sweet spot: Multi-tenant cloud GPU rental — dominant cap-ex tier for Lambda, CoreWeave, AWS, Azure, GCP frontier offerings in 2026.
- Sweet spot: Sovereign AI initiatives (national labs, defense, large pharma) where data residency requirements mandate on-prem deployment.
Bad use cases
- Anyone but hyperscalers + frontier labs + very large enterprises. Wrong tier entirely.
- Single-team production inference. Pick B200 discrete or H200 SXM cluster.
- Inference workloads that fit a single B200. Wrong scale.
- Cap-ex without sustained 24×7 high-utilization workload. Rental on cloud providers is almost always the right path.
- Anyone who reads this verdict on a public site and is not at a hyperscaler. This isn't a buying decision — it's reference info on the platform that powers the AI cloud rental tier you're consuming.
Verdict
Buy this if you operate hyperscaler / frontier AI lab / sovereign AI infrastructure at scale and the rack-level NVLink Gen 5 mesh + 13.5 TB HBM3e + Grace integration genuinely unlock workloads no smaller cluster can match. GB200 NVL72 is the architecturally-defining platform for 2026 AI compute at the trillion-parameter scale.
Skip this if you're not actively spec'ing $3M+ cap-ex commitments. Pick B200 SXM cluster or H200 SXM cluster at smaller scale. For most readers, this verdict is informational — you'll consume GB200 NVL72 throughput via cloud providers, not own one. Standard cloud frontier-tier (CoreWeave, Lambda) is the right path.
How it compares
- vs B200 SXM → GB200 NVL72 is fundamentally a 72× B200 SXM rack with full NVLink Gen 5 mesh + Grace ARM integration. Pick discrete B200 SXM for sub-rack deployments; NVL72 for rack-scale frontier work. The NVL72 form factor is what justifies the integration premium.
- vs DGX H200 (8× H200 SXM5) → DGX H200 is the prior-gen 8-card SXM5 platform at ~$300k. GB200 NVL72 is the 72-card rack-scale Blackwell platform at ~$3M. Different scale tiers entirely.
- vs custom 8× B200 HGX → Custom Blackwell HGX server (8× B200 SXM5) at ~$320k cap-ex without the 72-card mesh + Grace integration. Pick HGX for sub-rack scale; NVL72 for frontier rack-scale.
- vs MI355X cluster → AMD's frontier rack-scale platform is the equivalent compute on AMD ecosystem. Pick MI355X for ROCm-aligned hyperscaler builds; NVL72 for CUDA + frontier ecosystem maturity.
- vs renting on CoreWeave / Lambda → GB200 NVL72 cap-ex breakeven (~$3M) requires roughly 2-3 years of 24×7 utilization at hyperscaler-tier rental rates. Cloud providers handle this math; most enterprises do not.
Overview
72-GPU Blackwell rack with 36 Grace CPUs. Hyperscale-only — relevant context here for understanding 'what frontier training runs on'.
Search-fallback link — editorial hasn't yet curated a retailer URL for this card.
Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.
Specs
| VRAM | 13824 GB |
| Power draw (peak) | 120000 W |
| Released | 2024 |
| Backends | CUDA |
Models that fit
Open-weight models small enough to run on NVIDIA GB200 NVL72 with usable context.
Frequently asked
What models can NVIDIA GB200 NVL72 run?
Does NVIDIA GB200 NVL72 support CUDA?
Where next?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.