NVIDIA B200
Datacenter Blackwell. 192GB HBM3e per chip, ~8 TB/s bandwidth. Cloud-tier — you rent these by the hour.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 977 / 1000. Headline = 977 × 0.70 (Estimated-confidence discount) = 683.9, rounded to 684. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
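The headline arithmetic can be reproduced in a few lines. This is a minimal sketch: the function name `headline_score` and the round-to-nearest step are assumptions, since the page only states the inputs (977 and the 0.70 discount) and the result (684).

```python
def headline_score(sub_score_total: int, confidence_discount: float) -> int:
    """Apply a confidence discount to the summed sub-scores.

    Rounding to the nearest integer is an assumption; the page only
    shows the inputs and the final figure.
    """
    return round(sub_score_total * confidence_discount)


print(headline_score(977, 0.70))
```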
Extrapolated from 8000 GB/s bandwidth — 960.0 tok/s estimated. No measured benchmarks yet.
Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.
Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
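The page does not say how the 960 tok/s figure is derived from 8000 GB/s. One common back-of-envelope model, assumed here and not confirmed as this site's method, treats single-stream decode as memory-bandwidth-bound: each generated token streams a fixed number of gigabytes from HBM, so tokens per second is bandwidth divided by gigabytes read per token. The function names below are hypothetical.

```python
def bandwidth_bound_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper-bound decode rate if every token streams gb_read_per_token from memory."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tok_s: float) -> float:
    """Invert the same model: GB streamed per token implied by a quoted tok/s."""
    if tok_s <= 0:
        raise ValueError("tok_s must be positive")
    return bandwidth_gb_s / tok_s


# Figures quoted on this page: 8000 GB/s and 960.0 tok/s.
print(f"{implied_gb_per_token(8000, 960.0):.2f} GB read per token")

# Hypothetical example: a model that streams 70 GB per token (e.g. 70B at 8 bits).
print(f"{bandwidth_bound_tok_s(8000, 70):.0f} tok/s upper bound")
```

Read this way, 960 tok/s at 8000 GB/s corresponds to roughly 8.33 GB streamed per token; a larger model would decode proportionally slower under the same assumption. Real throughput also depends on batching, KV-cache traffic, and kernel efficiency, none of which this sketch models.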
What it does well
The B200 is NVIDIA's 2026 frontier datacenter SKU and the architecture-current king of LLM training and inference. It offers 192 GB HBM3e at 8 TB/s, native FP4 support with the second-gen Transformer Engine (TE2), ~5× the FP8 throughput of H100 SXM, and full NVLink 5 (1.8 TB/s between cards on the SXM5/NVL form factor). For frontier-model training this is genuinely transformational: an 8× B200 box (1.5 TB combined memory at 64 TB/s aggregate bandwidth) does in a single node what required 16+ H100 SXM5 cards. For production inference at the cutting edge (FP4-quantized 405B serving, 671B production deployment, trillion-parameter MoE inference), B200 is the only NVIDIA card where these workloads are not just possible but performant. The full Blackwell software stack lands here first: TensorRT-LLM 0.10+, vLLM v0.7+, SGLang's FP4 paths, NeMo's full B200 tuning. NVIDIA's enterprise sales motion (DGX B200, HGX B200, OEM via Supermicro / Dell / HPE) is mature. Cap-ex is around $40,000 retail per SXM card, and cloud rental at ~$5–$8/hr on Runpod / Lambda / CoreWeave / Together makes it accessible to teams who need frontier-class compute without owning the hardware.
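The 8-card node figures quoted above are straight multiples of the per-card specs. A minimal sketch, with a hypothetical `Accelerator` record and `node_totals` helper; note that the summed bandwidth is each card's local HBM bandwidth added together, not a single shared pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    vram_gb: int
    bandwidth_tb_s: float


def node_totals(card: Accelerator, cards_per_node: int) -> tuple[float, float]:
    """Return (combined memory in TB, summed HBM bandwidth in TB/s) for one node."""
    memory_tb = card.vram_gb * cards_per_node / 1000  # decimal TB
    bandwidth_tb_s = card.bandwidth_tb_s * cards_per_node
    return memory_tb, bandwidth_tb_s


b200 = Accelerator("B200", vram_gb=192, bandwidth_tb_s=8.0)
memory_tb, bandwidth_tb_s = node_totals(b200, cards_per_node=8)
print(f"{memory_tb:.1f} TB combined, {bandwidth_tb_s:.0f} TB/s summed")
```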
Where it breaks
- Cap-ex is genuinely substantial. $40,000 retail per SXM5 card, plus DGX-class motherboard / cooling / networking. An 8-GPU DGX B200 box is $400k–$500k all-in. Most of the world should be renting B200, not buying.
- Power and thermal density are extreme. 1000 W TDP per card under sustained load. 8-card baseboards pull 8+ kW continuous. Rack power and cooling infrastructure is a real engineering problem; this is hyperscaler-grade.
- First-year software maturity. Blackwell-specific kernels and TE2 optimization are still landing in 2026 frameworks. Some niche workloads or experimental architectures may not yet have B200-tuned paths. NVIDIA's framework team is shipping fast but not all gaps are closed.
- No single-card workstation form factor. B200 is SXM5 / NVL only — no PCIe-only B200 card. Workstation-tier Blackwell is the RTX PRO 6000 Blackwell (96 GB, very different SKU).
- Marginal vs H200 for many production inference workloads. If your model fits 141 GB and FP4 throughput isn't critical, H200 at $31,000 may be the better $/throughput pick. B200 wins big on FP4-aggressive workloads and frontier-scale training; H200 wins on cost-conscious mid-frontier inference.
- Resale uncertainty. B200 is too new in mid-2026 for established used-market pricing. Cap-ex risk is higher than H100 / H200 (which have mature secondary markets).
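The power bullet above is easy to turn into a rough budget. The sketch below covers GPUs only by default; `overhead_factor` is a hypothetical knob for CPUs, fans, and facility cooling, and no value for it comes from this page.

```python
HOURS_PER_YEAR = 8760


def node_gpu_power_kw(cards: int, tdp_w: float) -> float:
    """Sustained GPU power for one node, in kW."""
    return cards * tdp_w / 1000


def annual_energy_kwh(power_kw: float, utilization: float = 1.0,
                      overhead_factor: float = 1.0) -> float:
    """Energy per year; overhead_factor=1.0 means GPUs only (an assumption)."""
    return power_kw * HOURS_PER_YEAR * utilization * overhead_factor


gpu_kw = node_gpu_power_kw(cards=8, tdp_w=1000)
print(f"{gpu_kw:.1f} kW GPU load, {annual_energy_kwh(gpu_kw):,.0f} kWh/year at 24x7")
```

Multiply the kWh figure by your own electricity rate to get an operating-cost line; the page quotes no rate, so none is assumed here.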
Ideal model range
- Sweet spot: Frontier-model training (200B–1T parameters). 8× B200 with NVLink 5 mesh is the dominant 2026 training tier.
- Sweet spot: Production FP4 inference at frontier scale — 405B / 671B / 1T-class MoE production serving. Native FP4 + TE2 is the architecture justification for B200 over H200.
- Sweet spot: Multi-tenant production at 70B–200B FP8 with very high concurrency (200+ users).
- Sweet spot: Long-context production inference at 200K+ contexts where B200's 8 TB/s bandwidth dominates decode.
- Stretch: Anything that needs far less than 192 GB on a single card. Possible, but you're paying for memory you don't need.
- Comfortable: Anything an H200 does, with FP4 throughput improvements where applicable.
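To place a model in the ranges above, a weights-only memory estimate is a reasonable first check. This is a sketch under stated assumptions: it counts parameters times bits per parameter and ignores KV cache, activations, and runtime overhead, so real deployments need more headroom. The helper names are hypothetical.

```python
import math

B200_VRAM_GB = 192  # from the spec table on this page


def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Weights-only footprint in GB (1 billion params at 8 bits = 1 GB)."""
    return params_billions * bits_per_param / 8


def min_cards_for_weights(params_billions: float, bits_per_param: int,
                          vram_gb: int = B200_VRAM_GB) -> int:
    """Fewest cards whose combined VRAM holds the weights alone."""
    return math.ceil(weight_gb(params_billions, bits_per_param) / vram_gb)


for label, params_b, bits in [("70B FP8", 70, 8), ("405B FP4", 405, 4), ("671B FP4", 671, 4)]:
    print(f"{label}: {weight_gb(params_b, bits):.1f} GB weights, "
          f"at least {min_cards_for_weights(params_b, bits)} card(s)")
```

Under these assumptions a 405B model at FP4 needs about 202.5 GB for weights alone, so it spans at least two cards, while 70B at FP8 fits on one with ample room for KV cache.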
Bad use cases
- Single-developer or hobbyist workloads. Wrong tier entirely. Rent for hours; don't buy.
- Anything that fits 141 GB. H200 is the better $/throughput pick.
- Anything that fits 80 GB. H100 PCIe or L40S wins decisively.
- Cost-conscious inference deployment. B200 is for the frontier; pick H200/L40S/MI300X for cost-optimized production tiers.
- Cap-ex without a sustained 24×7 high-utilization workload. At $5–$8/hr rental, $40k of cap-ex breaks even at around 5,000–8,000 rental hours, or roughly 7–11 months of 24×7 use. Most workloads don't justify this.
- Workstation deployment. Pick RTX PRO 6000 Blackwell for workstation-tier 96 GB; B200 is rack-only.
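The rent-versus-buy arithmetic in the list above can be sketched as follows. This is a card-only estimate under stated assumptions: it ignores power, hosting, networking, and resale value, and the `utilization` parameter and function names are hypothetical additions.

```python
HOURS_PER_MONTH = 8760 / 12  # 730 hours in an average month


def breakeven_hours(capex_usd: float, rental_usd_per_hr: float) -> float:
    """Rental hours whose total cost equals the purchase price."""
    return capex_usd / rental_usd_per_hr


def breakeven_months(capex_usd: float, rental_usd_per_hr: float,
                     utilization: float = 1.0) -> float:
    """Calendar months to break even when the card is busy `utilization` of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours = breakeven_hours(capex_usd, rental_usd_per_hr)
    return hours / (HOURS_PER_MONTH * utilization)


for rate in (8.0, 5.0):
    print(f"${rate:.0f}/hr: {breakeven_hours(40_000, rate):,.0f} h, "
          f"{breakeven_months(40_000, rate):.1f} months at 24x7, "
          f"{breakeven_months(40_000, rate, utilization=0.5):.1f} months at 50% utilization")
```

At full utilization the $40k card pays for itself after about 7 months at $8/hr and about 11 months at $5/hr; halving utilization doubles both figures, which is why low-utilization buyers are better off renting.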
Verdict
Buy this if you're a hyperscaler, cloud provider, frontier AI lab, or enterprise deploying frontier-model training or production at scale, you have datacenter-grade infrastructure (DGX or HGX class), your workloads are FP4-aggressive or genuinely require >141 GB single-card memory, and you've validated cap-ex over a 12+ month horizon. B200 is the architecture-current flagship — for buyers who genuinely operate at the frontier, this is the right pick.
Skip this if your workloads fit 141 GB (H200 wins $/throughput), 80 GB (H100 PCIe wins), or 48 GB (L40S wins). Skip if you're standing up workstation-tier deployments (RTX PRO 6000 Blackwell is the right Blackwell SKU). Skip if your utilization is <50% (rent on Runpod / Lambda / CoreWeave).
How it compares
- vs H200 (141 GB SXM) → B200 has 36% more memory + 67% more bandwidth + native FP4 + TE2 at +30% price ($40k vs $31k). Pick B200 when FP4 throughput materially helps or when 192 GB single-card matters; pick H200 for production inference where the gap doesn't justify the price. See /compare/nvidia-b200-vs-nvidia-h200.
- vs H100 SXM (80 GB) → B200 has 2.4× memory + 2.4× bandwidth + native FP4 + ~5× FP8 throughput at +33% price. For new builds B200 is almost always the right pick over H100 SXM; H100 SXM only makes sense when matching an existing cluster. See /compare/nvidia-b200-vs-nvidia-h100-sxm.
- vs MI355X (288 GB) → MI355X has 50% more memory at lower cap-ex. B200 has 27% more bandwidth + native FP4 + Transformer Engine 2 + the entire NVIDIA ecosystem advantage. Pick B200 for FP4-aggressive frontier training and ecosystem-required deployments; MI355X for cost-sensitive memory-bound serving where ROCm fits. See /compare/nvidia-b200-vs-amd-mi355x.
- vs MI300X (192 GB) → Same memory tier. B200 has 51% more bandwidth + native FP4 + NVIDIA ecosystem at +100% price ($40k vs $20k). Pick B200 when FP4 / ecosystem maturity / NVLink 5 mesh matters; MI300X for cost-sensitive 192 GB deployments where ROCm fits.
- vs renting B200 on cloud → B200 rents at $5–$8/hr SXM on most providers in 2026. Per-card cap-ex breakeven is ~5,000–8,000 hours, or roughly 7–11 months of 24×7. Always rent first to validate frontier workload patterns before a $400k+ DGX cap-ex commitment.
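The percentage and multiplier figures in the comparisons above follow from simple ratios. A small sketch using only the memory sizes and prices quoted on this page (the helper name `percent_more` is hypothetical):

```python
def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


B200_GB, B200_USD = 192, 40_000

# (label, memory in GB, quoted price in USD, or None where the page gives none)
rivals = [("H200", 141, 31_000), ("H100 SXM", 80, None), ("MI300X", 192, 20_000)]

for name, mem_gb, usd in rivals:
    line = f"vs {name}: {percent_more(B200_GB, mem_gb):+.0f}% memory"
    if usd is not None:
        line += f", {percent_more(B200_USD, usd):+.0f}% price"
    print(line)
```

The H200 price ratio works out to about +29%, which the page rounds to +30%, and +140% memory versus H100 SXM is the same as the quoted 2.4×.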
Specs
| Spec | Value |
| --- | --- |
| VRAM | 192 GB |
| Power draw (peak) | 1000 W |
| Released | 2024 |
| MSRP | $40,000 |
| Backends | CUDA |
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.