NVIDIA H100 NVL
Dual-card H100 with 188GB combined memory. Built for LLM serving.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 947 / 1000. Headline = 947 × 0.70 (Estimated-confidence discount) = 663. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
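The headline arithmetic above can be reproduced in a couple of lines. This is a minimal sketch, not the site's actual scoring code: the function name `headline_score` is hypothetical, the 0.70 factor is the Estimated-confidence discount quoted above, and rounding to the nearest integer is an assumption that happens to reproduce the published figure.

```python
def headline_score(sub_score_total: int, confidence_discount: float) -> int:
    """Apply a confidence discount to the summed sub-scores (out of 1000)."""
    # Rounding to the nearest integer is an assumption; the page only shows the result.
    return round(sub_score_total * confidence_discount)


score = headline_score(947, 0.70)
print(score)  # 663
```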
Extrapolated from 3938 GB/s of per-card memory bandwidth: 472.6 tok/s estimated. No measured benchmarks yet.
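One common way to reason about a bandwidth-based estimate like this is the memory-bound decode model, in which each generated token requires streaming a fixed amount of data from memory. The sketch below is an assumption about the shape of such a calculation, not the site's documented formula; `decode_tok_s` and `implied_gb_per_token` are hypothetical helper names. Run backwards on the two figures quoted above, it shows how much data per token that pairing would imply.

```python
def decode_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper-bound decode rate if every token streams gb_read_per_token from memory."""
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tok_s: float) -> float:
    """Invert the same model: data streamed per token implied by a quoted rate."""
    return bandwidth_gb_s / tok_s


# Figures quoted on this page: 3938 GB/s and 472.6 tok/s.
gb_per_token = implied_gb_per_token(3938, 472.6)
print(f"{gb_per_token:.2f} GB read per token")
```

Under this simplified model the quoted pair of numbers corresponds to roughly 8.3 GB read per token; the real estimator may use a different model size or efficiency factor.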
Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it does well
The H100 NVL is NVIDIA's "two H100s in a single SKU" datacenter pick: a pair of PCIe Gen 5 cards joined by an NVLink Gen 4 bridge with 600 GB/s of inter-card bandwidth, presenting 188 GB of combined HBM3 memory and 7.8 TB/s of aggregate memory bandwidth (about 3.9 TB/s per card). The form factor is two dual-slot PCIe cards plus the NVLink bridge, so the pair goes into a standard PCIe Gen 5 datacenter server with four free slots' worth of space rather than requiring an SXM baseboard. At ~$60,000 retail, the H100 NVL is the cleanest path to 188 GB of NVLink-connected CUDA memory as a single drop-in SKU: run 70B FP16 multi-tenant serving with deep concurrency, fine-tune 70B FP16 with NVLink tensor parallelism, or load 405B at roughly 3-bit quantization across the pair (at a full 4 bits per weight the weights alone are about 203 GB, more than the pair holds). Power draw of 2 × 350 W = 700 W combined (the spec table below lists 800 W as peak) matches a single H100 SXM5, and the 600 GB/s bridge reaches two-thirds of SXM5's 900 GB/s NVLink bandwidth for the 2-card case. For buyers who need 188 GB of CUDA memory with NVLink but lack DGX/SXM5-class infrastructure, the H100 NVL solves a specific deployment problem better than any alternative.
Where it breaks
- Cap-ex is brutal. $60,000 retail per SKU is 2.4× a single H100 PCIe at $25,000. The premium pays for the matched-pair NVLink bridge and the simplicity of a single 188 GB procurement line item, but it's hard to justify when 2× discrete H100 PCIe at $50,000 plus a third-party NVLink bridge ($1,000-2,000) covers most of the same ground for $8,000-9,000 less, albeit with 160 GB rather than 188 GB.
- Architecture is no longer current. B200 is the 2026 flagship at 192 GB / 8 TB/s / native FP4. For new cap-ex at frontier scale, B200 is the right tier.
- No native FP4. Hopper has FP8 and the first-generation Transformer Engine; Blackwell adds FP4 and the second-generation Transformer Engine (TE2). For workloads that exploit FP4 throughput, B200 wins meaningfully.
- Limited multi-card scale beyond the 2-card pair. H100 NVL is fundamentally a 2-card SKU. For 4×–8× clusters, H100 SXM5 with full NVLink mesh is the right tier.
- Cooling and power infrastructure must support 700 W (800 W at the peak rating in the spec table) across the card pair. Standard 2U rackmounts often can't handle this thermal density without liquid cooling or aggressive airflow.
- Resale market is thin. H100 NVL has lower transaction volume than discrete H100 PCIe — exit pricing is harder to predict.
Ideal model range
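- Stretch, not sweet spot: 405B production inference on a single SKU. At exactly 4 bits per weight, the weights alone are about 203 GB (405 × 0.5 bytes), which exceeds the 188 GB ceiling before any KV cache. 405B fits only at roughly 3 bits per weight (about 152 GB) or with offload.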
- Sweet spot: 70B FP16 multi-tenant production serving with high concurrency (32+ users) — NVLink-paired tensor parallelism is genuinely fast.
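- Sweet spot: 70B FP16 adapter-style fine-tuning (for example LoRA) across the paired cards, the right tier for "fine-tune 70B in a single rack slot." Full-parameter fine-tuning does not fit: FP16 weights plus FP16 gradients alone come to about 280 GB.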
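- Sweet spot: 200B-class production inference at roughly 6-bit quantization or below. At FP8 (1 byte per weight) a 200B model needs about 200 GB for weights alone, which exceeds 188 GB.
- Stretch: 671B inference at Q3 with paged offload. About 252 GB of weights at 3 bits means roughly a quarter must live off-GPU, so it runs, but slower than 8× SXM5.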
- Comfortable: Anything 2× discrete H100 PCIe with NVLink bridge would do, at simpler procurement.
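The fit claims in the list above rest on simple weights-only arithmetic: parameter count times bits per weight, divided by 8. The sketch below is a rough check under stated assumptions, not a measured result: it treats Q3 and Q4 as exactly 3 and 4 bits per weight (real quantization formats carry extra overhead), ignores KV cache and activations, and uses the hypothetical helper name `weights_gb`. By this estimate, any configuration whose weights exceed 188 GB needs lower-bit quantization or offload.

```python
VRAM_GB = 188  # combined memory of the pair, from the spec table


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_billions * bits_per_weight / 8


# (label, parameters in billions, assumed bits per weight)
configs = [
    ("70B FP16", 70, 16),
    ("200B FP8", 200, 8),
    ("405B Q4", 405, 4),
    ("405B Q3", 405, 3),
    ("671B Q3", 671, 3),
]

for label, params_b, bits in configs:
    size = weights_gb(params_b, bits)
    headroom = VRAM_GB - size
    status = "fits" if headroom >= 0 else "exceeds VRAM"
    print(f"{label:>9}: {size:6.1f} GB weights, {headroom:+7.1f} GB headroom ({status})")
```

Whatever headroom remains after the weights is what is left for KV cache and activations, so a configuration that only just fits will support little context or concurrency.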
Bad use cases
- Single-card workloads. Pick H100 PCIe — half the cap-ex.
- 8-card cluster deployments. Pick H100 SXM5 with full NVLink mesh.
- New cap-ex at the frontier. Pick B200 — architecture-current with FP4 native.
- Cost-conscious 188 GB seekers. 2× discrete H100 PCIe ($50,000) + third-party NVLink bridge ($1,000-2,000) saves $8,000-9,000 vs H100 NVL, though the discrete pair totals 160 GB rather than 188 GB.
- Workstation deployment. Wrong tier — rack-only.
- Hobbyist anything. Wrong tier entirely.
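The savings from the DIY pair mentioned in the list above can be checked from the prices quoted on this page. This is a small sketch with hypothetical names; the prices are the page's own estimates, not vendor quotes.

```python
NVL_PRICE = 60_000                   # H100 NVL, per SKU (from this page)
H100_PCIE_PRICE = 25_000             # single H100 PCIe (from this page)
BRIDGE_PRICE_RANGE = (1_000, 2_000)  # NVLink bridge, low and high estimate


def diy_savings(bridge_price: int) -> int:
    """Savings of 2x discrete H100 PCIe plus a bridge versus one H100 NVL SKU."""
    diy_total = 2 * H100_PCIE_PRICE + bridge_price
    return NVL_PRICE - diy_total


low_bridge, high_bridge = BRIDGE_PRICE_RANGE
print(diy_savings(high_bridge), diy_savings(low_bridge))  # 8000 9000
```

With these inputs the DIY route comes out $8,000 to $9,000 cheaper, depending on the bridge price.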
Verdict
Buy this if you need 188 GB of CUDA memory in a single procurement line item, your cap-ex governance prefers single-SKU simplicity over multi-card assembly, your workload genuinely sits in the 2-card NVLink-paired sweet spot (405B serving / 70B fine-tuning), and the roughly $8-9k premium over discrete H100 PCIe + bridge is worth the procurement simplicity. H100 NVL is the right pick for the narrow buyer who values a single-SKU, NVLinked 188 GB CUDA deployment.
Skip this if you can build 2× discrete H100 PCIe ($50k) + NVLink bridge ($1.5k) for ~$8.5k savings, you're standing up new cap-ex (B200 or H200 is almost always the better buy), you need >2-card scale (pick an H100 SXM5 cluster), or your workloads fit in 80 GB (H100 PCIe wins).
How it compares
- vs H100 PCIe (80 GB) → H100 NVL is fundamentally 2× discrete H100 PCIe pre-paired with NVLink. Pick discrete H100 PCIe for single-card or DIY-bridge deployments at lower cost; H100 NVL only when single-SKU procurement matters. See /compare/nvidia-h100-nvl-vs-nvidia-h100-pcie.
- vs H100 SXM5 (80 GB) → SXM5 has full NVLink mesh (900 GB/s between cards in 8-card baseboard) at higher cap-ex per card. NVL is the 2-card-paired PCIe form. Pick SXM5 for 4×–8× clusters; NVL for 2-card paired NVLink in standard PCIe servers.
- vs H200 (141 GB SXM) → H200 is the newer Hopper refresh at $31,000 SXM. 2× H200 SXM gives 282 GB combined. For new builds, H200 dominates. NVL only when 188 GB single-SKU PCIe is the specific requirement.
- vs B200 (192 GB SXM) → B200 has roughly the same memory on one device (192 GB vs 188 GB across the pair), native FP4, TE2, slightly more bandwidth (8 TB/s vs 7.8 TB/s aggregate), and NVLink Gen 5, at about 33% lower price ($40k vs $60k for the NVL pair). Pick B200 for new builds; NVL only when matching an existing H100 cluster.
- vs MI300X (192 GB) → MI300X gives same memory tier (192 GB on one card) at $20k cap-ex with ROCm ecosystem trade-offs. 1/3 the price of H100 NVL. Pick MI300X when ROCm fits and cost matters; NVL when CUDA + NVLink mesh are non-negotiable.
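One way to put the options above on a common footing is price per gigabyte of accelerator memory. The sketch below uses only the prices and capacities quoted on this page (the H100 PCIe price comes from the cap-ex discussion earlier); `usd_per_gb` is a hypothetical helper, and the metric deliberately ignores bandwidth, interconnect, and software ecosystem.

```python
# (name, price in USD, memory in GB) as quoted on this page
options = [
    ("H100 NVL (pair)", 60_000, 188),
    ("H100 PCIe", 25_000, 80),
    ("H200 SXM", 31_000, 141),
    ("B200 SXM", 40_000, 192),
    ("MI300X", 20_000, 192),
]


def usd_per_gb(price_usd: int, memory_gb: int) -> float:
    """Purchase price divided by on-device memory capacity."""
    return price_usd / memory_gb


# Print the options from cheapest to most expensive per GB.
ranked = sorted(options, key=lambda o: usd_per_gb(o[1], o[2]))
for name, price, mem in ranked:
    print(f"{name:<16} ${usd_per_gb(price, mem):6.0f} per GB")
```

On these figures MI300X is the cheapest per GB and the H100 NVL pair the most expensive, which matches the comparison's framing of NVL as a procurement-convenience pick rather than a value pick.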
Specs
| VRAM | 188 GB |
| Power draw (peak) | 800 W |
| Released | 2023 |
| MSRP | $60,000 |
| Backends | CUDA |
Models that fit
Open-weight models small enough to run on NVIDIA H100 NVL with usable context.
Frequently asked
What models can NVIDIA H100 NVL run?
Does NVIDIA H100 NVL support CUDA?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.