NVIDIA H100 NVL for local AI

What it does well

The H100 NVL is NVIDIA's "two H100s in a single SKU" datacenter pick: a pair of dual-slot PCIe Gen 5 cards, each with 94 GB of HBM3 at 3.9 TB/s, joined by an NVLink Gen 4 bridge with 600 GB/s of inter-card bandwidth. Together they present 188 GB of HBM3 and 7.8 TB/s of aggregate bandwidth. The pair needs two dual-slot PCIe Gen 5 positions in a datacenter server. At ~$60,000 retail, H100 NVL is the cleanest "188 GB of CUDA memory in one SKU" drop-in path: serve 70B FP16 to many concurrent users, LoRA-fine-tune a 70B model on an FP16 base with NVLink tensor parallelism, or load 405B at roughly 3-bit quantization (4-bit 405B weights alone are ~203 GB, over the 188 GB ceiling). Combined power draw of 2× 350-400 W (700-800 W) is in the range of a single 700 W H100 SXM5. The 600 GB/s bridge is two-thirds of SXM5's 900 GB/s per-GPU NVLink bandwidth, which keeps 2-way tensor parallelism reasonably close to SXM5 behavior for the 2-card case. For buyers who need 188 GB of CUDA memory with NVLink but don't have DGX/SXM5-class infrastructure, H100 NVL solves a specific deployment problem better than any alternative.
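Fit claims like these come down to weight arithmetic: billions of parameters × bits per parameter ÷ 8 gives gigabytes. The sketch below applies that rule to the 188 GB pair. It is a rough check under stated assumptions: 1 GB = 1e9 bytes, KV cache and activations are ignored, and the ~16 bytes per parameter for full fine-tuning assumes mixed-precision Adam. Real 4-bit formats such as GGUF k-quants store extra scale data, so actual files run somewhat larger than the nominal bit width suggests.

```python
# Weight-only memory check against the H100 NVL pair (2 x 94 GB).
# Assumption: 1 GB = 1e9 bytes; KV cache, activations, and runtime overhead
# are ignored, so "fits" here is necessary but not sufficient.

NVL_PAIR_GB = 2 * 94  # 188 GB of HBM3 across the two cards


def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """GB of weights: billions of params x bytes per param."""
    return params_billion * bits_per_param / 8


def full_finetune_gb(params_billion: float, bytes_per_param: float = 16) -> float:
    """Rough full fine-tuning footprint with mixed-precision Adam.

    Assumed ~16 bytes/param: FP16 weights (2) + FP16 grads (2)
    + FP32 master weights (4) + Adam moments (8); activations excluded.
    """
    return params_billion * bytes_per_param


def fits(gb: float, budget_gb: float = NVL_PAIR_GB) -> bool:
    return gb <= budget_gb


checks = [
    ("70B FP16 inference", weight_gb(70, 16)),
    ("405B 4-bit inference", weight_gb(405, 4)),
    ("405B 3-bit inference", weight_gb(405, 3)),
    ("671B 3-bit inference", weight_gb(671, 3)),
    ("70B full fine-tune", full_finetune_gb(70)),
]
for name, gb in checks:
    print(f"{name:22s} {gb:8.1f} GB  fits={fits(gb)}")
```

The same function works for any candidate model: pass its parameter count and the quantization width you plan to run.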

Where it breaks

Cap-ex is brutal. $60,000 retail per SKU is 2.4× a single H100 PCIe at $25,000. The premium pays for the matched-pair NVLink bridge, the extra memory (94 GB HBM3 per card versus 80 GB), and single-line-item procurement, but it's hard to justify when 2× discrete H100 PCIe ($50,000) plus a separately purchased NVLink bridge delivers 160 GB of NVLinked memory for roughly $8,000-9,000 less.
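The DIY-versus-SKU trade-off is easy to tabulate. This sketch uses the retail prices quoted on this page ($60,000 for the NVL pair, $25,000 per 80 GB H100 PCIe) and the $1,000-2,000 bridge range cited under Bad use cases; these are the article's figures, not live quotes, and the helper name `diy_pair` is hypothetical.

```python
# Compare the H100 NVL SKU with a DIY pair of 80 GB H100 PCIe cards,
# using the retail prices quoted in this article (not live market quotes).

NVL_PRICE, NVL_GB = 60_000, 188
PCIE_PRICE, PCIE_GB = 25_000, 80


def diy_pair(bridge_price: int) -> tuple[int, int]:
    """Return (total cost, total GB) for 2x H100 PCIe plus an NVLink bridge."""
    return 2 * PCIE_PRICE + bridge_price, 2 * PCIE_GB


for bridge in (1_000, 2_000):
    cost, gb = diy_pair(bridge)
    print(
        f"bridge ${bridge:,}: DIY ${cost:,} for {gb} GB, "
        f"saves ${NVL_PRICE - cost:,}, gives up {NVL_GB - gb} GB; "
        f"$/GB DIY {cost / gb:.0f} vs NVL {NVL_PRICE / NVL_GB:.0f}"
    )
```

On these numbers the DIY route saves $8,000-9,000 but gives up 28 GB, and the cost per GB lands within a few percent either way, so the choice is really about capacity and procurement rather than price efficiency.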
Architecture is no longer current. B200 is the 2026 flagship at 192 GB / 8 TB/s / native FP4. For new cap-ex at frontier scale, B200 is the right tier.
No FP4 native. Hopper has FP8 + first-gen Transformer Engine; Blackwell adds FP4 + TE2. For workloads exploiting FP4 throughput, B200 wins meaningfully.
Limited multi-card scale beyond the 2-card pair. H100 NVL is fundamentally a 2-card SKU. For 4×–8× clusters, H100 SXM5 with full NVLink mesh is the right tier.
Cooling and power infrastructure must handle 700-800 W across two dual-slot, passively cooled cards. Standard 2U rackmounts often can't handle this thermal density without liquid cooling or aggressive chassis airflow.
Resale market is thin. H100 NVL has lower transaction volume than discrete H100 PCIe — exit pricing is harder to predict.

Ideal model range

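Stretch: 405B single-SKU inference at roughly 3-bit quantization (~152 GB of weights), which leaves limited room for context. Q4/Q5 do not fit: 4-bit weights alone are ~203 GB.
Sweet spot: 70B FP16 multi-tenant production serving with high concurrency (32+ users at short-to-moderate context lengths); NVLink-paired tensor parallelism is genuinely fast.
Sweet spot: LoRA-style fine-tuning of 70B on an FP16 base across the paired cards, the right tier for "fine-tune 70B in one server." Full FP16 fine-tuning with Adam needs roughly 16 bytes per parameter (~1.1 TB) and does not fit.
Sweet spot: up to ~150B-class production inference at FP8 with KV-cache headroom. A 200B model at FP8 needs ~200 GB for weights alone and does not fit.
Stretch: 671B inference at Q3 only with heavy offload to system RAM (3-bit weights alone are ~250 GB), far slower than 8× SXM5.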
Comfortable: Anything 2× discrete H100 PCIe with NVLink bridge would do, at simpler procurement.
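Concurrency for 70B FP16 serving depends on how much memory is left for the KV cache after ~140 GB of weights. The sketch below estimates concurrent sequences under loudly stated assumptions: a Llama-3-70B-style shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128), an FP16 KV cache, and a hypothetical 10 GB reserve for activations, CUDA context, and fragmentation. Real servers such as vLLM manage this budget differently, so treat the output as an order-of-magnitude check.

```python
# Estimate concurrent sequences for 70B FP16 serving on the 188 GB pair.
# Assumed model shape (Llama-3-70B-style): 80 layers, 8 KV heads,
# head_dim 128, FP16 KV cache. overhead_gb is a hypothetical reserve.

GB = 1e9


def kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Factor 2 covers both the K and the V tensor in every layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_concurrent(total_gb=188, weights_gb=140, overhead_gb=10, context_tokens=4096):
    """Number of full-length sequences whose KV cache fits in what's left."""
    kv_budget = (total_gb - weights_gb - overhead_gb) * GB
    per_seq = kv_bytes_per_token() * context_tokens
    return int(kv_budget // per_seq)


for ctx in (2048, 4096, 8192, 32768):
    print(f"{ctx:6d}-token context: ~{max_concurrent(context_tokens=ctx)} sequences")
```

Under these assumptions the pair holds roughly 28 concurrent 4k-token sequences (about 35 with no overhead reserve), so "32+ users" holds at short-to-moderate contexts but shrinks quickly at long ones.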

Bad use cases

Single-card workloads. Pick H100 PCIe — half the cap-ex.
8-card cluster deployments. Pick H100 SXM5 with full NVLink mesh.
New cap-ex at the frontier. Pick B200 — architecture-current with FP4 native.
Cost-conscious buyers. If 160 GB is enough, 2× discrete H100 PCIe ($50,000) + NVLink bridge ($1,000-2,000) saves $8,000-9,000 vs H100 NVL.
Workstation deployment. Wrong tier — rack-only.
Hobbyist anything. Wrong tier entirely.

Verdict

Buy this if you need 188 GB of CUDA memory in a single procurement line item, your cap-ex governance prefers single-SKU simplicity over multi-card assembly, your workload sits in the 2-card NVLink-paired sweet spot (70B FP16 serving, 70B LoRA fine-tuning, 405B at ~3-bit), and the ~$8-9k premium over discrete H100 PCIe + bridge is worth the procurement simplicity and the extra 28 GB. H100 NVL is the right pick for the narrow buyer who values a single-SKU, NVLinked 188 GB CUDA deployment.

Skip this if 160 GB is enough and you can build 2× discrete H100 PCIe ($50k) + NVLink bridge (~$1.5k) for ~$8.5k in savings, you're standing up new cap-ex (B200 or H200 is almost always the better buy), you need more than 2-card scale (pick an H100 SXM5 cluster), or your workloads fit in 80 GB (H100 PCIe wins).

How it compares

vs H100 PCIe (80 GB) → H100 NVL is effectively 2× H100 PCIe pre-paired with NVLink, except each NVL card carries 94 GB of HBM3 at 3.9 TB/s versus the PCIe card's 80 GB of HBM2e at 2 TB/s. Pick discrete H100 PCIe for single-card or DIY-bridge deployments at lower cost; H100 NVL only when single-SKU procurement or the extra memory matters. See /compare/nvidia-h100-nvl-vs-nvidia-h100-pcie.
vs H100 SXM5 (80 GB) → SXM5 has a full NVLink fabric (900 GB/s per GPU through NVSwitch on the 8-GPU baseboard) at higher cap-ex per card. NVL is the 2-card-paired PCIe form. Pick SXM5 for 4×–8× clusters; NVL for 2-card paired NVLink in standard PCIe servers.
vs H200 (141 GB SXM) → H200 is the memory-upgraded Hopper refresh (141 GB HBM3e) at $31,000 SXM. 2× H200 SXM gives 282 GB combined. For new builds, H200 dominates. NVL only when 188 GB single-SKU PCIe is the specific requirement.
vs B200 (192 GB SXM) → B200 matches the memory (192 GB on one GPU) and adds native FP4, TE2, and NVLink Gen 5, with 8 TB/s on a single GPU versus 7.8 TB/s split across the NVL pair, at roughly a third less cost ($40k vs $60k for the NVL pair). Pick B200 for new builds; NVL only when matching an existing H100 cluster.
vs MI300X (192 GB) → MI300X gives same memory tier (192 GB on one card) at $20k cap-ex with ROCm ecosystem trade-offs. 1/3 the price of H100 NVL. Pick MI300X when ROCm fits and cost matters; NVL when CUDA + NVLink mesh are non-negotiable.
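Cost per gigabyte of GPU memory summarizes the comparisons above. This sketch uses only the prices and capacities quoted on this page (H100 SXM5 is left out because no price is given for it); they are the article's retail figures, not live market data, and the function name is hypothetical.

```python
# Cost per GB of GPU memory, using the prices quoted in this comparison.
# These are the article's retail figures, not live market quotes.

cards = {
    # name: (price in USD, memory in GB)
    "H100 NVL (pair)": (60_000, 188),
    "H100 PCIe": (25_000, 80),
    "H200 SXM": (31_000, 141),
    "B200": (40_000, 192),
    "MI300X": (20_000, 192),
}


def dollars_per_gb(price: float, gb: float) -> float:
    return price / gb


# Cheapest memory first.
for name, (price, gb) in sorted(cards.items(), key=lambda kv: dollars_per_gb(*kv[1])):
    print(f"{name:16s} ${dollars_per_gb(price, gb):6.0f}/GB  ({gb} GB for ${price:,})")
```

On these figures the NVL pair has the highest cost per GB in the group, slightly above H100 PCIe, while MI300X comes in at about a third of it; the NVL premium buys CUDA, NVLink, and single-SKU packaging rather than cheap capacity.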

Frequently asked

What models can NVIDIA H100 NVL run?

With 188 GB of VRAM, the NVIDIA H100 NVL runs 70B models at full FP16 precision (~140 GB of weights), plus everything smaller; 405B fits only at roughly 3-bit quantization. See the model list below for tested combinations.

Does NVIDIA H100 NVL support CUDA?

Yes. The H100 NVL has full CUDA support, the most mature local-AI backend; llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.

VRAM	188 GB (2 × 94 GB HBM3)
Power draw (peak)	800 W (2 × 400 W)
Released	2023
MSRP	$60,000
Backends	CUDA
