RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
UNIT · NVIDIA · GPU
188 GB VRAM · workstation · Reviewed June 2026

NVIDIA H100 NVL


Dual-card H100 with 188GB combined memory. Built for LLM serving.

Released 2023 · 3938 GB/s memory bandwidth per card (about 7.8 TB/s aggregate across the pair)

Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.

RUNLOCALAI SCORE
663 / 1000 · BB-tier · Estimated

  • Throughput: 500 / 500
  • VRAM-fit: 200 / 200
  • Ecosystem: 200 / 200
  • Efficiency: 47 / 100

Sub-scores sum to 947 / 1000. Headline = 947 × 0.70 (Estimated-confidence discount) = 662.9, rounded to 663. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
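The headline arithmetic can be reproduced in a few lines. This is a minimal sketch using only the sub-scores and the 0.70 discount quoted above; the dictionary keys and the `headline_score` helper are illustrative names, not part of any published scoring tool.

```python
# Sub-scores and their caps, as listed in the score panel: (score, cap).
SUB_SCORES = {
    "throughput": (500, 500),
    "vram_fit": (200, 200),
    "ecosystem": (200, 200),
    "efficiency": (47, 100),
}
ESTIMATED_DISCOUNT = 0.70  # confidence discount quoted for "Estimated" entries


def headline_score(sub_scores, discount):
    """Sum the sub-scores, then apply the confidence discount and round."""
    raw = sum(score for score, _cap in sub_scores.values())
    return raw, round(raw * discount)


raw, headline = headline_score(SUB_SCORES, ESTIMATED_DISCOUNT)
print(f"raw {raw} / 1000 -> headline {headline} / 1000")
```

Almost all of the gap between 947 and 663 comes from the confidence discount rather than from the hardware itself; the only sub-score below its cap is Efficiency.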

Extrapolated from 3938 GB/s bandwidth — 472.6 tok/s estimated. No measured benchmarks yet.
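The page does not state how the throughput estimate is derived from bandwidth, so the sketch below only back-solves the ratio between the two published numbers. The 0.12 factor and the "GB streamed per token" reading are inferences from those figures, not a documented formula, and `bandwidth_bound_tok_s` is a hypothetical helper.

```python
BANDWIDTH_GB_S = 3938   # memory bandwidth quoted on this page
QUOTED_TOK_S = 472.6    # throughput estimate quoted on this page

# Ratio implied by the two published numbers (inferred, not documented).
implied_factor = QUOTED_TOK_S / BANDWIDTH_GB_S

# Equivalent reading: how many GB would be streamed per generated token
# if decoding were purely memory-bandwidth-bound.
implied_gb_per_token = BANDWIDTH_GB_S / QUOTED_TOK_S


def bandwidth_bound_tok_s(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on decode speed when each token reads gb_read_per_token."""
    return bandwidth_gb_s / gb_read_per_token


print("implied factor (tok/s per GB/s):", round(implied_factor, 2))
print("implied GB read per token:", round(implied_gb_per_token, 2))
print("0.12 x bandwidth:", round(BANDWIDTH_GB_S * 0.12, 1))
```

Read this way, the estimate corresponds to a model whose weights occupy roughly 8 GB, so it says little about 70B-class or larger workloads, where far more data is read per token.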

WORKLOAD FIT

Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.

  • 7B chat: ✓ Comfortable
  • 14B chat: ✓ Comfortable
  • 32B chat: ✓ Comfortable
  • 70B chat: ✓ Comfortable
  • Coding agent: ✓ Comfortable
  • Vision (≤8B VLM): ✓ Comfortable
  • Long context (32K): ✓ Comfortable

Legend:
  • ✓ Comfortable: fits with headroom
  • ~ Tight: works, no slack
  • △ Marginal: needs aggressive quant
  • ✗ Doesn't fit usefully

Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags; none are measured. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo | VERIFIED JUN 12, 2026
10.0/10

What it does well

The H100 NVL is NVIDIA's "two H100s in a single SKU" datacenter pick: a pair of PCIe Gen 5 cards joined by an NVLink Gen 4 bridge with 600 GB/s inter-card bandwidth, presenting 188 GB of combined HBM3 (94 GB per card) and 7.8 TB/s aggregate memory bandwidth. The form factor is two dual-slot, air-cooled PCIe cards plus the NVLink bridge, so it needs a PCIe Gen 5 datacenter server with two adjacent dual-slot positions. At ~$60,000 retail, H100 NVL is the cleanest "188 GB CUDA in a single SKU drop-in" path: load a 405B model at a low-bit quant across the pair (a full 4 bits per weight is about 202 GB of weights alone, so the quant has to land below that to fit in 188 GB), run 70B FP16 multi-tenant serving with deep concurrency, or fine-tune 70B FP16 with proper NVLink tensor parallelism. Power draw at 2× 350 W = 700 W combined matches a single H100 SXM5 (the spec table below lists 800 W peak, i.e. 400 W per card), and NVLink Gen 4 between the paired cards comes reasonably close to SXM5 NVLink performance for the 2-card case (600 GB/s vs 900 GB/s). For buyers who need 188 GB CUDA + NVLink but don't have DGX/SXM5-class infrastructure, H100 NVL solves a specific deployment problem better than any alternative.

Where it breaks

  • Cap-ex is brutal. $60,000 retail per SKU is 2.4× a single H100 PCIe at $25,000. The premium pays for the matched-pair NVLink bridge, 94 GB per card instead of 80 GB, and single-procurement-line-item simplicity, but it's hard to justify when 2× discrete H100 PCIe at $50,000 plus a third-party NVLink bridge ($1,000-2,000) covers most of the same ground for roughly $8,000-9,000 less, albeit with 160 GB rather than 188 GB.
  • Architecture is no longer current. B200 is the 2026 flagship at 192 GB / 8 TB/s / native FP4. For new cap-ex at frontier scale, B200 is the right tier.
  • No FP4 native. Hopper has FP8 + first-gen Transformer Engine; Blackwell adds FP4 + TE2. For workloads exploiting FP4 throughput, B200 wins meaningfully.
  • Limited multi-card scale beyond the 2-card pair. H100 NVL is fundamentally a 2-card SKU. For 4×–8× clusters, H100 SXM5 with full NVLink mesh is the right tier.
  • Cooling and power infrastructure must support 700 W (800 W peak per the spec table) across two adjacent cards. Standard 2U rackmounts often can't handle this thermal density without liquid cooling or aggressive airflow.
  • Resale market is thin. H100 NVL has lower transaction volume than discrete H100 PCIe — exit pricing is harder to predict.

Ideal model range

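  • Sweet spot: 405B production inference single-SKU at a sub-4-bit quant. Weights alone are about 202 GB at a full 4 bits per weight and about 177 GB at 3.5 bits, so Q4 / Q5 files exceed the 188 GB ceiling unless part of the model is offloaded.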
  • Sweet spot: 70B FP16 multi-tenant production serving with high concurrency (32+ users) — NVLink-paired tensor parallelism is genuinely fast.
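  • Sweet spot: 70B fine-tuning with FP16 weights across the paired cards, the right tier for "fine-tune 70B in a single rack slot." The weights alone take about 140 GB, so this means adapter-style (e.g. LoRA) training rather than a full-parameter run with optimizer state.
  • Sweet spot: large-model production inference at FP8. FP8 is one byte per weight, so the model must sit well under 188B parameters to leave headroom; a full 200B model (about 200 GB) does not fit.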
  • Stretch: 671B inference at Q3 with paged offload — fits but slower than 8× SXM5.
  • Comfortable: Anything 2× discrete H100 PCIe with NVLink bridge would do, at simpler procurement.
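A quick way to sanity-check the ranges in this list is to compute the weight footprint directly: parameters × bits per weight ÷ 8. The sketch below assumes 1 GB = 10^9 bytes and counts weights only, so KV cache, activations, and quant-format overhead (real Q4 files often use more than 4 bits per weight) all come on top. `weight_gb` and the case list are illustrative, not a catalog tool.

```python
VRAM_GB = 188  # combined capacity from the spec table


def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# (label, parameters in billions, bits per weight)
CASES = [
    ("70B FP16", 70, 16),
    ("200B FP8", 200, 8),
    ("405B 4-bit", 405, 4),
    ("405B 3.5-bit", 405, 3.5),
    ("671B 3-bit", 671, 3),
]

for label, params_b, bits in CASES:
    gb = weight_gb(params_b, bits)
    headroom = VRAM_GB - gb
    status = "fits" if headroom > 0 else "needs offload"
    print(f"{label:>13}: {gb:6.1f} GB weights, {headroom:+7.1f} GB headroom ({status})")
```

By this lower-bound arithmetic, 70B FP16 leaves about 48 GB for KV cache, while 405B at a full 4 bits and 200B at FP8 both land above 188 GB before any context is allocated.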

Bad use cases

  • Single-card workloads. Pick H100 PCIe — half the cap-ex.
  • 8-card cluster deployments. Pick H100 SXM5 with full NVLink mesh.
  • New cap-ex at the frontier. Pick B200 — architecture-current with FP4 native.
  • Cost-conscious buyers who can live with 160 GB. 2× discrete H100 PCIe ($50,000) + third-party NVLink bridge ($1,000-2,000) saves $8,000-9,000 vs H100 NVL, at 2 × 80 GB instead of 188 GB.
  • Workstation deployment. Wrong tier — rack-only.
  • Hobbyist anything. Wrong tier entirely.
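The DIY-versus-NVL savings figure is worth recomputing from the prices quoted on this page, since it drives several of the recommendations. A minimal sketch, assuming those list prices and the $1,000-2,000 bridge range; `diy_savings` is an illustrative helper.

```python
NVL_PRICE = 60_000             # H100 NVL pair, retail figure from this page
H100_PCIE_PRICE = 25_000       # single H100 PCIe (80 GB), from this page
BRIDGE_RANGE = (1_000, 2_000)  # third-party NVLink bridge, low and high


def diy_savings(bridge_price):
    """Savings of 2x H100 PCIe + bridge relative to one H100 NVL SKU."""
    return NVL_PRICE - (2 * H100_PCIE_PRICE + bridge_price)


low_bridge, high_bridge = BRIDGE_RANGE
print("savings range:", diy_savings(high_bridge), "to", diy_savings(low_bridge))
print("NVL vs one H100 PCIe:", NVL_PRICE / H100_PCIE_PRICE)
print("memory: NVL 188 GB vs DIY", 2 * 80, "GB")
```

With these inputs the DIY route saves between $8,000 and $9,000, and it delivers 160 GB rather than 188 GB, so the two options are not memory-equivalent.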

Verdict

Buy this if you need 188 GB CUDA in a single procurement line item, you have specific cap-ex governance that prefers single-SKU simplicity over multi-card assembly, your workload is genuinely the 2-card NVLink-paired sweet spot (405B serving / 70B fine-tuning), and the +$8-9k premium over discrete H100 PCIe + bridge pays for procurement simplicity. H100 NVL is the right pick for the narrow buyer who values single-SKU 188 GB CUDA NVLinked deployment.

Skip this if you can build 2× discrete H100 PCIe ($50k) + NVLink bridge ($1.5k) for ~$8.5k savings, you're standing up new cap-ex (B200 or H200 is almost always the better buy), you need >2-card scale (pick H100 SXM5 cluster), or your workloads fit 80 GB (H100 PCIe wins).

How it compares

  • vs H100 PCIe (80 GB) → H100 NVL is fundamentally 2× discrete H100 PCIe pre-paired with NVLink. Pick discrete H100 PCIe for single-card or DIY-bridge deployments at lower cost; H100 NVL only when single-SKU procurement matters. See /compare/nvidia-h100-nvl-vs-nvidia-h100-pcie.
  • vs H100 SXM5 (80 GB) → SXM5 has full NVLink mesh (900 GB/s between cards in 8-card baseboard) at higher cap-ex per card. NVL is the 2-card-paired PCIe form. Pick SXM5 for 4×–8× clusters; NVL for 2-card paired NVLink in standard PCIe servers.
  • vs H200 (141 GB SXM) → H200 is the architecturally-current Hopper refresh at $31,000 SXM. 2× H200 SXM gives 282 GB combined. For new builds, H200 dominates. NVL only when 188 GB single-SKU PCIe is the specific requirement.
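  • vs B200 (192 GB SXM) → B200 has near-identical memory (192 GB vs 188 GB) + native FP4 + TE2 + about 2× the per-device bandwidth (8 TB/s on one GPU vs 3.9 TB/s per NVL card, roughly level with the pair's 7.8 TB/s aggregate) + NVLink Gen 5 at roughly 33% lower price ($40k vs $60k for the NVL pair). Pick B200 for new builds; NVL only when matching an existing H100 cluster.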
  • vs MI300X (192 GB) → MI300X gives same memory tier (192 GB on one card) at $20k cap-ex with ROCm ecosystem trade-offs. 1/3 the price of H100 NVL. Pick MI300X when ROCm fits and cost matters; NVL when CUDA + NVLink mesh are non-negotiable.
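The comparisons above can be put on one axis by dividing each quoted price by its memory capacity. This sketch uses only the prices and capacities stated on this page; dollars per GB is an illustrative metric and ignores bandwidth, interconnect, and software ecosystem.

```python
# (price in USD, memory in GB) as quoted in the comparisons above.
OPTIONS = {
    "H100 NVL (pair)": (60_000, 188),
    "H100 PCIe": (25_000, 80),
    "H200 SXM": (31_000, 141),
    "B200 SXM": (40_000, 192),
    "MI300X": (20_000, 192),
}


def usd_per_gb(price_usd, memory_gb):
    """Cap-ex per GB of accelerator memory."""
    return price_usd / memory_gb


# Cheapest memory first.
ranked = sorted(OPTIONS.items(), key=lambda item: usd_per_gb(*item[1]))
for name, (price, mem) in ranked:
    print(f"{name:>16}: ${usd_per_gb(price, mem):7.2f} per GB ({mem} GB at ${price:,})")
```

On these figures the H100 NVL pair is the most expensive memory in the group per GB, which fits the verdict that it only makes sense when single-SKU CUDA procurement is the actual requirement.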
BLK · OVERVIEW

Overview

Dual-card H100 with 188GB combined memory. Built for LLM serving.

Retailers we'd check: Amazon


BLK · SPECS

Specs

  • VRAM: 188 GB
  • Power draw (peak): 800 W
  • Released: 2023
  • MSRP: $60,000
  • Backends: CUDA

Models that fit

Open-weight models small enough to run on NVIDIA H100 NVL with usable context.

  • all-MiniLM-L6-v2 (0.022B · other)
  • FLUX.1 [dev] (12B · other)
  • Qwen 3 0.6B (0.6B · qwen)
  • BGE Large EN v1.5 (0.335B · other)
  • Llama 4 Scout (109B · llama)
  • Nomic Embed Text v1.5 (0.137B · other)
  • Kokoro 82M (0.082B · other)
  • Llama 3.1 8B Instruct (8B · llama)

Frequently asked

What models can NVIDIA H100 NVL run?

With 188 GB VRAM, the NVIDIA H100 NVL runs 70B models at FP16 or in 4-bit quantization, plus everything smaller. See the "Models that fit" list above for catalog entries; these fits are extrapolated from the spec sheet, not measured.

Does NVIDIA H100 NVL support CUDA?

Yes — NVIDIA H100 NVL is an NVIDIA card with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.
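To confirm what the driver actually exposes before loading a model, `nvidia-smi` can be queried from a script. The sketch below is a minimal example, assuming `nvidia-smi` is on the PATH; `parse_gpu_csv` and `query_gpus` are illustrative helper names. Each card of the pair would be expected to show up as its own device, so the 188 GB figure is a sum across two entries.

```python
import shutil
import subprocess

# Ask for one "name, memory in MiB" line per visible GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'name, MiB' lines into (name, memory_gib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib) / 1024))
    return gpus


def query_gpus():
    """Return visible GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    for name, mem_gib in gpus:
        print(f"{name}: {mem_gib:.1f} GiB")
    print(f"total: {sum(mem for _name, mem in gpus):.1f} GiB across {len(gpus)} device(s)")
```

Serving stacks such as vLLM or llama.cpp need to be told to split a model across both devices; a single process pinned to one device only sees half of the combined memory.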


Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.

Compare alternatives

Hardware worth comparing

The closest alternatives by price, memory bandwidth, and form factor, plus a step up and down — so you can frame the buying decision against real options.

Closest matches
Similar price, bandwidth & form factor
  • Intel Gaudi 3
    intel · 128 GB VRAM
    8.2/10
  • AMD Instinct MI300X
    amd · 192 GB VRAM
    10.0/10
  • AMD Instinct MI250X
    amd · 128 GB VRAM
    9.7/10
  • AMD Instinct MI325X
    amd · 256 GB VRAM
    10.0/10
  • NVIDIA H200
    nvidia · 141 GB VRAM
    10.0/10
  • NVIDIA B200
    nvidia · 192 GB VRAM
    10.0/10
Step up
More capable — more memory or a higher tier
  • AMD Instinct MI325X
    amd · 256 GB VRAM
    10.0/10
  • AMD Instinct MI355X
    amd · 288 GB VRAM
    10.0/10
  • AMD Instinct MI350X
    amd · 288 GB VRAM
    8.3/10
Step down
Lighter — cheaper or more constrained
  • Intel Gaudi 3
    intel · 128 GB VRAM
    8.2/10
  • AMD Instinct MI300X
    amd · 192 GB VRAM
    10.0/10
  • NVIDIA H200
    nvidia · 141 GB VRAM
    10.0/10