RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
  1. >
  2. Home
  3. /Hardware
  4. /NVIDIA H200
UNIT · NVIDIA · GPU
141 GB VRAMworkstation·Reviewed June 2026

NVIDIA H200

NVDA · HARDWARE
NVIDIA H200

No editorial image yet — generic vendor mark shown. Credentials in spec table below.

Hopper refresh — 141GB HBM3e at ~4.8 TB/s. Datacenter-class; rentable on RunPod, Lambda, etc.

Released 2024·4800 GB/s memory bandwidth
▼ CHECK CURRENT PRICE· 1 retailer
NVIDIA H200
Check on Amazon→

Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.

RUNLOCALAI SCORE
See full leaderboard →
676/ 1000
BB-tier
Estimated
Throughput
500/ 500
VRAM-fit
200/ 200
Ecosystem
200/ 200
Efficiency
66/ 100

Sub-scores sum to 966 / 1000. Headline = 966 × 0.70 (Estimated-confidence discount) = 676. This is an algorithmic performance-tier score — distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet). How scoring works →

Extrapolated from 4800 GB/s bandwidth — 576.0 tok/s estimated. No measured benchmarks yet.

WORKLOAD FIT
Try other hardware →

Plain-English: Runs 70B comfortably — snappy enough for a coding agent; vision models supported.

7B chat✓
Comfortable
14B chat✓
Comfortable
32B chat✓
Comfortable
70B chat✓
Comfortable
Coding agent✓
Comfortable
Vision (≤8B VLM)✓
Comfortable
Long context (32K)✓
Comfortable
✓Comfortable — fits with headroom
~Tight — works, no slack
△Marginal — needs aggressive quant
✗Doesn't fit usefully

Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Hover any chip for the rationale. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
10.0/10

What it does well

The H200 is the H100's mid-life refresh and the right answer for almost every "I need datacenter-grade frontier model inference and I don't already own H100s" decision in 2026. The headline change from H100: 141 GB HBM3e at 4.8 TB/s, vs the H100's 80 GB HBM3 at 3.35 TB/s. That's ~76% more memory and ~43% more bandwidth, on the same architecture, in the same SXM5 socket, with the same software stack. The bandwidth gain shows up directly in long-context decode and large-prompt prefill — the workloads where H100 was already best-in-class. Memory headroom now fits Llama 405B Q4 on a single card or DeepSeek V3 671B at Q1.5 with comfortable context, and 2× H200 NVLinked (282 GB combined at NVLink 900 GB/s) handles 405B FP16 or 671B Q3 with full operational context. NVIDIA's full enterprise stack works: NeMo, Triton, TensorRT-LLM, BlueField DPU integration, MIG partitioning. Cloud rental at ~$3–4.50/hr on Runpod / Lambda makes it accessible without cap-ex.

Where it breaks

  • It's no longer the frontier. B200 at 192 GB / 8 TB/s is the 2026 training flagship. H200 is the 2024 flagship; B200 is what NVIDIA wants you to buy now. For training scale and FP4 throughput, B200 wins. For inference, H200's $/throughput is still better than B200 in most realistic 2026 workloads.
  • SXM5 only at top tier — PCIe H200 NVL is a different SKU with much lower bandwidth. The 4.8 TB/s spec is SXM5. PCIe H200 NVL is ~3 TB/s effective. Read the SKU carefully when renting or buying.
  • Cap-ex is real. $30,000–$32,000 retail for SXM5 H200, plus the DGX-class motherboard and cooling overhead. Most of the world should be renting H200, not buying.
  • Power and thermal density. 700 W TDP, dense rack workloads, datacenter cooling assumed. Not for an under-the-desk workstation. The "H200 in your office" build doesn't exist outside DGX Station tier.
  • Marginal vs H100 for many workloads. If your model fits 80 GB and your context isn't the bottleneck, H100 at $25,000 vs H200 at $31,000 may not justify the upgrade — the gap is meaningful but not transformational on shorter-context inference.

Ideal model range

  • Sweet spot: 405B-class single-card inference at Q4–Q5 with long context. The first datacenter card that does this without multi-card complexity.
  • Sweet spot: 70B and 200B-class at FP16 with very long contexts (128K+) where bandwidth dominates. The 4.8 TB/s shows here.
  • Sweet spot: Multi-tenant production serving — vLLM continuous batching across 30–80 concurrent users on 70B FP16 with 16–32K context, or 100+ users on 32B FP16.
  • Stretch: 671B (DeepSeek V3 / R1) at Q1.5–Q2 single-card. Yes it runs, no it's not the best $/req — pick 2× H200 with NVLink for proper 671B serving.
  • Stretch: Frontier-model fine-tuning. 70B QLoRA fits one H200 with comfortable headroom; 70B FP16 fine-tuning fits 2× H200 NVLinked.

Bad use cases

  • Single-developer hobby workloads. Rent on Runpod or buy a 4090 / 5090. The H200's value is multi-tenant production serving and frontier inference, not single-user.
  • Anything that fits 24–48 GB. L40S at 1/4 the price wins by every metric. Don't overprovision.
  • Frontier training where you'd actually use B200's FP4. Rent B200 instead.
  • Buying retail at cap-ex without a steady-state workload. Renting H200 at $3–4.50/hr breaks even with cap-ex around 7,000+ hours of utilization (~9 months of 24×7). Most workloads don't justify this.

Verdict

Buy this if you're operating a datacenter or colo with steady-state 70B+ FP16 production inference, frontier-model serving (200B/405B/671B), or multi-tenant inference at scale, and you've calculated cap-ex vs rental over a 2-year horizon and the cap-ex wins. The H200 is the canonical "I need datacenter-grade frontier inference and I'm running it 24×7" GPU. Pair with NVLink for 282 GB tier when single-card isn't enough.

Skip this if you're a hobbyist (rent or buy consumer), your workload fits L40S (much better $/throughput), you can rent H200 at <50% utilization (rental dominates), you're frontier-training (B200 is the right pick), or you're a startup that should be on cloud rental until your inference economics justify cap-ex.

How it compares

  • vs H100 SXM (80 GB) → H200 is the same chip with 76% more memory + 43% more bandwidth. Pick H200 over H100 SXM for any new build; pick H100 SXM only if you're matching an existing H100 cluster or finding it at >25% discount. See /compare/nvidia-h200-vs-nvidia-h100-sxm.
  • vs B200 (192 GB) → B200 has more memory + bandwidth + native FP4 support, at higher cap-ex (~$40,000) and more demanding cooling. Pick B200 for frontier training and FP4 production; pick H200 for 90% of inference workloads where the cost gap doesn't pay for itself.
  • vs H100 NVL (188 GB) → H100 NVL is two H100s NVLinked in a single SKU at ~$60,000. H200 is one card at $31,000 with similar effective memory ceiling. Pick H200 for new builds; H100 NVL only makes sense in specific 188 GB single-SKU deployment slots.
  • vs L40S (48 GB) → L40S at $7,500 is roughly 1/4 the price and 1/3 the bandwidth (864 GB/s vs 4.8 TB/s). For 70B Q4 / 32B FP16 inference, L40S wins $/throughput. For frontier models or long context or training, H200 dominates.
  • vs renting on Runpod / Lambda / Together → H200 rents at ~$3–4.50/hr SXM, ~$2.50–3.50/hr PCIe. Cap-ex breakeven is ~7,000+ hours = 9 months 24×7. Most readers should rent H200 first and only buy when steady-state utilization > 70% sustains it for >12 months.
BLK · OVERVIEW

Overview

Hopper refresh — 141GB HBM3e at ~4.8 TB/s. Datacenter-class; rentable on RunPod, Lambda, etc.

Retailers we'd check:Amazon

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

BLK · SPECS

Specs

VRAM141 GB
Power draw (peak)700 W
Released2024
MSRP$31000
Backends
CUDA

Models that fit

Open-weight models small enough to run on NVIDIA H200 with usable context.

all-MiniLM-L6-v2
0.022B · other
FLUX.1 [dev]
12B · other
Qwen 3 0.6B
0.6B · qwen
BGE Large EN v1.5
0.335B · other
Nomic Embed Text v1.5
0.137B · other
Kokoro 82M
0.082B · other
Llama 3.1 8B Instruct
8B · llama
Qwen 3 30B-A3B
30B · qwen

Frequently asked

What models can NVIDIA H200 run?

With 141GB VRAM, the NVIDIA H200 runs 70B models in 4-bit quantization, plus everything smaller. See the model list below for tested combinations.

Does NVIDIA H200 support CUDA?

Yes — NVIDIA H200 is an NVIDIA card with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.

Where next?

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
Troubleshooting
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.

RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Compare alternatives

Hardware worth comparing

The closest alternatives by price, memory bandwidth, and form factor, plus a step up and down — so you can frame the buying decision against real options.

Closest matches
Similar price, bandwidth & form factor
  • Intel Gaudi 3
    intel · 128 GB VRAM
    8.2/10
  • AMD Instinct MI300X
    amd · 192 GB VRAM
    10.0/10
  • AMD Instinct MI325X
    amd · 256 GB VRAM
    10.0/10
  • AMD Instinct MI355X
    amd · 288 GB VRAM
    10.0/10
  • NVIDIA H100 SXM
    nvidia · 80 GB VRAM
    10.0/10
  • AMD Instinct MI300A (APU)
    amd · 128 GB VRAM
    10.0/10
Step up
More capable — more memory or a higher tier
  • AMD Instinct MI300X
    amd · 192 GB VRAM
    10.0/10
  • AMD Instinct MI325X
    amd · 256 GB VRAM
    10.0/10
  • NVIDIA B200
    nvidia · 192 GB VRAM
    10.0/10
Step down
Lighter — cheaper or more constrained
  • Intel Gaudi 3
    intel · 128 GB VRAM
    8.2/10
  • AMD Instinct MI300X
    amd · 192 GB VRAM
    10.0/10
  • NVIDIA H100 SXM
    nvidia · 80 GB VRAM
    10.0/10