NVIDIA H20 (96GB)
The China-market Hopper SKU tuned for inference: 96GB HBM3 (more than the standard H100's 80GB), 4.0 TB/s, 400W, with ~41% fewer cores than a full H100. Export-compliant and highly relevant where H100/H200 are restricted.
Sub-scores sum to 996 / 1000. Headline = 996 × 0.70 (Estimated-confidence discount) = 697. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
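The headline arithmetic above can be reproduced in a few lines. This is a minimal sketch, not the site's actual scoring code: the function name `headline_score` and the round-to-nearest step are assumptions. Only the 996 sub-score sum and the 0.70 discount come from the text.

```python
def headline_score(sub_score_sum: int, confidence_discount: float) -> int:
    """Apply a confidence discount to the summed sub-scores (hypothetical helper)."""
    # Assumption: the discounted value is rounded to the nearest whole point.
    return round(sub_score_sum * confidence_discount)


# Values quoted on this page: 996 sub-score points, 0.70 "Estimated" discount.
print(headline_score(996, 0.70))
```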
Estimated at 480.0 tok/s, extrapolated from the card's 4000 GB/s memory bandwidth. No measured benchmarks yet.
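The page does not state how the 480 tok/s figure is derived from bandwidth. One common back-of-envelope model for single-stream decoding assumes each generated token requires reading a fixed amount of data from memory, so throughput is roughly bandwidth divided by GB read per token. The sketch below applies that assumed model. The function names are hypothetical, and the implied "GB per token" is an inference from the two quoted numbers, not a documented parameter.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound estimate: tokens/s = bandwidth / data read per token (assumed model)."""
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_second: float) -> float:
    """Invert the same assumed model to see what per-token read size a quoted figure implies."""
    return bandwidth_gb_s / tokens_per_second


# Figures quoted on this page: 4000 GB/s and 480.0 tok/s.
implied = implied_gb_per_token(4000.0, 480.0)
print(f"Implied read per token: {implied:.2f} GB")
print(f"Round trip: {decode_tokens_per_second(4000.0, implied):.1f} tok/s")
```

Under this model a larger resident model lowers the estimate proportionally, which is why a measured run is more trustworthy than the extrapolation.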
Plain-English: Runs 70B (quantized) comfortably; snappy enough for a coding agent; vision models supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it is
The H20 is NVIDIA's export-compliant Hopper part for the Chinese market, deliberately tuned for inference rather than training. It pairs cut-down compute (~41% fewer cores than a full H100) with an unusually large 96GB of HBM3 at 4.0 TB/s, which exceeds the standard H100's 80GB. Within its 400W power envelope it comfortably holds a 30B model at FP16 or a 70B model once quantized.
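The model-size claim above follows from simple weight-footprint arithmetic: parameters times bytes per parameter. The sketch below is a rough check under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses 1 GB = 10^9 bytes, and picks 4-bit as an illustrative quantization level since the text only says "quantized". The helper name is hypothetical.

```python
VRAM_GB = 96  # from the spec table on this page


def weight_footprint_gb(params_billions: float, bits_per_param: float) -> float:
    """Weights-only size in GB: (params * bits / 8) bytes, with 1 GB = 1e9 bytes."""
    return params_billions * bits_per_param / 8


# Assumption: 4-bit stands in for "quantized"; other schemes will differ.
cases = [("30B FP16", 30, 16), ("70B FP16", 70, 16), ("70B 4-bit", 70, 4)]

for label, params_b, bits in cases:
    size_gb = weight_footprint_gb(params_b, bits)
    verdict = "fits" if size_gb < VRAM_GB else "does not fit"
    print(f"{label}: {size_gb:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

The headroom left after weights is what remains for context (KV cache), so a result close to 96 GB would fit on paper but leave little usable context.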
Relevance to local AI
For local/on-prem AI buyers in China, where H100/H200 are restricted, the H20 is often the most capable CUDA card legally available. Its 96GB makes it a genuinely strong inference GPU despite the compute cuts, because token generation tends to be limited by memory capacity and bandwidth more than by raw compute. The high VRAM-to-compute ratio is well-matched to serving, less so to training. Outside China it's largely irrelevant given access to full Hopper/Blackwell parts.
Bottom line
A niche-but-real entry: the export-compliant 96GB Hopper inference card that matters specifically to China-market local-AI deployments. Included for completeness; not relevant to buyers with access to standard H100/H200.
Specs
| Spec | Value |
| --- | --- |
| VRAM | 96 GB |
| Power draw (peak) | 400 W |
| Released | 2024 |
| Backends | CUDA |
Models that fit
Open-weight models small enough to run on NVIDIA H20 (96GB) with usable context.
Frequently asked
What models can NVIDIA H20 (96GB) run?
Does NVIDIA H20 (96GB) support CUDA?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.