NVIDIA H20 (96GB) for local AI

The China-market Hopper SKU tuned for inference: 96GB of HBM3 (more than the standard H100's 80GB), 4.0 TB/s of memory bandwidth, a 400W power draw, and ~41% fewer cores than a full H100. It is export-compliant, which makes it the relevant choice where H100/H200 are restricted.

Released 2024 · 4000 GB/s memory bandwidth

What it is

The H20 is NVIDIA's export-compliant Hopper part for the Chinese market, deliberately tuned for inference rather than training. It pairs cut-down compute (~41% fewer cores than a full H100) with an unusually large 96GB of HBM3 at 4.0 TB/s, more memory than the standard H100's 80GB. Within its 400W power envelope, that capacity comfortably holds a 30B model in FP16 (about 60GB of weights) or a 70B model quantized to 4-bit (about 35GB of weights), with room left over for the KV cache.
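The capacity claims above come down to simple arithmetic, sketched here in Python. This is a rough estimate under stated assumptions: it counts model weights only (no KV cache, activations, or runtime overhead), treats 1GB as 10^9 bytes, and uses a hypothetical helper name, `weight_gb`, that does not come from any library.

```python
VRAM_GB = 96  # H20 memory capacity, from the spec table


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (weights only, 1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Illustrative configurations: (label, parameters in billions, bits per weight)
configs = [("30B FP16", 30, 16), ("70B 4-bit", 70, 4), ("70B FP16", 70, 16)]

for label, params_b, bits in configs:
    need = weight_gb(params_b, bits)
    verdict = "fits" if need <= VRAM_GB else "does not fit"
    print(f"{label}: ~{need:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

In practice the usable budget is smaller than the raw figure, since long contexts and large batches grow the KV cache, so treat a result close to 96GB as a warning rather than a pass.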

Relevance to local AI

For local/on-prem AI buyers in China, where H100/H200 are restricted, the H20 is often the most capable CUDA card legally available. Its 96GB makes it a genuinely strong inference GPU despite the compute cuts, because token-by-token generation is usually limited by memory capacity and bandwidth more than by raw compute. The high VRAM-to-compute ratio is well-matched to serving, less so to training. Outside China it's largely irrelevant given access to full Hopper/Blackwell parts.
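To see why memory matters more than core count for serving, a common back-of-envelope bound assumes that generating each token at batch size 1 requires reading every weight from memory once. The sketch below applies that assumption to the H20's 4000 GB/s; the function name `max_tokens_per_s` is hypothetical, and the result is a theoretical ceiling, not a benchmark. Real throughput will be lower because of KV-cache reads, kernel efficiency, and other overheads.

```python
BANDWIDTH_GB_S = 4000  # H20 memory bandwidth, from the page header


def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed.

    Assumes each generated token reads all weights from memory once,
    and ignores KV-cache traffic and compute time.
    """
    return bandwidth_gb_s / weights_gb


# Weight sizes in GB: 70B at 4-bit is about 35 GB, 30B at FP16 about 60 GB.
for label, weights_gb in [("70B 4-bit", 35), ("30B FP16", 60)]:
    bound = max_tokens_per_s(BANDWIDTH_GB_S, weights_gb)
    print(f"{label}: at most ~{bound:.0f} tokens/s per stream")
```

Under this model the ceiling depends only on bandwidth and model size, which is why a card with fewer cores but fast, large memory can still serve well.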

Bottom line

A niche-but-real entry: the export-compliant 96GB Hopper inference card that matters specifically to China-market local-AI deployments. Included for completeness; not relevant to buyers with access to standard H100/H200.

Frequently asked

What models can NVIDIA H20 (96GB) run?

With 96GB VRAM, the NVIDIA H20 (96GB) runs 70B models in 4-bit quantization, plus everything smaller. See the model list below for tested combinations.

Does NVIDIA H20 (96GB) support CUDA?

Yes — NVIDIA H20 (96GB) is an NVIDIA card with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.
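One quick way to confirm that the driver sees the card before installing any of these backends is to query `nvidia-smi`. The sketch below wraps its CSV query mode in Python; the helper names `parse_gpus` and `list_gpus` are hypothetical, and the sample line used to exercise the parser is made up for illustration, not captured from a real H20.

```python
import shutil
import subprocess

# nvidia-smi CSV query: one line per GPU, memory reported in MiB.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory_mib' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib)))
    return gpus


def list_gpus() -> list[tuple[str, int]]:
    """Return visible NVIDIA GPUs, or an empty list if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)


# Made-up sample line, for illustration only.
print(parse_gpus("NVIDIA H20, 97871\n"))
```

On a machine with the driver installed, calling `list_gpus()` returns one tuple per GPU; an empty list means `nvidia-smi` is not on the PATH.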

Specs

VRAM	96 GB
Power draw (peak)	400 W
Released	2024
Backends	CUDA
