What can NVIDIA GB200 NVL72 run locally?

182 models fit comfortably on NVIDIA GB200 NVL72 with 32 GB system RAM. No additional models run with tradeoffs. Top pick: Llama 3.2 3B Instruct.

What's the recommended runner for this hardware?

llama.cpp / Ollama (CUDA). This is the runner that gives the best tokens-per-second on NVIDIA GB200 NVL72, based on Windows support and the GPU backend.
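
To check a tokens-per-second figure on your own machine, one option is to ask a locally running Ollama server for its timing counters. The sketch below assumes Ollama is listening on its default port and uses the `eval_count` and `eval_duration` (nanoseconds) fields returned by `/api/generate`; the helper names `tokens_per_second` and `measure` are illustrative and not part of this page's tooling.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Decode speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming request and return the measured decode tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

Calling `measure("llama3.2:3b", "Say hello.")` after pulling the model would return a single-run figure; averaging several runs gives a steadier number.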

What can NVIDIA GB200 NVL72 run for chat?

Build: NVIDIA GB200 NVL72 + 32 GB RAM (Windows)

Memory: 13824 GB VRAM + 32 GB system RAM

Runner: llama.cpp / Ollama (CUDA)

Runs comfortably
182 models

Ranked by fit for the chat use case and predicted speed. Each entry includes a VRAM breakdown.

#1 Llama 3.2 3B Instruct

llama

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 14.8 GB | Headroom: 13809.2 GB

ollama run llama3.2:3b

1631 tok/s

Weights: 3.19 GB
KV cache: 1.50 GB
Activations: 8.35 GB
Runtime: 1.80 GB
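
The VRAM figure in each entry is the sum of its four breakdown components, and the headroom is the build's total VRAM minus that sum. A minimal sketch of the arithmetic, using the Llama 3.2 3B Instruct numbers above (the `VramBreakdown` class and `headroom_gb` helper are hypothetical names for illustration):

```python
from dataclasses import dataclass


@dataclass
class VramBreakdown:
    """Per-model memory components, all in GB."""
    weights_gb: float
    kv_cache_gb: float
    activations_gb: float
    runtime_gb: float

    def total_gb(self) -> float:
        # Total VRAM needed is the plain sum of the four components.
        return (self.weights_gb + self.kv_cache_gb
                + self.activations_gb + self.runtime_gb)


def headroom_gb(total_vram_gb: float, breakdown: VramBreakdown) -> float:
    """VRAM left over after loading the model."""
    return total_vram_gb - breakdown.total_gb()


# Values copied from the Llama 3.2 3B Instruct entry.
llama_3b = VramBreakdown(weights_gb=3.19, kv_cache_gb=1.50,
                         activations_gb=8.35, runtime_gb=1.80)
print(round(llama_3b.total_gb(), 1))            # 14.8
print(round(headroom_gb(13824, llama_3b), 1))   # 13809.2
```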

#2 Mistral 7B Instruct v0.3

mistral

Commercial OK

Quant: Q5_K_M | Context: 8,192 | VRAM: 18.5 GB | Headroom: 13805.5 GB

ollama run mistral:7b

1081 tok/s

Weights: 4.81 GB
KV cache: 3.50 GB
Activations: 8.43 GB
Runtime: 1.80 GB

#3 Qwen 3 8B

qwen

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 22.9 GB | Headroom: 13801.1 GB

ollama run qwen3:8b

612 tok/s

Weights: 8.50 GB
KV cache: 4.00 GB
Activations: 8.62 GB
Runtime: 1.80 GB

#4 Gemma 2 9B Instruct

gemma

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 24.5 GB | Headroom: 13799.5 GB

ollama run gemma2:9b

544 tok/s

Weights: 9.56 GB
KV cache: 4.50 GB
Activations: 8.67 GB
Runtime: 1.80 GB

#5 Mistral Nemo 12B Instruct

12B

mistral

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 29.4 GB | Headroom: 13794.6 GB

ollama run mistral-nemo:12b

408 tok/s

Weights: 12.75 GB
KV cache: 6.00 GB
Activations: 8.83 GB
Runtime: 1.80 GB

#6 Gemma 3 12B

12B

gemma

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 29.4 GB | Headroom: 13794.6 GB

ollama run gemma3:12b

408 tok/s

Weights: 12.75 GB
KV cache: 6.00 GB
Activations: 8.83 GB
Runtime: 1.80 GB

#7 Llama 3.1 8B Instruct

llama

Commercial OK

Quant: FP16 | Context: 8,192 | VRAM: 27.9 GB | Headroom: 13796.1 GB

ollama run llama3.1:8b

325 tok/s

Weights: 16.00 GB
KV cache: 1.07 GB
Activations: 8.99 GB
Runtime: 1.80 GB

#8 Qwen 3 14B

14B

qwen

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 32.6 GB | Headroom: 13791.4 GB

ollama run qwen3:14b

350 tok/s

Weights: 14.88 GB
KV cache: 7.00 GB
Activations: 8.94 GB
Runtime: 1.80 GB

#9 Gemma 4 26B MoE

26B

gemma

Commercial OK

Quant: Q4_K_M | Context: 8,192 | VRAM: 39.5 GB | Headroom: 13784.5 GB

ollama run gemma4:26b-moe

331 tok/s

Weights: 15.70 GB
KV cache: 13.00 GB
Activations: 8.98 GB
Runtime: 1.80 GB

#10 OLMo 2 32B

32B

other

Commercial OK

Quant: Q4_K_M | Context: 8,192 | VRAM: 46.3 GB | Headroom: 13777.7 GB

269 tok/s

Weights: 19.32 GB
KV cache: 16.00 GB
Activations: 9.16 GB
Runtime: 1.80 GB

#11 Mistral Medium 3.5 (675B MoE)

675B

mistral

Quant: Q4_K_M | Context: 8,192 | VRAM: 775.4 GB | Headroom: 13048.6 GB

210 tok/s

Weights: 407.53 GB
KV cache: 337.50 GB
Activations: 28.57 GB
Runtime: 1.80 GB

#12 Mistral Small 3 24B

24B

mistral

Commercial OK

Quant: Q8_0 | Context: 8,192 | VRAM: 48.8 GB | Headroom: 13775.2 GB

ollama run mistral-small:24b

204 tok/s

Weights: 25.50 GB
KV cache: 12.00 GB
Activations: 9.47 GB
Runtime: 1.80 GB

What if you upgraded?

Hypothetical scenarios. We re-ran the compatibility engine for each.

+32 GB system RAM

~$80–150

Doubles your CPU-offload working set. Helps when models don't quite fit in VRAM.

Unlocks: 1 new comfortable model

• SmolLM 2 360M Instruct

Won't run
top 3 popular models

These models need more memory than you have. Shown for orientation.

Qwen 3.6 35B-A3B (MTP)

35B

qwen

Commercial OK

Even with CPU offload, needs more memory than your VRAM (13824 GB) + 60% of system RAM (19 GB) combined.


Qwen 3.6 27B (MTP)

27B

qwen

Commercial OK

Even with CPU offload, needs more memory than your VRAM (13824 GB) + 60% of system RAM (19 GB) combined.


Ring-2.6-1T

1000B

other

Commercial OK

Even with CPU offload, needs more memory than your VRAM (13824 GB) + 60% of system RAM (19 GB) combined.

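
The rule quoted in these entries compares a model's memory requirement against VRAM plus 60% of system RAM. A minimal sketch of that check, assuming the requirement is a single number in GB (the function names and the `ram_fraction` parameter are illustrative, not the site's actual engine):

```python
def usable_memory_gb(vram_gb: float, system_ram_gb: float,
                     ram_fraction: float = 0.6) -> float:
    """Memory budget with CPU offload: all VRAM plus a fraction of system RAM."""
    return vram_gb + ram_fraction * system_ram_gb


def fits_with_offload(required_gb: float, vram_gb: float, system_ram_gb: float,
                      ram_fraction: float = 0.6) -> bool:
    """True when the model's requirement fits inside the offload budget."""
    return required_gb <= usable_memory_gb(vram_gb, system_ram_gb, ram_fraction)


# Budget for this build: 13824 GB VRAM + 60% of 32 GB system RAM.
budget = usable_memory_gb(13824, 32)
print(round(budget, 1))  # 13843.2
```

For this build, 60% of 32 GB is 19.2 GB, which the page rounds to 19 GB.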

How to read these numbers

Measured — we ran this exact combo on owner hardware.

Extrapolated — predicted from a measured benchmark on similar-bandwidth hardware.

Estimated — pure formula based on VRAM bandwidth and model architecture.
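
The page does not publish its formula, so the following is only a sketch of a common bandwidth-bound approximation: decode speed is roughly effective memory bandwidth divided by the bytes read per token, which for a dense model is about the size of its weights. Two assumptions are worth flagging. The 5,200 GB/s constant is back-solved from the rows listed above (tok/s multiplied by weight size comes out near 5,200 for most of them, though not for the 675B MoE row). The 8.5 bits per weight for Q8_0 is likewise inferred from the listed weight sizes.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given quantization."""
    return params_billion * bits_per_weight / 8


def estimated_tok_s(effective_bandwidth_gb_s: float,
                    gb_read_per_token: float) -> float:
    """Bandwidth-bound decode estimate: every token streams the weights once."""
    return effective_bandwidth_gb_s / gb_read_per_token


# Assumption: effective bandwidth back-solved from the listed rows.
EFFECTIVE_BANDWIDTH_GB_S = 5200

q8_weights = weights_gb(8, 8.5)      # 8B model at Q8_0 (assumed 8.5 bits/weight)
fp16_weights = weights_gb(8, 16)     # 8B model at FP16

print(q8_weights)                                                   # 8.5
print(round(estimated_tok_s(EFFECTIVE_BANDWIDTH_GB_S, q8_weights)))    # 612
print(round(estimated_tok_s(EFFECTIVE_BANDWIDTH_GB_S, fp16_weights)))  # 325
```

These two results line up with the Qwen 3 8B and Llama 3.1 8B Instruct rows, but real throughput also depends on batch size, context length, and kernel efficiency.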

Want a specific benchmark we don't have? Email support@runlocalai.co and we'll prioritize it.
