RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP · Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other major retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we have no affiliate relationship with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated

What can NVIDIA GB200 NVL72 run?

Build: NVIDIA GB200 NVL72 + — + 32 GB RAM (Windows)

Memory: 13,824 GB VRAM + 32 GB system RAM
Runner: llama.cpp / Ollama (CUDA)
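The 13,824 GB figure equals 72 × 192 GB, which reads as the combined memory of all 72 GPUs in the rack rather than what a single llama.cpp or Ollama process on one GPU can address. A quick sketch of the per-GPU share, assuming the total is split evenly across the GPUs (the even split is an assumption; the GPU count comes from the "NVL72" name):

```python
# Sketch: per-GPU share of the rack-wide VRAM figure.
# Assumption: the 13,824 GB total is spread evenly over 72 GPUs.

RACK_VRAM_GB = 13_824
GPU_COUNT = 72  # taken from the "NVL72" name


def per_gpu_vram_gb(rack_vram_gb: float, gpu_count: int) -> float:
    """Memory attached to each GPU if the rack total is split evenly."""
    return rack_vram_gb / gpu_count


share = per_gpu_vram_gb(RACK_VRAM_GB, GPU_COUNT)
print(share)  # 192.0

# Largest VRAM estimate among the models listed below (card #10, 19.1 GB)
largest_listed_gb = 19.1
print(largest_listed_gb <= share)  # True
```

Every model in the "Runs comfortably" list needs well under one GPU's share, so those verdicts do not depend on pooling memory across the rack.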

Runs comfortably
183 models

Full-VRAM resident, with room for context. No compromises.
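The cards below assume an 8,192-token context. Ollama's default context window has varied between releases and may be smaller, and the tag `ollama run` pulls by default is not always the quantization a card lists. A minimal sketch, assuming a local Ollama server on its default port 11434, that builds a request to the `/api/generate` endpoint with `num_ctx` set to match; the helper names and the prompt are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumption: stock install)


def build_generate_request(model: str, prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request with a fixed context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(req: urllib.request.Request) -> str:
    """Send the request and return the model's text; needs a running server."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


req = build_generate_request("gemma3:1b", "Say hello in one sentence.")
print(req.get_method(), req.full_url)
# generate(req) requires Ollama running locally with gemma3:1b already pulled.
```

In an interactive `ollama run` session, `/set parameter num_ctx 8192` sets the same option for that session.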

#1 Gemma 3 1B
1B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 11.1 GB · Headroom: 13,812.9 GB
ollama run gemma3:1b
Speed: 8,613 tok/s (E)
Weights: 0.60 GB · KV cache: 0.50 GB · Activations: 8.22 GB · Runtime: 1.80 GB
#2 Llama 3.2 1B Instruct
1B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 11.6 GB · Headroom: 13,812.4 GB
ollama run llama3.2:1b
Speed: 4,894 tok/s (E)
Weights: 1.06 GB · KV cache: 0.50 GB · Activations: 8.25 GB · Runtime: 1.80 GB
#3 Gemma 4 E2B (Effective 2B)
2B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 13.2 GB · Headroom: 13,810.8 GB
ollama run gemma4:e2b
Speed: 2,447 tok/s (E)
Weights: 2.13 GB · KV cache: 1.00 GB · Activations: 8.30 GB · Runtime: 1.80 GB
#4 Llama 3.2 3B Instruct
3B · llama · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 14.8 GB · Headroom: 13,809.2 GB
ollama run llama3.2:3b
Speed: 1,631 tok/s (E)
Weights: 3.19 GB · KV cache: 1.50 GB · Activations: 8.35 GB · Runtime: 1.80 GB
#5 Phi-3.5 Vision
4.2B · phi · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 14.8 GB · Headroom: 13,809.2 GB
Speed: 2,051 tok/s (E)
Weights: 2.54 GB · KV cache: 2.10 GB · Activations: 8.32 GB · Runtime: 1.80 GB
#6 Phi-3.5 Mini Instruct
3.8B · phi · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 16.1 GB · Headroom: 13,807.9 GB
ollama run phi3.5:3.8b
Speed: 1,288 tok/s (E)
Weights: 4.04 GB · KV cache: 1.90 GB · Activations: 8.39 GB · Runtime: 1.80 GB
#7 Gemma 4 E4B (Effective 4B)
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 16.5 GB · Headroom: 13,807.5 GB
ollama run gemma4:e4b
Speed: 1,224 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 1.80 GB
#8 Qwen 3 4B
4B · qwen · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 16.5 GB · Headroom: 13,807.5 GB
ollama run qwen3:4b
Speed: 1,224 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 1.80 GB
#9 Gemma 3 4B
4B · gemma · Commercial OK
Quant: Q8_0 · Context: 8,192 · VRAM: 16.5 GB · Headroom: 13,807.5 GB
ollama run gemma3:4b
Speed: 1,224 tok/s (E)
Weights: 4.25 GB · KV cache: 2.00 GB · Activations: 8.40 GB · Runtime: 1.80 GB
#10 Llama 3.1 Nemotron Nano 8B
8B · llama · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 19.1 GB · Headroom: 13,804.9 GB
Speed: 1,077 tok/s (E)
Weights: 4.83 GB · KV cache: 4.00 GB · Activations: 8.43 GB · Runtime: 1.80 GB
#11 Mistral 7B Instruct v0.3
7B · mistral · Commercial OK
Quant: Q5_K_M · Context: 8,192 · VRAM: 18.5 GB · Headroom: 13,805.5 GB
ollama run mistral:7b
Speed: 1,081 tok/s (E)
Weights: 4.81 GB · KV cache: 3.50 GB · Activations: 8.43 GB · Runtime: 1.80 GB
#12 CodeGemma 7B
7B · gemma · Commercial OK
Quant: Q4_K_M · Context: 8,192 · VRAM: 17.9 GB · Headroom: 13,806.1 GB
ollama run codegemma:7b
Speed: 1,230 tok/s (E)
Weights: 4.23 GB · KV cache: 3.50 GB · Activations: 8.40 GB · Runtime: 1.80 GB
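Each card's VRAM figure is the sum of its four components, and headroom is this build's 13,824 GB minus that sum. The weight sizes also line up with parameters × bits per weight ÷ 8: Q8_0 stores 8.5 bits per weight by design, while about 4.83 (Q4_K_M) and 5.5 (Q5_K_M) are values inferred from the listed sizes, not figures the page states. The listed KV caches work out to 0.5 GB per billion parameters at 8,192 tokens; real KV size depends on layer count, KV heads and head dimension, so treat that as the page's rough rule. A sketch of the arithmetic under those assumptions, with activations and runtime copied from the card because the page does not say how they are derived:

```python
# Sketch of the VRAM arithmetic behind the cards above.
# Assumptions: Q4_K_M and Q5_K_M bits-per-weight are inferred from the listed
# weight sizes; Q8_0's 8.5 bpw follows from its format (32 int8 weights plus
# one fp16 scale per block). The KV rule is the page's implied heuristic.

BITS_PER_WEIGHT = {"Q4_K_M": 4.83, "Q5_K_M": 5.5, "Q8_0": 8.5}
BUILD_VRAM_GB = 13_824


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def kv_cache_gb(params_billion: float, context: int = 8192, gb_per_b_at_8k: float = 0.5) -> float:
    """Heuristic KV cache size: proportional to model size, linear in context."""
    return params_billion * gb_per_b_at_8k * context / 8192


def total_vram_gb(weights: float, kv: float, activations: float, runtime: float) -> float:
    """Card-style VRAM estimate: the sum of the four components."""
    return weights + kv + activations + runtime


def headroom_gb(vram_needed: float, vram_total: float = BUILD_VRAM_GB) -> float:
    """Memory left over after loading the model."""
    return vram_total - vram_needed


# Card #11: Mistral 7B Instruct v0.3 at Q5_K_M (activations and runtime from the card)
w = weights_gb(7, "Q5_K_M")
need = total_vram_gb(w, kv_cache_gb(7), 8.43, 1.80)
print(f"weights {w:.2f} GB, VRAM {need:.1f} GB, headroom {headroom_gb(need):.1f} GB")
# weights 4.81 GB, VRAM 18.5 GB, headroom 13805.5 GB
```

Doubling the context to 16,384 tokens doubles the KV term under this rule (3.5 GB to 7.0 GB for a 7B model), which is the main lever if you need longer prompts.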

Won't run
top 3 popular models

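Need more memory than you have. Shown for orientation. For this build the quoted reason does not hold: even the 1,000B model needs roughly 1,060 GB for weights at Q8_0 (8.5 bits per weight), far below 13,824 GB, so treat these three verdicts as unverified.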

Qwen 3.6 35B-A3B (MTP)
35B · qwen · Commercial OK

Even with CPU offload, needs more memory than your VRAM (13,824 GB) + 60% of system RAM (19 GB) combined.

—
Qwen 3.6 27B (MTP)
27B · qwen · Commercial OK

Even with CPU offload, needs more memory than your VRAM (13,824 GB) + 60% of system RAM (19 GB) combined.

—
Ring-2.6-1T
1000B · other · Commercial OK

Even with CPU offload, needs more memory than your VRAM (13,824 GB) + 60% of system RAM (19 GB) combined.

—
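The rule quoted above compares a model's requirement with all of the VRAM plus 60% of system RAM. For this build that budget is 13,824 + 0.6 × 32 = 13,843.2 GB (the page rounds the RAM share to 19 GB). A minimal sketch of the check as stated; the function names are hypothetical and the 60% share is the page's figure:

```python
# Sketch of the offload rule as stated: a model counts as runnable if its
# estimated requirement fits in VRAM plus 60% of system RAM.
# Function names are hypothetical.


def offload_budget_gb(vram_gb: float, system_ram_gb: float, ram_share: float = 0.6) -> float:
    """Total memory the rule allows: all VRAM plus a share of system RAM."""
    return vram_gb + ram_share * system_ram_gb


def fits_with_offload(required_gb: float, vram_gb: float, system_ram_gb: float) -> bool:
    """True when the model's estimated requirement is within the budget."""
    return required_gb <= offload_budget_gb(vram_gb, system_ram_gb)


budget = offload_budget_gb(13_824, 32)
print(f"{budget:.1f} GB")  # 13843.2 GB
```

On a hypothetical consumer build with 24 GB of VRAM and the same 32 GB of RAM, the rule gives 24 + 19.2 = 43.2 GB.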

How to read these numbers

M · Measured: we ran this exact combination on our own hardware.

~ · Extrapolated: predicted from a measured benchmark on hardware with similar memory bandwidth.

E · Estimated: computed purely from a formula based on VRAM bandwidth and model architecture.
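For the E-marked cards above, multiplying tokens per second by the weight size gives about 5,200 GB/s every time. That matches the usual bandwidth-bound estimate for single-stream decoding: each new token reads every weight once, so speed ≈ effective memory bandwidth ÷ weight size. A minimal sketch that recovers the implied bandwidth from six of the cards and predicts one it did not use; the site does not publish its bandwidth constant, so the ~5,200 GB/s value is an inference from the listed numbers:

```python
from statistics import median

# (tok/s, weights GB) pairs copied from E-marked cards on this page
CARDS = {
    "Gemma 3 1B": (8613, 0.60),
    "Llama 3.2 1B Instruct": (4894, 1.06),
    "Llama 3.2 3B Instruct": (1631, 3.19),
    "Phi-3.5 Mini Instruct": (1288, 4.04),
    "Qwen 3 4B": (1224, 4.25),
    "Mistral 7B Instruct v0.3": (1081, 4.81),
}


def implied_bandwidth_gbps(tok_per_s: float, weights_gb: float) -> float:
    """Effective GB/s implied if every token reads all weights once."""
    return tok_per_s * weights_gb


def estimate_tok_per_s(weights_gb: float, effective_bandwidth_gbps: float) -> float:
    """Bandwidth-bound decode estimate: one full pass over the weights per token."""
    return effective_bandwidth_gbps / weights_gb


bw = median(implied_bandwidth_gbps(t, w) for t, w in CARDS.values())
print(round(bw))                           # 5201
print(round(estimate_tok_per_s(4.83, bw)))  # 1077
```

The prediction for 4.83 GB of weights lands on the 1,077 tok/s listed for Llama 3.1 Nemotron Nano 8B, which was left out of the fit.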


Want a specific benchmark we don't have? Email support@runlocalai.co and we'll prioritize it.