What can NVIDIA GeForce RTX 5080 run?
Build: RTX 5080 + Ryzen 9 9950X + 32GB DDR5
Runs comfortably: 10 models
Full-VRAM resident, with room for context. No compromises.

| Model | Quant | Context | VRAM | Headroom | TTFT | Speed |
|---|---|---|---|---|---|---|
| `ollama run gemma3:1b` | Q4_K_M | 8,192 | 11.1 GB | 4.9 GB | instant | 1,034 tok/s |
| `ollama run llama3.2:1b` | Q8_0 | 8,192 | 11.6 GB | 4.4 GB | instant | 587 tok/s |
| `ollama run deepseek-r1:7b` | Q4_K_M | 2,048 | 9.2 GB | 6.8 GB | fast | 148 tok/s |
| `ollama run llama3.1:8b` | Q4_K_M | 2,048 | 9.2 GB | 6.8 GB | fast | 129 tok/s |
| `ollama run qwen3:8b` | Q4_K_M | 2,048 | 9.9 GB | 6.1 GB | fast | 129 tok/s |
| `ollama run mistral:7b` | Q4_K_M | 2,048 | 9.2 GB | 6.8 GB | fast | 148 tok/s |
| `ollama run hermes3:8b` | Q4_K_M | 2,048 | 9.9 GB | 6.1 GB | fast | 129 tok/s |
| `ollama run codegemma:7b` | Q4_K_M | 2,048 | 9.2 GB | 6.8 GB | fast | 148 tok/s |
| `ollama run gemma2:9b` | Q4_K_M | 2,048 | 10.7 GB | 5.3 GB | fast | 115 tok/s |
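As a rough rule of thumb, the resident size behind these VRAM figures is the quantized weights plus the KV cache for the context window; the table's numbers also include runtime buffers that this sketch ignores. A minimal estimator, where the bits-per-weight values are approximations and the 8B model shape is hypothetical:

```python
# Rough resident-size estimate for a quantized LLM: weights + KV cache.
# Bits-per-weight values are approximations for llama.cpp quant formats;
# real runtimes add compute buffers and driver context on top.

QUANT_BITS = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}

def estimate_gb(params_b, quant, context, n_layers, n_kv_heads, head_dim):
    weights = params_b * 1e9 * QUANT_BITS[quant] / 8       # bytes
    # KV cache: K and V tensors per layer per position, fp16 (2 bytes)
    kv = 2 * n_layers * n_kv_heads * head_dim * 2 * context
    return (weights + kv) / 1e9

# Hypothetical 8B model with a Llama-3-like shape at 2,048 context:
print(round(estimate_gb(8.0, "Q4_K_M", 2048, 32, 8, 128), 1))  # -> 5.1
```

Doubling the context from 2,048 to 8,192 only grows the KV-cache term, which is why the 1B models above can afford the larger window.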
Runs with tradeoffs: 35 models
Tight VRAM, partial CPU offload, or context-limited.
| Model | Quant | Context | Memory | Headroom | TTFT | Speed | Notes |
|---|---|---|---|---|---|---|---|
| `ollama run qwen3:30b` | Q4_K_M | 2,048 | 26.6 GB | 8.6 GB | noticeable | 5 tok/s | ~40% of layers on CPU |
| `ollama run qwen2.5-coder:32b` | Q4_K_M | 8,192 | 32.4 GB | 2.8 GB | noticeable | 3 tok/s | ~51% of layers on CPU |
| `ollama run qwen3:32b` | Q4_K_M | 2,048 | 28.1 GB | 7.1 GB | noticeable | 4 tok/s | ~43% of layers on CPU |
| `ollama run gemma4:31b` | Q4_K_M | 2,048 | 27.4 GB | 7.8 GB | noticeable | 4 tok/s | ~42% of layers on CPU |
| `ollama run deepseek-r1:32b` | Q4_K_M | 2,048 | 28.1 GB | 7.1 GB | noticeable | 4 tok/s | ~43% of layers on CPU |
| `ollama run gemma4:26b-moe` | Q4_K_M | 2,048 | 23.6 GB | 11.6 GB | noticeable | 6 tok/s | ~32% of layers on CPU |
| `ollama run llama3.2:3b` | Q8_0 | 8,192 | 14.8 GB | 1.2 GB | fast | 196 tok/s | Tight fit: only 1.2 GB headroom for context growth |
| `ollama run qwen3:14b` | Q4_K_M | 2,048 | 14.5 GB | 1.5 GB | noticeable | 74 tok/s | Tight fit: only 1.5 GB headroom for context growth |

Memory is the total footprint; for partially offloaded rows it spans VRAM plus system RAM. On every CPU-offloaded row the CPU is the bottleneck, so faster system RAM helps more than extra VRAM.
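A llama.cpp-style runtime arrives at splits like those percentages by assigning as many whole layers to the GPU as fit in the VRAM budget and running the rest on the CPU. A simplified sketch; the equal-sized-layer assumption and the 1.5 GB reserve are illustrative, and real runtimes also budget for the per-layer KV cache:

```python
# Sketch of a llama.cpp-style GPU/CPU layer split: put as many whole
# layers as fit in the VRAM budget on the GPU, run the rest on the CPU.
# Equal-sized layers and the 1.5 GB reserve are simplifying assumptions.

def layer_split(model_gb, n_layers, vram_gb, reserve_gb=1.5):
    per_layer_gb = model_gb / n_layers
    budget_gb = max(vram_gb - reserve_gb, 0.0)   # keep room for KV cache/buffers
    gpu_layers = min(n_layers, int(budget_gb // per_layer_gb))
    return gpu_layers, n_layers - gpu_layers

# Illustrative 20 GB, 64-layer model on a 16 GB card:
gpu, cpu = layer_split(model_gb=20.0, n_layers=64, vram_gb=16.0)
print(f"{gpu} layers on GPU, {cpu} on CPU ({cpu / 64:.0%})")  # -> 46 layers on GPU, 18 on CPU (28%)
```

If you want to steer this split yourself, Ollama exposes a `num_gpu` option (the equivalent of llama.cpp's `-ngl` flag) that pins the number of layers sent to the GPU.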
What if you upgraded?
Hypothetical scenarios. We re-ran the compatibility engine for each.
+32 GB system RAM
~$80–150
Doubles your CPU-offload working set. Helps when models don't quite fit in VRAM.
Unlocks: 37 new tradeoff models, including:
- Qwen 3 30B-A3B
- Qwen 2.5 Coder 32B Instruct
- Llama 3.3 70B Instruct
- Qwen 3 32B
Upgrade to NVIDIA GeForce RTX 3090
~$899
24 GB of VRAM (vs your 16 GB). Note that memory bandwidth is roughly a wash: ~960 GB/s on the 5080 vs ~936 GB/s on the 3090, so the gain here is capacity, not speed.
Unlocks: 14 new comfortable models, including:
- Gemma 4 E2B (Effective 2B)
- Llama 3.2 3B Instruct
- Phi-3.5 Vision
- Phi-3.5 Mini Instruct
Add a second NVIDIA GeForce RTX 5080
~$1199
Tensor parallelism splits the model across both cards, effectively doubling VRAM. Bandwidth doesn't double — runs ~1.5× the single-card speed in practice.
Unlocks: 24 new comfortable models, including:
- Gemma 4 E2B (Effective 2B)
- Llama 3.2 3B Instruct
- Phi-3.5 Vision
- Phi-3.5 Mini Instruct
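For sizing purposes, the dual-card option can be modeled as pooled VRAM with sub-linear speed scaling. The ~1.5x factor is the figure quoted above, treated here as an assumption, and the example row is the qwen3:14b entry from the tradeoffs table:

```python
# Pooled-VRAM model of the dual-GPU option: tensor parallelism roughly
# doubles usable VRAM, while speed scales sub-linearly (the 1.5x factor
# comes from the text above and is treated as an assumption here).

def dual_gpu(vram_gb, single_card_tok_s, speed_scaling=1.5):
    return 2 * vram_gb, single_card_tok_s * speed_scaling

pooled_gb, tok_s = dual_gpu(16.0, 74)   # qwen3:14b, 74 tok/s on one card
print(pooled_gb, round(tok_s))          # -> 32.0 111
```

The capacity doubling is what moves models out of the offload column; the speed factor only matters for models that already fit.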
Some links above are affiliate links. We may earn a commission at no extra cost to you.
Won't run: top 5 popular models
Need more memory than you have. Shown for orientation.
Even with CPU offload, each of these needs more memory than your VRAM (16 GB) + 60% of system RAM (19 GB) combined.
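That cutoff can be written as a one-line predicate; the 60% usable-RAM fraction is the heuristic stated above, and the example footprints are illustrative:

```python
# The page's "won't run" cutoff as a predicate. The 60% figure is the
# site's heuristic for how much system RAM is usable for CPU offload.

def can_attempt(model_mem_gb, vram_gb=16.0, ram_gb=32.0, usable_ram_frac=0.60):
    return model_mem_gb <= vram_gb + usable_ram_frac * ram_gb  # 35.2 GB here

print(can_attempt(28.1))  # e.g. a 32B Q4_K_M footprint -> True
print(can_attempt(40.0))  # e.g. a 70B-class footprint -> False
```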
How to read these numbers
Want a specific benchmark we don't have? Email benchmarks@runlocalai.co and we'll prioritize it.