NVIDIA GeForce RTX 4080

Original 4080. 16GB GDDR6X. Still capable for 14B Q4–Q8 work; 30B–32B models need ~3-bit quants to fit.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 612 / 1000. Headline = 612 × 0.70 (Estimated-confidence discount) ≈ 428. This is an algorithmic performance-tier score, distinct from and often lower than the editorial Verdict below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
Extrapolated from 717 GB/s bandwidth — 86.0 tok/s estimated. No measured benchmarks yet.
Plain-English: Comfortable at 14B and below — snappy enough for a coding agent; vision models supported.
Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
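The bandwidth extrapolation above rests on a common rule of thumb: single-stream decode on a dense model is memory-bound, so tokens per second can't exceed memory bandwidth divided by the bytes of weights read per token. The sketch below implements only that rule; it is not this site's estimator (whose exact formula isn't given here), and the weight sizes are illustrative assumptions.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes each generated token streams all weights from VRAM once and
    ignores KV-cache reads, kernel overhead, and compute limits.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


RTX_4080_BANDWIDTH_GB_S = 717.0  # 22.4 Gbps x 256-bit bus / 8

# Hypothetical weight sizes in GB, chosen only for illustration.
examples = [("8B @ ~4.5 bpw", 4.5), ("14B @ ~4.5 bpw", 8.0), ("8B @ ~8.5 bpw", 8.5)]
for label, weights_gb in examples:
    ceiling = decode_ceiling_tok_s(RTX_4080_BANDWIDTH_GB_S, weights_gb)
    print(f"{label}: at most {ceiling:.0f} tok/s")
```

Real throughput usually lands well below this ceiling. For a mixture-of-experts model such as Qwen 3 30B-A3B, only the active experts are read per token, so the relevant `weights_gb` is much smaller than the file size.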
What it does well
The RTX 4080 hits the sweet spot for "I want to run real local AI on consumer hardware without paying 4090 prices." Its 16 GB of GDDR6X at 717 GB/s comfortably runs Llama 3.1 8B at 80–100 tok/s, Qwen 3 30B-A3B (the MoE) at ~60–80 tok/s at a ~3-bit quant, or a 13B model at Q5 at ~50–70 tok/s with up to 32K context, depending on the model's KV-cache size. Ada-generation tensor compute (~390 TFLOPS FP16 with sparsity, about half that dense) means you're not constrained on math: for any model that fits in 16 GB, decode is plenty fast for interactive use. The full CUDA stack is available, and Ollama, LM Studio, llama.cpp, vLLM (single-card), and ExLlamaV2 all run beautifully. The 320 W TDP is workstation-friendly with a quality 850 W+ PSU. The card has settled at $700–$900 used with strong availability; if you find the right used 4080, it offers better $/throughput than a new RTX 5070 Ti at $750. For developers who don't need 24+ GB and want a CUDA card that "just works" for everything from local coding to small fine-tuning, the 4080 is a smart pick.
Where it breaks
- 16 GB is the hard limit on serious models. 32B at Q4 doesn't fit (~19–20 GB of weights), and 70B Q4 is far out of reach (needs ~40 GB). The 16 GB ceiling forces you to either pick smaller models or use partial offload to system RAM (which slows decode dramatically). Anyone searching "can the RTX 4080 run 70B" deserves the honest answer: no, not at decent speed.
- No second-gen Transformer Engine. Ada has FP8 but not the Hopper / Blackwell-specific optimizations. For modern frameworks tuned to FP8 throughput, RTX 5090 or RTX 5080 wins on architecture-specific gains.
- Power draw is real. 320 W TDP under load sits below the RTX 3090's 350 W but above the RTX 4070 Ti Super's 285 W. Cooling needs to be thoughtful.
- The used market for the 4080 is awkward. Pricing has settled, but many sellers price the 4080 close to the 4080 Super, which is the better buy at the same money. Read the SKU carefully.
- Resale is uncertain over a 3+ year horizon. As RTX 5080 ramps and 5060 Ti 16 GB / 5070 12 GB land at retail, used 4080 pricing should drop. If you're buying, hold it for actual use, not as an investment.
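The ~40 GB figure for 70B Q4 follows from simple arithmetic: weight bytes ≈ parameters × bits per weight ÷ 8. A minimal sketch, assuming round parameter counts, ~4.8 bits per weight as a stand-in for a typical 4-bit k-quant, ~3.1 bits for a 3-bit quant, and a hypothetical 1.5 GB reserve for KV cache, activations, and the CUDA context:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (10^9 bytes): params x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 16.0, reserve_gb: float = 1.5) -> bool:
    """True if the weights fit with `reserve_gb` left for KV cache,
    activations, and runtime overhead. The reserve is an assumed figure."""
    return weights_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb


for name, params, bpw in [
    ("14B @ ~4.8 bpw", 14, 4.8),
    ("32B @ ~4.8 bpw", 32, 4.8),
    ("32B @ ~3.1 bpw", 32, 3.1),
    ("70B @ ~4.8 bpw", 70, 4.8),
]:
    size = weights_gb(params, bpw)
    print(f"{name}: {size:.1f} GB of weights, fits 16 GB: {fits(params, bpw)}")
```

Real quantized files add some overhead for scales and non-quantized layers, so treat these numbers as lower bounds.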
Ideal model range
- Sweet spot: 8B at Q8 or 14B at Q4–Q5 with 16K–32K context; 128K is reachable on an 8B at Q4 with an 8-bit KV cache. Full speed (~60–120 tok/s decode), comfortable headroom.
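- Sweet spot: 30B-class MoE (Qwen 3 30B-A3B, smaller mixture-of-experts): fits 16 GB at ~3-bit; at Q4–Q5 (~18–22 GB) some expert layers spill to system RAM, but with only ~3B parameters active per token, speed usually stays reasonable.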
- Sweet spot: Small-model agentic loops — fit a 7B + 4B + embedding model simultaneously.
- Stretch: 32B at ~3-bit (~12–14 GB of weights) with 8K context; expect 25–35 tok/s. Q4 (~19–20 GB) doesn't fit without offloading.
- Stretch: Local fine-tuning at 7B QLoRA with paged optimizer.
- Bad fit: 70B-class anything. Don't try; pick a card with more VRAM.
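Context length costs VRAM on top of the weights, which is why the ranges above pair model sizes with context budgets. A minimal sketch of the standard KV-cache estimate (2 for keys and values × layers × KV heads × head dimension × tokens × bytes per element), assuming a Llama-3.1-8B-style shape of 32 layers, 8 grouped-query KV heads, and head dimension 128:

```python
def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache size in GB (10^9 bytes) for a single sequence.

    2 (keys and values) x layers x KV heads x head_dim x tokens x bytes.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9


# Assumed Llama-3.1-8B-style shape: 32 layers, 8 KV heads (GQA), head_dim 128.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for ctx in (8_192, 32_768, 131_072):
    fp16 = kv_cache_gb(ctx, **shape)
    # ~8-bit cache; ignores the small overhead of quantization scales.
    q8 = kv_cache_gb(ctx, **shape, bytes_per_elem=1.0)
    print(f"{ctx:>7} tokens: {fp16:.1f} GB fp16 cache, {q8:.1f} GB 8-bit cache")
```

Under these assumptions a 128K fp16 cache alone is over 17 GB, which is why long context on a 16 GB card depends on cache quantization and small weights.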
Bad use cases
- 70B-class workloads. Hard 16 GB ceiling. 70B at Q4 needs ~40 GB, so even an RTX 4090 or used 3090 (24 GB) or an RTX 5090 (32 GB) only runs it at 2–3-bit quants; for Q4, use two 24 GB cards or step up to workstation tier.
- Production multi-tenant serving. This is a single-user / single-card consumer pick. Use L40S for production rack inference.
- Anyone shopping for the best $/GB of VRAM on the used market. A used RTX 3090 at $700–$1,000 has 24 GB vs the 4080's 16 GB at similar money. The 3090 wins on pure VRAM per dollar.
- Long-horizon investment as a primary card. With the 5080 and 5070 Ti out, the 4080's resale value will erode. Buy for use, not as a hold.
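The VRAM-per-dollar comparison above reduces to dividing each used-price band by capacity. A small sketch using the price bands quoted on this page (real listings vary):

```python
def dollars_per_gb(price_low: float, price_high: float,
                   vram_gb: float) -> tuple[float, float]:
    """Price-per-GB-of-VRAM range for a used-price band."""
    return price_low / vram_gb, price_high / vram_gb


# Used-price bands as quoted on this page.
cards = {
    "RTX 4080 (16 GB)": (700, 900, 16),
    "RTX 3090 (24 GB)": (700, 1000, 24),
}
for name, (low, high, vram) in cards.items():
    low_per_gb, high_per_gb = dollars_per_gb(low, high, vram)
    print(f"{name}: ${low_per_gb:.0f}-${high_per_gb:.0f} per GB")
```

Even the top of the 3090 band (~$42/GB) comes in under the bottom of the 4080 band (~$44/GB).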
Verdict
Buy this if you find a used 4080 at $700–$900, you're running 8B–30B-class models for local development, coding, or agentic loops, you don't need a 24+ GB ceiling, and you want CUDA, Ada-gen tensor compute, and low-friction local AI. The 4080 hits the right midpoint between "real CUDA" and "consumer pricing" for the reader who's serious but not paying 4090/5090 money.
Skip this if you're targeting 70B-class models (Q4 needs ~40 GB, more than any single consumer card holds), you can find an RTX 4080 Super at similar money (it's the strict upgrade with the same VRAM and more compute), you want long context (32K+ on bigger models), or you're cost-sensitive and a used 3090 fits the workload.
How it compares
- vs RTX 4080 Super (16 GB) → 4080 Super has the same 16 GB but ~6% more compute and slightly higher bandwidth. At similar used prices, the 4080 Super is the strict upgrade: only buy the 4080 if it's at least $50–100 cheaper than a comparable 4080 Super.
- vs RTX 4090 (24 GB) → 4090 has 50% more VRAM, ~40% more bandwidth, and dramatically more compute, at ~2× the price. Pick 4080 for 8B–30B; pick 4090 for 32B at Q4 with real context headroom. 70B-class still needs a ~2-bit quant or a second card.
- vs RTX 5080 (16 GB) → Same VRAM tier, Blackwell-gen vs Ada-gen. 5080 wins on architecture (native FP4, second-gen Transformer Engine) and on bandwidth (960 GB/s, roughly a third more). At similar prices new, pick 5080. At a significantly cheaper used price, the 4080 still works.
- vs RTX 3090 (24 GB) → 3090 has 50% more VRAM at a similar used price. 4080 has ~35–40% more compute, native FP8, and lower power. Pick 3090 for VRAM-bound workloads (32B Q4 fits in 24 GB; 70B Q4 still doesn't); pick 4080 for 16 GB-or-less workloads where compute speed matters more.
- vs RTX 4070 Ti Super (16 GB) → Same VRAM, ~90% of the 4080's compute. The 4070 Ti Super is ~$100–$200 cheaper used. Pick the 4070 Ti Super on a budget; pick the 4080 for somewhat more compute on the same VRAM tier.
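Because single-stream decode is mostly bandwidth-bound, bandwidth ratios give a rough first-order guide to relative decode speed for models that fit on both cards. The figures below are published spec-sheet numbers (data rate × bus width), included as assumptions rather than measurements; they ignore compute, software support, and VRAM capacity.

```python
# Published memory bandwidth in GB/s; assumed from spec sheets, not measured here.
BANDWIDTH_GB_S = {
    "RTX 4080": 717,
    "RTX 4080 Super": 736,
    "RTX 4070 Ti Super": 672,
    "RTX 5080": 960,
    "RTX 3090": 936,
    "RTX 4090": 1008,
}


def relative_to(baseline: str, table: dict[str, int]) -> dict[str, float]:
    """Each card's bandwidth as a multiple of the baseline card's."""
    base = table[baseline]
    return {name: bw / base for name, bw in table.items()}


ratios = relative_to("RTX 4080", BANDWIDTH_GB_S)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<18} {ratio:.2f}x")
```

On this measure the 4090's edge is about 1.4× and the 5080's about 1.34×, in line with the comparisons above.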
Specs
| Spec | Value |
| --- | --- |
| VRAM | 16 GB GDDR6X |
| Power draw (peak) | 320 W |
| Released | 2022 |
| MSRP | $1,199 |
| Backends | CUDA, Vulkan |
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.