Used RTX 3090 vs new RTX 5080 for local AI in 2026
Used RTX 3090: 24 GB Ampere from the used market; the price-per-VRAM king.
- VRAM: 24 GB
- Bandwidth: 936 GB/s
- TDP: 350 W
- Price: $700-1,000 (2026 used; inspect for mining wear)
RTX 5080: 16 GB GDDR7 Blackwell; the second-tier 2026 consumer card.
- VRAM: 16 GB
- Bandwidth: 960 GB/s
- TDP: 360 W
- Price: $1,000-1,300 (2026 retail; supply variable)
Same buyer, two paths. The used 3090 trades a fresh warranty for 24 GB VRAM at half the price; the new 5080 trades 8 GB of VRAM for Blackwell silicon, FP4 support, and a clean MSRP. For local LLM buyers in 2026 this is the most-asked question in r/LocalLLaMA.
VRAM ceiling decides the workload. The 3090's 24 GB runs 32B models at Q4 with long-context headroom and squeezes 70B-class models at ~2-bit quants; the 5080's 16 GB tops out around 22-24B at Q4. The 5080 wins on bandwidth (960 vs 936 GB/s, effectively a tie), efficiency, and FP4 inference paths in TensorRT-LLM and vLLM nightlies. Software wins go to the 5080; raw VRAM wins go to the 3090.
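A quick way to sanity-check what fits: weights take roughly params × bits-per-weight / 8 gigabytes, plus KV cache and runtime overhead. A minimal sketch; the ~10% overhead factor and the bits-per-weight figures are ballpark assumptions, not measurements:

```python
def model_vram_gb(params_b: float, bits_per_weight: float,
                  kv_cache_gb: float = 0.0, overhead: float = 1.10) -> float:
    """Rough fit check: weights plus KV cache, padded ~10% for
    activations and runtime overhead (the 10% is an assumption)."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return (weights_gb + kv_cache_gb) * overhead

# Approximate bits-per-weight for common GGUF quants (ballpark figures)
for name, params_b, bpw in [("32B Q4_K_M", 32, 4.85),
                            ("70B Q4_K_M", 70, 4.85),
                            ("70B IQ2_XS", 70, 2.4)]:
    print(f"{name}: ~{model_vram_gb(params_b, bpw):.1f} GB")
# 32B Q4_K_M: ~21.3 GB -> fits 24 GB, not 16 GB
# 70B Q4_K_M: ~46.7 GB -> two cards or offload
# 70B IQ2_XS: ~23.1 GB -> squeezes onto 24 GB with tight context
```

Plug in your own model size and quant before buying; the cliff between "fits" and "offloads" is the whole decision.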
Used-market risk is real. A 2020-2021 3090 has 4-5 years on it, often including mining or 24/7 LLM duty. Inspect the fans, budget for a repaste, and check thermal pad health. The 5080 is new silicon with a retailer warranty.
Resale economics swing the other way. A used 3090 holds value because the 24 GB tier is rare in the used market; a 5080 depreciates harder once 60-series Blackwell lands.
Operational matrix
| Dimension | Used RTX 3090 | RTX 5080 |
|---|---|---|
| VRAM ceiling: largest model that fits without offload | Strong. 24 GB: 32B Q4 fits with long-context headroom; 70B fits only at ~2-bit quants. | Limited. 16 GB: 70B impossible; 32B Q4 forces offload; 22-24B Q4 fits. |
| Memory bandwidth: decode throughput in memory-bound regimes | Strong. 936 GB/s GDDR6X. Mature, reliable; ages well. | Strong. 960 GB/s GDDR7. Effectively tied within margin of error for decode. |
| Compute (FP16/FP8): prefill + matmul throughput | Acceptable. ~71 TFLOPS FP16. No FP8 path; older Ampere tensor cores. | Excellent. ~56 TFLOPS FP16 but ~112 TFLOPS FP8, plus FP4 in 2026 runtimes; decisive on prefill when low-precision paths are used. |
| Software ecosystem (2026): day-zero support for new models + runtimes | Excellent. 5-year-old Ampere; rock-solid in every runtime, including older CUDA. | Strong. Blackwell support is mature in 2026, but bleeding-edge kernels still trail Hopper/Ada by weeks. |
| Reliability + warranty: first-year failure expectation + recourse | Limited. Used card; no warranty unless the seller offers one. Mining and 24/7 LLM duty are common histories. | Excellent. Retailer warranty intact; new silicon with a low first-year failure rate. |
| Power + cooling: TDP + thermal envelope | Limited. 350 W TDP; older cooling solutions; expect a repaste. | Strong. 360 W TDP; newer cooling, quieter under sustained inference. |
| Price (2026): realistic acquisition cost | Excellent. $700-1,000 used; best $/GB-VRAM on the used market. | Acceptable. $1,000-1,300 retail; a ~$300-500 premium over a used 3090. |
| Resale value (3 yr): predicted % of acquisition price held | Strong. The 24 GB tier holds value; a rare-VRAM premium props the floor. | Acceptable. Once 60-series Blackwell lands, mid-tier depreciation is steeper than for flagships. |
Tiers are qualitative editorial labels, not derived from a single benchmark. For tok/s and VRAM measurements on these cards, browse the corpus or request a benchmark.
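Why the bandwidth row reads "effectively tied": single-stream decode is memory-bound, so tokens per second are roughly effective bandwidth divided by the bytes streamed per token, which is about the model's in-memory size. A back-of-envelope sketch, assuming ~70% of peak bandwidth is achievable (a common rule of thumb, not a measurement):

```python
def decode_tok_s(bandwidth_gb_s: float, model_gb: float,
                 efficiency: float = 0.7) -> float:
    """Memory-bound ceiling: each generated token streams the full
    weight set once, so tok/s ~ effective bandwidth / model size."""
    return bandwidth_gb_s * efficiency / model_gb

model_gb = 19.4  # e.g. a 32B Q4_K_M's weights (ballpark assumption)
print(f"RTX 3090: ~{decode_tok_s(936, model_gb):.0f} tok/s")  # ~34
print(f"RTX 5080: ~{decode_tok_s(960, model_gb):.0f} tok/s")  # ~35
```

A ~2.5% bandwidth gap is about one token per second. The VRAM ceiling, not decode speed, is what separates these cards.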
Who should AVOID each option
Avoid the Used RTX 3090
- If you need warranty + new silicon
- If FP4 inference matters to your stack
- If you don't have a PSU + thermal headroom for a 350W used card
Avoid the RTX 5080
- If 70B-class models are the daily target
- If 16 GB ceiling will force offload on your common workloads
- If 24 GB used at $800 is in your local market
Workload fit
Used RTX 3090 fits
- 70B-class on a single card (low-bit quants)
- Multi-GPU homelab (paired)
- Used-market value buyer
RTX 5080 fits
- 13B-32B daily use
- FP4 / Blackwell features
- Warranty-required deployments
Where to buy
Where to buy Used RTX 3090
Editorial price range: $700-1,000 (2026 used; inspect for mining wear)
Where to buy RTX 5080
Editorial price range: $1,000-1,300 (2026 retail; supply variable)
Some links above are affiliate links; we may earn a commission at no extra cost to you. Prices are editorial ranges, not real-time, so click through to verify. How we make money.
Editorial verdict
If your target is 70B-class inference and you can stomach used-market risk, the 3090 is the right answer: 24 GB at $800 is unmatched in 2026. Plan for a repaste, a fan inspection, and a 750W+ PSU.
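On PSU sizing, a rough check for the used-3090 path; the transient multiplier and platform draws below are loose assumptions, not ATX spec values:

```python
def psu_watts(gpu_tdp: int, cpu_tdp: int = 150, platform: int = 100,
              transient: float = 1.4, margin: float = 1.1) -> int:
    """GPU transient spikes + CPU + rest-of-system, with ~10% margin.
    All factors here are rough assumptions; check your own parts."""
    return round((gpu_tdp * transient + cpu_tdp + platform) * margin)

print(psu_watts(350))  # ~814 -> 750 W is the floor, 850 W is comfortable
```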
If your target is 13B-32B and you'd rather have a warranty, FP4, and clean silicon, the 5080 is the better buy. FP4 quantization narrows the effective VRAM gap as 2026-2027 runtimes mature, though that's a forward bet.
Don't underrate 'I want it to just work.' A used 3090 is a known-quantity AI card with documented quirks. A new 5080 is a quiet, warranted, upgradeable starting point. Match the card to your tolerance for ops time.
Honesty: why benchmark numbers on this page might not reflect your real experience
- tok/s is not user experience. Humans read at ~10-15 tok/s — anything above that is buffer time, not perceived speed.
- Context length changes everything. A 70B Q4 model at 1024 tokens generates ~25 tok/s; the same model at 32K context drops to ~8-12 tok/s as the KV cache fills (see the KV-cache sketch after this list).
- Quantization changes the conclusion. Q4_K_M vs Q5_K_M vs Q8 produce different speed AND different quality. A benchmark at one quant doesn't translate to another.
- Thermal throttling changes long sessions. The first 15 minutes of a benchmark see boost-clock peak; the next 4 hours see steady state, which is 5-15% slower depending on case airflow (the monitoring sketch at the end of this section shows how to check).
- Driver and runtime versions silently shift winners. A 2024 benchmark on PyTorch 2.4 + CUDA 12.4 doesn't reflect 2026 reality on PyTorch 2.6 + CUDA 12.6. Discount benchmarks older than 6 months.
- Vendor and YouTuber benchmarks are cherry-picked. The standard 'Llama 3.1 70B Q4 at 1024 tokens' chart shows peak decode on a tiny prompt — exactly the conditions least representative of daily use.
- A 25-30% throughput gap between two cards rarely translates to a 25-30% experience gap. Both cards are fast enough; the differentiator is usually VRAM ceiling, not raw decode speed.
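To make the context-length bullet concrete: KV cache grows linearly with context. A minimal sketch using Llama-3.1-70B-style geometry (80 layers, 8 KV heads via GQA, head dim 128), with an FP16 cache assumed:

```python
def kv_cache_gb(context_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Per-token KV cache = 2 (K and V) x layers x kv_heads x head_dim
    x bytes per element; multiply by context length for the total."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / 1024**3

print(f"1K ctx:  {kv_cache_gb(1024):.2f} GB")   # ~0.31 GB
print(f"32K ctx: {kv_cache_gb(32768):.1f} GB")  # ~10.0 GB
```

Ten extra gigabytes of cache at 32K is why long-context runs slow down, and why any fits-in-VRAM math has to include a KV term.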
We try to surface these caveats where they apply. If a number on this page reads more confident than it should, please email us via contact. See also our methodology and editorial philosophy.
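For the throttling caveat, you can measure your own boost-vs-steady-state gap instead of trusting a chart. A small polling loop around nvidia-smi; the query fields are standard, and the 5-second interval is arbitrary:

```python
import subprocess, time

# Log SM clock, temperature, and power draw every 5 s so you can
# compare the boost-clock opening minutes against hour-two steady state.
QUERY = "timestamp,temperature.gpu,clocks.sm,power.draw"
while True:
    line = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(line, flush=True)
    time.sleep(5)
```

nvidia-smi's built-in `-l 5` flag does the same loop; the script form just makes it easy to redirect to a file or add your own thresholds.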
Don't see your specific workload?
The matrix above is editorial. If you want a measured tok/s number for a specific model + quant on either card, file a benchmark request — the community claims requests and reproduces them under our methodology checklist.