NVIDIA GeForce RTX 4070

Original 4070. 12GB Ada. Now eclipsed by 4070 Super at the same price.
Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.
Sub-scores sum to 509 / 1000. Headline = 509 × 0.70 (Estimated-confidence discount) = 356. This is an algorithmic performance-tier score, distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
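The headline arithmetic can be reproduced in a few lines. This is a minimal sketch: the function name and the round-to-nearest rule are assumptions, since the page only states the inputs (509 and 0.70) and the result (356).

```python
def headline_score(subscore_sum: int, confidence_discount: float) -> int:
    """Apply a confidence discount to the summed sub-scores.

    Rounding to the nearest integer is an assumption; the page only
    shows 509 x 0.70 = 356.
    """
    return round(subscore_sum * confidence_discount)


score = headline_score(509, 0.70)
print(f"Headline score: {score} / 1000")
```

A measured (rather than estimated) card would presumably use a discount closer to 1.0, which is why the headline here sits well below the raw sub-score sum.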
Extrapolated from 504 GB/s bandwidth — 60.5 tok/s estimated. No measured benchmarks yet.
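For readers wondering how bandwidth turns into a tok/s figure, here is a rough sketch of the usual memory-bound rule of thumb: a dense model streams all of its weights once per generated token, so decode speed is capped near bandwidth divided by weight size. This is not necessarily the formula behind the 60.5 tok/s estimate; the helper names and the 4.5 bits-per-weight figure for a 4-bit quant are assumptions for illustration.

```python
RTX_4070_BANDWIDTH_GB_S = 504.0  # from the spec quoted on this page


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    return bandwidth_gb_s / gb_read_per_token


# Assumption: an 8B dense model at ~4.5 bits per weight (a typical 4-bit quant).
q4_8b_gb = weights_gb(8.0, 4.5)
ceiling = decode_ceiling_tok_s(RTX_4070_BANDWIDTH_GB_S, q4_8b_gb)

# Working backwards: how many GB per token would give 60.5 tok/s?
implied_gb_per_token = RTX_4070_BANDWIDTH_GB_S / 60.5

print(f"8B @ 4.5 bpw: {q4_8b_gb:.1f} GB, ceiling {ceiling:.0f} tok/s")
print(f"60.5 tok/s implies about {implied_gb_per_token:.2f} GB read per token")
```

Real decode speed lands below the ceiling because of KV-cache reads, kernel overhead, and sampling, so treat the result as an upper bound rather than a prediction.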
Plain-English: Comfortable at 14B and below — snappy enough for a coding agent; vision models supported.
Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.
What it does well
The RTX 4070 (non-Super, non-Ti) is the entry-tier Ada-generation card and the cheapest path to "real Ada Tensor Cores + CUDA + 12 GB" for cost-conscious local AI buyers. It pairs 12 GB GDDR6X at 504 GB/s with Ada Tensor Cores (~117 TFLOPS FP16) at $599 MSRP / $400-500 used. At 200 W TDP it is the most workstation-friendly 12 GB Ada card: it fits in a 600 W PSU, runs cool, and is the easiest "drop-in upgrade" for older consumer builds. For quantized 7B–13B class workloads it's genuinely strong: ~70–100 tok/s on Llama 3.1 8B, comfortable 13B Q5 with 32K context, and smaller MoE models. The full CUDA stack is available: Ollama, LM Studio, llama.cpp, vLLM (single-card), ExLlamaV2. For developers whose primary local AI workload is 13B or smaller and who want a simple-to-deploy CUDA card at the entry tier, the RTX 4070 is the right pick.
Where it breaks
- 12 GB ceiling kills serious local AI. Same hard ceiling as 4070 Super and 4070 Ti. Readers who want higher-precision 14B+, 32B, or 70B-class local AI should look at 16 GB+ (4070 Ti Super, 4080, 5070 Ti) or 24 GB+ (4090, 5090, used 3090).
- RTX 4070 Super is a strict upgrade. Both cards carry a $599 MSRP, so the 4070 Super gives ~15% more compute for the same money, with the same 12 GB VRAM. Always pick the 4070 Super at MSRP. The RTX 4070 only makes sense at a meaningful used discount.
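- Used RTX 3090 (24 GB) at $700 has 2× the VRAM. For pure AI use, the 3090 wins decisively: it fits 32B Q4 workloads (roughly 18 GB of weights) that the 4070 cannot. Note that 70B Q4 (~35 GB or more) and 32B FP16 (~64 GB) do not fit in 24 GB either without CPU offload or a second card.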
- Architecture is one generation behind Blackwell. RTX 5070 (12 GB) at $549 MSRP has native FP4 and higher memory bandwidth (672 GB/s vs 504 GB/s) at a lower price. Consumer Blackwell 12 GB is the architecture-current pick.
- Limited fine-tuning headroom. 12 GB barely fits 7B QLoRA with a paged optimizer. Anything bigger needs more VRAM.
- Resale erosion. As the consumer Blackwell ramp continues, used 4070 pricing should soften further over the next 12 months.
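To make the VRAM ceilings above concrete, here is a rough weights-only sketch that maps model sizes to the smallest VRAM tier that holds them. The 4.5 bits-per-weight figure for Q4 and the 1.5 GB reserve for runtime overhead are assumptions, and the helper names are hypothetical; KV cache is ignored, so real headroom is smaller.

```python
VRAM_TIERS_GB = (12, 16, 24)


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def smallest_fitting_tier(size_gb, tiers=VRAM_TIERS_GB, reserve_gb=1.5):
    """Return the smallest tier that holds the weights plus a reserve.

    reserve_gb is an assumed allowance for CUDA context and buffers.
    Returns None when no listed tier is large enough.
    """
    for tier in sorted(tiers):
        if size_gb + reserve_gb <= tier:
            return tier
    return None


for params in (8, 14, 32, 70):
    for label, bits in (("FP16", 16.0), ("Q4", 4.5)):
        size = weights_gb(params, bits)
        tier = smallest_fitting_tier(size)
        verdict = f"{tier} GB card" if tier else "does not fit a single 24 GB card"
        print(f"{params}B {label}: {size:.1f} GB of weights -> {verdict}")
```

Under these assumptions, 14B Q4 still lands in the 12 GB tier, 32B Q4 needs 24 GB, and 70B Q4 exceeds all three tiers.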
Ideal model range
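- Sweet spot: 7B–13B quantized inference (Q4–Q5; FP16 weights at these sizes are 14–26 GB and do not fit in 12 GB) at ~70–100 tok/s decode on 8B-class models, with 32K context.
- Sweet spot: Smaller MoE inference (sub-14B active parameters), provided the full quantized weights fit in 12 GB, at reasonable speed.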
- Sweet spot: Multi-model agentic loops fitting 12 GB total — 4B + embedding + small classifier.
- Stretch: 14B Q4 with 8K context (just fits 12 GB tight, slow decode).
- Stretch: 7B QLoRA fine-tuning with paged optimizer.
- Bad fit: 32B-class anything, 70B-class anything, very long context on bigger models.
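Context length matters for the ranges above because the KV cache grows linearly with tokens. The sketch below estimates the cache for Llama 3.1 8B (32 layers, 8 KV heads, head dimension 128) at 32K context and adds it to an assumed 4.5 GB of Q4 weights. The FP16 cache precision and the weight size are assumptions; the function name is hypothetical.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Size of the KV cache in GB (1 GB = 1e9 bytes).

    The leading factor of 2 covers one key and one value tensor per layer.
    bytes_per_value=2 assumes an FP16 cache.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value)
    return total_bytes / 1e9


VRAM_GB = 12.0
WEIGHTS_GB = 4.5  # assumption: 8B model at ~4.5 bits per weight

# Llama 3.1 8B attention shape, 32K-token context.
kv_32k = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128,
                     context_tokens=32 * 1024)
total = WEIGHTS_GB + kv_32k

print(f"KV cache at 32K: {kv_32k:.2f} GB")
print(f"Weights + KV: {total:.2f} GB of {VRAM_GB:.0f} GB")
```

Roughly 4.3 GB of cache plus 4.5 GB of weights leaves a few GB spare on a 12 GB card, which is consistent with 8B-class models being the comfortable case; larger models or models without grouped-query attention spend that headroom much faster.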
Bad use cases
- Anyone targeting 32B / 70B local AI. Hard 12 GB ceiling. Pick 16 GB+ minimum.
- Production multi-tenant serving. Consumer pick, not production.
- Anyone shopping at MSRP — pick 4070 Super instead. Identical price for 15% more compute.
- Cost-conscious 24 GB seekers. Used RTX 3090 wins by far at similar money.
- Long-horizon investment as primary AI card. Used pricing should drop further; buy for actual use.
- Anyone considering Blackwell-gen. RTX 5070 at $549 offers Blackwell with native FP4 at a lower MSRP.
Verdict
Buy this if you find a used RTX 4070 at $400–$500, your local AI workload is firmly 13B or smaller (8B / 13B classes), you also game or do creator work where the 4070 earns its keep beyond AI, you want a low-power, simple-to-deploy Ada-generation CUDA card at consumer pricing, and you don't need 16 GB. Under those conditions the RTX 4070 is the right pick for cost-conscious entry-level CUDA AI buyers.
Skip this if you can pay MSRP (4070 Super at $599 wins decisively), you want serious local AI (12 GB is below the practical floor for 14B+ models), a used RTX 3090 at $700 fits your budget (24 GB at ~$200 more is far better $/AI-utility), or you want Blackwell-gen (RTX 5070 at $549 is architecture-current).
How it compares
- vs RTX 4070 Super (12 GB) → Same VRAM, same MSRP. 4070 Super has ~15% more compute. Strict upgrade at the same money. Don't buy 4070 new at MSRP.
- vs RTX 4070 Ti (12 GB) → Same VRAM. 4070 Ti has ~30% more compute at +$200 MSRP. Pick the 4070 Ti only at a deep used discount.
- vs RTX 5070 (12 GB) → Same VRAM tier, Ada-gen vs Blackwell. 5070 has native FP4 and higher memory bandwidth (672 GB/s vs 504 GB/s) at $549 MSRP (lower than 4070's $599). Pick 5070 for new builds; 4070 only at a used discount.
- vs used RTX 3090 (24 GB) → Used 3090 at $700 has 2× the VRAM at ~+$200. For pure AI, 3090 wins by far on capability.
- vs RTX 3060 12GB → 3060 12GB is Ampere-gen with the same VRAM tier at $329 MSRP. Roughly half the price, same VRAM ceiling, slower compute and bandwidth (~360 GB/s vs 504 GB/s). Pick 3060 12GB for absolute budget; 4070 for ~50% faster decode.
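One way to sanity-check the comparisons above is dollars per GB of VRAM, using only the prices quoted on this page. This is a crude sketch: the $450 used price is an assumed midpoint of the $400–$500 range, the 4070 Ti price is derived from the "+$200 MSRP" figure, and VRAM per dollar ignores compute, bandwidth, and power.

```python
# (price in USD, VRAM in GB), taken from the figures quoted on this page.
cards = {
    "RTX 4070 (MSRP)": (599, 12),
    "RTX 4070 (used, assumed $450 midpoint)": (450, 12),
    "RTX 4070 Super (MSRP)": (599, 12),
    "RTX 4070 Ti (MSRP, $599 + $200)": (799, 12),
    "RTX 5070 (MSRP)": (549, 12),
    "RTX 3090 (used)": (700, 24),
    "RTX 3060 12GB (MSRP)": (329, 12),
}


def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price divided by VRAM capacity."""
    return price_usd / vram_gb


# Cheapest VRAM first.
ranked = sorted(cards.items(), key=lambda item: usd_per_gb(*item[1]))

for name, (price, vram) in ranked:
    print(f"{name}: ${usd_per_gb(price, vram):.2f} per GB")
```

On this metric the 3060 12GB and the used 3090 come out well ahead of every 12 GB Ada card, which matches the recommendation to buy the 4070 only at a used discount.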
Specs
| VRAM | 12 GB |
| Power draw (peak) | 200 W |
| Released | 2023 |
| MSRP | $599 |
| Backends | CUDA, Vulkan |
Models that fit
Open-weight models small enough to run on NVIDIA GeForce RTX 4070 with usable context.
Frequently asked
What models can NVIDIA GeForce RTX 4070 run?
Does NVIDIA GeForce RTX 4070 support CUDA?
How much does NVIDIA GeForce RTX 4070 cost?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.