DeepSeek V3 (671B MoE)
DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.
DeepSeek V3 is the open-weight frontier model of 2024–2025 — 671B total parameters, 37B active per token through a clever MoE design. It's the model closed-AI competitors should be most worried about. The catch: running it locally requires either workstation hardware or cloud GPU rentals.
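To make the 37B-active / 671B-total arithmetic concrete, here's a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and k are illustrative placeholders — DeepSeek V3's real design adds fine-grained and shared experts plus its own load-balancing scheme — but the mechanism is the same: a router picks a few experts per token, and only those run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (sizes are illustrative,
    not DeepSeek V3's actual configuration)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run per token -- the rest stay idle,
        # which is why "active" parameters are a fraction of the total.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Scale the same idea up and compute per token tracks the selected experts, not the full 671B parameter count — but every expert still has to live somewhere in memory.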
Strengths
- Frontier-class quality — genuinely competes with GPT-4o on many benchmarks.
- 37B active per token keeps compute reasonable despite the 671B nameplate.
- MIT-style permissive license — the cleanest license at this capability tier.

Weaknesses
- 671B total parameters mean the disk + memory footprint is workstation-scale (~380 GB at Q4).
- Routing isn't free — at low quants, MoE quality degrades faster than it does for dense models.
- Tool use is less polished than the Llama family's at the time of writing.
Can you run it locally?
- Q4_K_M (~380 GB) — not realistically runnable on a single 4090.
- Practical local hardware: a multi-GPU A100/H100 node with ≥420 GB aggregate VRAM for Q4_K_M, or a Mac Studio M3 Ultra (512 GB for Q4_K_M; the 192 GB configuration only fits aggressive 2–3-bit quants).
- Single 4090 + 192 GB DDR5 with Q3: ~1–3 tok/s — not productive.

Yes for workstation owners (multi-card A100/H100, M3 Ultra) or via cloud GPU rental for short sessions. No for consumer-card users — even with massive system RAM, the bandwidth ceiling makes this impractical on a 4090.
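The bandwidth ceiling is simple arithmetic: during decoding, every active weight streams through memory once per token. A back-of-envelope estimator — the bandwidth figures and bytes-per-parameter value below are rough assumptions for illustration, not measurements:

```python
# Decode speed ceiling: tokens/s ≈ memory bandwidth / bytes streamed per token.
# All numbers are rough assumptions, not benchmarks.
ACTIVE_PARAMS = 37e9        # DeepSeek V3 active parameters per token
BYTES_PER_PARAM = 0.55      # ~4.4 bits/param effective at Q4-class quants

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~20 GB per token

for name, bw_gbs in [("dual-channel DDR5", 90),    # consumer desktop
                     ("M3 Ultra unified", 800),
                     ("A100 80 GB HBM2e", 2000)]:
    print(f"{name:>20}: ~{bw_gbs * 1e9 / bytes_per_token:.1f} tok/s ceiling")
```

A theoretical ceiling near 4 tok/s on desktop DDR5 is why the observed 1–3 tok/s above is about as good as it gets — adding more system RAM raises capacity, not bandwidth.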
How it compares
- vs Llama 4 Maverick → similar tier; V3 has the edge on math/code, Maverick wins on multimodality.
- vs DeepSeek R1 → R1 is reasoning-trained; V3 is the better generalist. Pick by workload.
- vs Qwen 3 235B-A22B → Qwen is the closer-sized peer; V3 wins on raw quality, Qwen wins on accessibility (smaller total params).
- vs Mixtral 8x22B → V3 dramatically outclasses it on quality; active parameter counts are similar (37B vs ~39B), though V3's total footprint is far larger.
# Workstation example (multi-GPU node, ≥420 GB aggregate VRAM)
ollama pull deepseek-v3:671b
ollama run deepseek-v3:671b
Settings: Q4_K_M GGUF, 16,384 ctx, multi-GPU offload, A100/H100 node or 512 GB M3 Ultra
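Once the model is served, Ollama exposes it over its local REST API (default port 11434), so you can script against it. A minimal sketch — the prompt is a placeholder, and num_ctx mirrors the settings above:

```python
import json
import urllib.request

# Minimal call against Ollama's local REST API (default http://localhost:11434).
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-v3:671b",
        "prompt": "Write a binary search in Python.",  # placeholder prompt
        "stream": False,                               # one JSON blob, not chunks
        "options": {"num_ctx": 16384},                 # match the settings above
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```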
Why this rating
9.0/10 — the open-weight model that genuinely competes with closed frontier models on many benchmarks. The 671B / 37B-active MoE design is brilliant, but the practical reality is that running it locally requires workstation hardware. Loses fractional points only on accessibility.
Strengths
- GPT-4-class quality
- MoE efficiency
- Open weights
Weaknesses
- Effectively server-scale — impractical on consumer hardware
- Permissive license, but check the model-license terms
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 380.0 GB | 420 GB |
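File sizes at other quantization levels follow from parameter count times bits per weight. A quick estimator — the bits-per-weight values are approximate community averages for llama.cpp K-quants, and real GGUF files add some overhead:

```python
# Approximate GGUF file size: total params × average bits per weight / 8.
# Bits-per-weight figures are rough averages, not exact per-file numbers.
TOTAL_PARAMS = 671e9

BITS_PER_WEIGHT = {
    "Q8_0":   8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.5,   # ~380 GB, matching the table above
    "Q3_K_M": 3.9,
    "Q2_K":   2.6,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = TOTAL_PARAMS * bpw / 8 / 1e9
    print(f"{quant:>7}: ~{gb:,.0f} GB on disk")
```

Even a ~2.6-bit quant lands above 200 GB, which is why there is no consumer-GPU path to this model.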
Get the model
Ollama
One-line install
ollama run deepseek-v3:671b
Read our Ollama review →
HuggingFace
Original weights
Source repository — raw weights only; you'll need to quantize them yourself for local runners.
Hardware that runs this
Hardware configurations with enough memory for at least one quantization of DeepSeek V3 (671B MoE).
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run DeepSeek V3 (671B MoE)?
About 420 GB for the Q4_K_M quantization listed above — multi-GPU server territory; no single consumer card comes close.
Can I use DeepSeek V3 (671B MoE) commercially?
Yes — the weights carry an MIT-style permissive license, though you should confirm the current terms in the source repository.
What's the context length of DeepSeek V3 (671B MoE)?
128K tokens, though local runs typically use a much smaller window (the example above sets 16,384) to keep memory in check.
How do I install DeepSeek V3 (671B MoE) with Ollama?
ollama run deepseek-v3:671b — the one-line install above. Expect a download in the hundreds of gigabytes.
Source: huggingface.co/deepseek-ai/DeepSeek-V3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.