DeepSeek V3 (671B MoE)
DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.
DeepSeek V3 is the open-weight frontier model of 2024–2025 — 671B total parameters, 37B active per token through a clever MoE design. It's the model closed-AI competitors should be most worried about. The catch: running it locally requires either workstation hardware or cloud GPU rentals.
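To make the 37B-active / 671B-total arithmetic concrete, here's a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and k are illustrative placeholders — DeepSeek V3's real design adds fine-grained and shared experts plus its own load-balancing scheme — but the mechanism is the same: a router picks a few experts per token, and only those run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (sizes are illustrative,
    not DeepSeek V3's actual configuration)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run per token -- the rest stay idle,
        # which is why "active" parameters are a fraction of the total.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Scale the same idea up and compute per token tracks the selected experts, not the full 671B parameter count — but every expert still has to live somewhere in memory.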
Strengths
- Frontier-class quality — genuinely competes with GPT-4o on many benchmarks.
- 37B active per token keeps compute reasonable despite the 671B nameplate.
- MIT-style permissive license — the cleanest license at this capability tier.

Weaknesses
- 671B total parameters mean the disk + memory footprint is workstation-scale (~380 GB at Q4).
- Routing isn't free — at low quants, MoE quality degrades faster than it does for dense models.
- Tool use is less polished than the Llama family's at the time of writing.
Can you run it locally?
- Q4_K_M (~380 GB) — not realistically runnable on a single 4090.
- Practical local hardware: a multi-GPU A100/H100 node with ≥420 GB aggregate VRAM for Q4_K_M, or a Mac Studio M3 Ultra (512 GB for Q4_K_M; the 192 GB configuration only fits aggressive 2–3-bit quants).
- Single 4090 + 192 GB DDR5 with Q3: ~1–3 tok/s — not productive.

Yes for workstation owners (multi-card A100/H100, M3 Ultra) or via cloud GPU rental for short sessions. No for consumer-card users — even with massive system RAM, the bandwidth ceiling makes this impractical on a 4090.
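The bandwidth ceiling is simple arithmetic: during decoding, every active weight streams through memory once per token. A back-of-envelope estimator — the bandwidth figures and bytes-per-parameter value below are rough assumptions for illustration, not measurements:

```python
# Decode speed ceiling: tokens/s ≈ memory bandwidth / bytes streamed per token.
# All numbers are rough assumptions, not benchmarks.
ACTIVE_PARAMS = 37e9        # DeepSeek V3 active parameters per token
BYTES_PER_PARAM = 0.55      # ~4.4 bits/param effective at Q4-class quants

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~20 GB per token

for name, bw_gbs in [("dual-channel DDR5", 90),    # consumer desktop
                     ("M3 Ultra unified", 800),
                     ("A100 80 GB HBM2e", 2000)]:
    print(f"{name:>20}: ~{bw_gbs * 1e9 / bytes_per_token:.1f} tok/s ceiling")
```

A theoretical ceiling near 4 tok/s on desktop DDR5 is why the observed 1–3 tok/s above is about as good as it gets — adding more system RAM raises capacity, not bandwidth.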
How it compares
- vs Llama 4 Maverick → similar tier; V3 has the edge on math/code, Maverick wins on multimodality.
- vs DeepSeek R1 → R1 is reasoning-trained; V3 is the better generalist. Pick by workload.
- vs Qwen 3 235B-A22B → Qwen is the closer-sized peer; V3 wins on raw quality, Qwen wins on accessibility (smaller total params).
- vs Mixtral 8x22B → V3 dramatically outclasses it on quality; active parameter counts are similar (37B vs ~39B), though V3's total footprint is far larger.
# Workstation example (multi-GPU node, ≥420 GB aggregate VRAM)
ollama pull deepseek-v3:671b
ollama run deepseek-v3:671b
Settings: Q4_K_M GGUF, 16,384 ctx, multi-GPU offload, A100/H100 node or 512 GB M3 Ultra
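Once the model is served, Ollama exposes it over its local REST API (default port 11434), so you can script against it. A minimal sketch — the prompt is a placeholder, and num_ctx mirrors the settings above:

```python
import json
import urllib.request

# Minimal call against Ollama's local REST API (default http://localhost:11434).
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-v3:671b",
        "prompt": "Write a binary search in Python.",  # placeholder prompt
        "stream": False,                               # one JSON blob, not chunks
        "options": {"num_ctx": 16384},                 # match the settings above
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```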
Why this rating
9.0/10 — the open-weight model that genuinely competes with closed frontier models on many benchmarks. The 671B / 37B-active MoE design is brilliant, but the practical reality is that running it locally requires workstation hardware. Loses fractional points only on accessibility.
Strengths
- GPT-4-class quality
- MoE efficiency
- Open weights
Weaknesses
- Effectively server-scale — impractical on consumer hardware
- Permissive license, but check the model-license terms
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 380.0 GB | 420 GB |
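File sizes at other quantization levels follow from parameter count times bits per weight. A quick estimator — the bits-per-weight values are approximate community averages for llama.cpp K-quants, and real GGUF files add some overhead:

```python
# Approximate GGUF file size: total params × average bits per weight / 8.
# Bits-per-weight figures are rough averages, not exact per-file numbers.
TOTAL_PARAMS = 671e9

BITS_PER_WEIGHT = {
    "Q8_0":   8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.5,   # ~380 GB, matching the table above
    "Q3_K_M": 3.9,
    "Q2_K":   2.6,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = TOTAL_PARAMS * bpw / 8 / 1e9
    print(f"{quant:>7}: ~{gb:,.0f} GB on disk")
```

Even a ~2.6-bit quant lands above 200 GB, which is why there is no consumer-GPU path to this model.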
Get the model
Ollama
One-line install
ollama run deepseek-v3:671b
Read our Ollama review →
HuggingFace
Original weights
Source repository — raw weights only; you'll need to quantize them yourself for local runners.
Hardware that runs this
Hardware configurations with enough memory for at least one quantization of DeepSeek V3 (671B MoE).
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run DeepSeek V3 (671B MoE)?
About 420 GB for the Q4_K_M quantization listed above — multi-GPU server territory; no single consumer card comes close.
Can I use DeepSeek V3 (671B MoE) commercially?
Yes — the weights carry an MIT-style permissive license, though you should confirm the current terms in the source repository.
What's the context length of DeepSeek V3 (671B MoE)?
128K tokens, though local runs typically use a much smaller window (the example above sets 16,384) to keep memory in check.
How do I install DeepSeek V3 (671B MoE) with Ollama?
ollama run deepseek-v3:671b — the one-line install above. Expect a download in the hundreds of gigabytes.
Source: huggingface.co/deepseek-ai/DeepSeek-V3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.