DeepSeek V4
DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source flagships on reasoning.
Positioning
DeepSeek V4 is DeepSeek's frontier reasoning model and the architectural successor to DeepSeek V3. It is a Mixture-of-Experts model with 745B total parameters, of which roughly 38B are activated per token. Released under DeepSeek's permissive open-weight license, V4 represents the current state of open-weight frontier reasoning capability: per DeepSeek's release benchmarks, it is in the same conversation as Claude 3.7 Sonnet and GPT-5 mini on math, code, and multi-step reasoning.
Strengths
- Frontier reasoning capability with a permissive license. Open weights plus Apache-style usage rights make V4 a genuine commercial-deployment alternative to closed-source frontier APIs.
- MoE efficiency. The active parameter count (38B) is roughly 20× smaller than the total (745B), so inference cost economics are dramatically better than dense 405B-class models at similar capability.
- Long context handling. 128K context with a stable degradation curve, similar to V3.
- Math + code performance. Strong on AIME-level math, competitive programming, and multi-step code reasoning. Frontier-tier reasoning capability arrived for open-weight models in 2026.
- Multilingual coverage. Continues V3's tradition of strong Chinese + English coverage with reasonable support for major Western European languages.
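The MoE efficiency point above can be made concrete with a little arithmetic. The sketch below is a rough estimate, not a measurement: it assumes the common rule of thumb of about 2 forward-pass FLOPs per active parameter per token, and it ignores attention cost, memory bandwidth, and routing overhead. The parameter counts are the ones quoted on this page.

```python
# Rough per-token compute comparison between a sparse MoE and a dense model.
# Assumption: forward-pass cost is about 2 FLOPs per active parameter per token.

TOTAL_PARAMS = 745e9   # total parameters quoted on this page
ACTIVE_PARAMS = 38e9   # parameters activated per token, quoted on this page
DENSE_PARAMS = 405e9   # dense 405B-class comparison point


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs needed to generate one token."""
    return 2.0 * active_params


# How much of the model is touched per token.
sparsity_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

# How much more compute a dense 405B model spends per token.
compute_ratio = flops_per_token(DENSE_PARAMS) / flops_per_token(ACTIVE_PARAMS)

print(f"total / active parameters: {sparsity_ratio:.1f}x")
print(f"dense 405B vs V4 FLOPs per token: {compute_ratio:.1f}x")
```

Note that the compute saving applies to FLOPs only. All 745B parameters still have to be resident in memory, which is why the hardware requirements stay high.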
Limitations
- Compute requirements are massive. V4 at FP16 requires a multi-card cluster. Single-node deployment requires aggressive quantization (Q3-Q4) and even then needs frontier hardware (MI300X, Mac Studio M3 Ultra, or 2× H100 PCIe NVL).
- MoE serving complexity. Production-grade MoE inference requires vLLM / SGLang / TensorRT-LLM with MoE routing optimizations. Not all serving stacks handle MoE efficiently.
- Latency on single-token generation can be uneven as different tokens activate different experts.
- Tool-use is mature but not class-leading. Anthropic Claude and OpenAI's flagships have more mature tool-use chain-of-thought than DeepSeek V4 in mid-2026.
- Censorship and refusal behavior follows DeepSeek's RLHF policy — different from Western frontier models on certain political/social topics.
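To put the compute-requirements point in numbers, here is a minimal weights-only estimate. It assumes nominal bit widths (16, 8, 4, 3) applied uniformly to every parameter. Real quantization formats store extra scales and often keep some layers at higher precision, and serving also needs room for the KV cache and activations, so actual footprints will be higher than these figures.

```python
# Weights-only memory estimate for a 745B-parameter model.
# Assumption: every parameter is stored at the nominal bit width.

TOTAL_PARAMS = 745e9  # total parameters quoted on this page

# Nominal bits per weight for each precision (illustrative labels).
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "Q4": 4, "Q3": 3}


def weight_gb(total_params: float, bits: int) -> float:
    """Return the weight storage in GB (1 GB = 1e9 bytes)."""
    return total_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>4}: {weight_gb(TOTAL_PARAMS, bits):7.1f} GB")
```

Under these assumptions even the Q3 figure is well above the memory of a single 192 GB device, so treat any single-device claim as dependent on offloading or more aggressive quantization.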
Real-world performance
- vs DeepSeek V3: V4 is a strict reasoning-capability upgrade with similar serving costs (total MoE parameter count is in a similar tier).
- vs Claude 3.7 Sonnet (API): Closer than V3 was to Claude. V4 wins on cost (open-weight self-hosted) for sustained workloads; Claude API wins on tool-use polish + reasoning trace quality.
- vs Qwen 3 235B-A22B: V4 stronger on reasoning; Qwen 3 stronger on multilingual coverage and competitive programming optimization.
- vs Llama 3.3 70B: V4 dramatically more capable but also dramatically more expensive to serve. Pick by capability tier needed.
Should you run this locally?
Yes, if you operate frontier-tier infrastructure (MI300X cluster / H100 SXM cluster / 8× DGX) and need open-weight frontier reasoning + commercial licensing. The cost economics of self-hosted V4 vs Claude/GPT-5 API can favor self-hosting at scale.
No, if you don't have frontier hardware (rent cloud capacity or use an API instead), your workload is closer to the Llama 3.3 70B capability tier (cheaper to serve), or you need maximum tool-use polish (frontier closed-source APIs win).
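Whether self-hosting wins "at scale" comes down to cost per token at your real utilization. The sketch below shows one way to frame that comparison. The throughput, utilization, and API price are hypothetical placeholders, not measured or published figures, and the hourly rate is simply a value inside the H100 SXM rental range quoted under "Run this yourself". Substitute your own numbers.

```python
# Break-even framing: self-hosted cost per million tokens vs an API price.
# All inputs below are placeholders for illustration only.


def self_hosted_cost_per_million(
    hourly_usd: float, tokens_per_second: float, utilization: float = 1.0
) -> float:
    """Cost in USD per million generated tokens on rented or owned hardware.

    utilization is the fraction of each hour the cluster is doing useful work.
    """
    useful_tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_usd / useful_tokens_per_hour * 1e6


def self_hosting_is_cheaper(self_hosted_usd: float, api_usd: float) -> bool:
    """Compare two per-million-token prices."""
    return self_hosted_usd < api_usd


HOURLY_USD = 30.0          # placeholder inside the quoted rental range
TOKENS_PER_SECOND = 2000   # hypothetical aggregate cluster throughput
API_USD_PER_MILLION = 10.0 # hypothetical API output-token price

for utilization in (1.0, 0.5, 0.1):
    cost = self_hosted_cost_per_million(HOURLY_USD, TOKENS_PER_SECOND, utilization)
    verdict = self_hosting_is_cheaper(cost, API_USD_PER_MILLION)
    print(f"utilization {utilization:.0%}: ${cost:.2f}/M tokens, cheaper={verdict}")
```

The utilization term is the important part: the same cluster that is cheaper than an API when busy around the clock becomes far more expensive per token when it sits idle most of the day.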
How it compares
- vs DeepSeek V3: V4 is a reasoning-capability upgrade with a similar MoE structure.
- vs DeepSeek V3 Lite: V3 Lite is the smaller-MoE variant for cheaper serving.
- vs Qwen 3 235B-A22B: Different MoE design. V4 wins reasoning; Qwen 3 wins multilingual.
- vs Anthropic Claude 3.7 Sonnet: V4 wins on cost at scale; Claude wins on tool-use polish.
Run this yourself
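- Single-machine workstation: Mac Studio M3 Ultra at Q3, the slowest but functional path. At 745B parameters, Q3 weights are roughly 280 GB, so this needs a unified-memory configuration larger than 192 GB.
- Single-card AMD: MI300X (192 GB) at Q3-Q4, with part of the weights offloaded to host memory, since the quantized weights alone exceed 192 GB.
- Datacenter: H100 PCIe nodes at FP8 with vLLM MoE routing. FP8 weights are roughly 745 GB, so plan for more than 4× 80 GB cards.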
- Frontier: 8× B200 SXM cluster for production multi-tenant serving.
- Cloud rental: Runpod / Lambda B200 cluster ~$30-60/hr per node, or H100 SXM cluster ~$25-40/hr.
Strengths
- Open-weight benchmark leader (May 2026)
- 38B active params keep inference practical
- Strong on coding + math
Weaknesses
- 745B at any quant requires multi-node cluster
- Not single-machine deployable
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 425.0 GB | 480 GB |
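The table row above translates into a GPU count once you pick a card. This is a minimal sketch that assumes the listed VRAM figure can be split evenly across devices. The per-card capacities (80 GB for an H100, 192 GB for an MI300X) are inputs you should adjust for your hardware, and real deployments usually round up to a tensor-parallel size the serving stack supports.

```python
import math

# Figures from the AWQ-INT4 row of the table above.
FILE_SIZE_GB = 425.0
VRAM_REQUIRED_GB = 480.0

# Per-card memory in GB (adjust for your hardware).
CARD_MEMORY_GB = {"H100 80GB": 80, "MI300X 192GB": 192}


def gpus_needed(vram_required_gb: float, per_gpu_gb: float) -> int:
    """Smallest number of cards whose combined memory covers the requirement."""
    return math.ceil(vram_required_gb / per_gpu_gb)


# Headroom the table budgets on top of the raw file size.
overhead_fraction = VRAM_REQUIRED_GB / FILE_SIZE_GB - 1

print(f"runtime overhead over file size: {overhead_fraction:.1%}")
for card, memory_gb in CARD_MEMORY_GB.items():
    print(f"{card}: at least {gpus_needed(VRAM_REQUIRED_GB, memory_gb)} cards")
```

The count is a lower bound on memory alone. It says nothing about throughput or about the extra memory needed for long contexts and large batches.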
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Source: huggingface.co/deepseek-ai/DeepSeek-V4
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.