DeepSeek V4

Positioning

DeepSeek V4 is DeepSeek's frontier reasoning model and the architectural successor to DeepSeek V3. It is a Mixture-of-Experts (MoE) model with roughly a trillion total parameters, of which only ~37-50B are activated per token. It is released as open weights under the DeepSeek License, which permits commercial use. V4 represents the current state of open-weight frontier reasoning: on DeepSeek's release benchmarks for math, code, and multi-step reasoning, it is comparable to Claude 3.7 Sonnet and GPT-5 mini.

Strengths

Frontier reasoning under a commercial-use license. Open weights plus commercial usage rights under the DeepSeek License make V4 a genuine alternative to closed-source frontier APIs for commercial deployment.
MoE efficiency. Only ~37-50B of roughly a trillion parameters are active per token (about 4-5% of the weights). Per-token compute is therefore roughly 8-11× lower than for a dense 405B-class model of similar capability. Memory footprint, however, still scales with the total parameter count.
Long-context handling. A 128K-token (131,072) context window. As with V3, quality degrades gradually rather than abruptly as inputs lengthen.
Math and code performance. Strong on AIME-level math, competitive programming, and multi-step code reasoning. With V4, frontier-tier reasoning arrived in open-weight form in 2026.
Multilingual coverage. Continues V3's strong Chinese and English coverage, with reasonable support for major Western European languages.
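The MoE efficiency claim can be checked with back-of-the-envelope arithmetic. This sketch assumes the page's own figures (about 1,000B total and 37-50B active parameters) and a dense 405B baseline. It uses the common rule of thumb of roughly 2 FLOPs per parameter per token for a forward pass and ignores attention and routing overheads. It is an estimate, not a measurement.

```python
# Back-of-the-envelope MoE arithmetic (assumed figures, not measured ones).
# Compute per token scales with ACTIVE parameters; memory scales with TOTAL.
TOTAL_PARAMS_B = 1000.0    # assumption: "trillion-parameter total scale"
DENSE_BASELINE_B = 405.0   # dense 405B-class comparison model


def forward_flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 per active parameter)."""
    return 2.0 * active_params_b * 1e9


def moe_summary(active_params_b: float) -> tuple[float, float]:
    """Return (fraction of weights active per token, dense/MoE compute ratio)."""
    active_fraction = active_params_b / TOTAL_PARAMS_B
    compute_ratio = forward_flops_per_token(DENSE_BASELINE_B) / forward_flops_per_token(active_params_b)
    return active_fraction, compute_ratio


if __name__ == "__main__":
    for active in (37.0, 50.0):
        frac, ratio = moe_summary(active)
        print(f"{active:.0f}B active: {frac:.1%} of weights, ~{ratio:.1f}x less compute than dense 405B")
```

With these inputs, the compute ratio lands between about 8× (50B active) and 11× (37B active). All trillion parameters must still sit in memory, which is why the hardware bar stays high.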

Limitations

Memory requirements are massive. At FP16, the weights alone come to roughly 2 TB, so V4 needs a multi-GPU cluster. Aggressive quantization (Q3-Q4) cuts that to roughly 375-500 GB. That still exceeds a single MI300X (192 GB) or a 2× H100 NVL pair (188 GB). Workstation-class setups such as a Mac Studio M3 Ultra need a very large unified-memory configuration or offloading.
MoE serving complexity. Production-grade MoE inference needs a serving stack with MoE routing optimizations, such as vLLM, SGLang, or TensorRT-LLM. Not every serving stack handles MoE efficiently.
Per-token generation latency can be uneven, because different tokens activate different experts and expert load is rarely perfectly balanced.
Tool use is solid but not class-leading. As of mid-2026, Anthropic's Claude and OpenAI's flagship models handle multi-step tool calling, and the reasoning around it, more reliably than DeepSeek V4.
Censorship and refusal behavior follows DeepSeek's RLHF policy — different from Western frontier models on certain political/social topics.
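The uneven-latency point follows from how MoE routing works. Each token picks its own top-k experts, so a batch spreads unevenly across experts and across the GPUs hosting them. This sketch is a generic softmax-over-top-k router in plain Python. It uses hypothetical sizes (16 experts, k = 2, random logits) to illustrate the mechanism and is not DeepSeek's actual gating function.

```python
import math
import random
from collections import Counter


def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def expert_load(token_logits: list[list[float]], k: int) -> Counter:
    """Count how many tokens in a batch are routed to each expert."""
    load: Counter = Counter()
    for logits in token_logits:
        load.update(top_k_route(logits, k).keys())
    return load


if __name__ == "__main__":
    random.seed(0)
    num_experts, k, batch = 16, 2, 64                # hypothetical sizes
    batch_logits = [[random.gauss(0, 1) for _ in range(num_experts)] for _ in range(batch)]
    load = expert_load(batch_logits, k)
    counts = [load[e] for e in range(num_experts)]
    print("tokens per expert:", counts)
    print("busiest:", max(counts), "quietest:", min(counts))
```

The busiest expert (and its GPU) sets the pace for the whole step. Production routers add load-balancing mechanisms to narrow that spread.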

Real-world performance

vs DeepSeek V3: V4 is a strict upgrade in reasoning capability at similar serving cost, since the total MoE parameter counts are in the same tier.
vs Claude 3.7 Sonnet (API): V4 closes more of the gap to Claude than V3 did. V4 wins on cost (self-hosted open weights) for sustained workloads; the Claude API wins on tool-use polish and reasoning-trace quality.
vs Qwen 3 235B-A22B: V4 is stronger on reasoning; Qwen 3 is stronger on multilingual coverage and competitive-programming optimization.
vs Llama 3.3 70B: V4 is dramatically more capable but also dramatically more expensive to serve. Choose by the capability tier you need.

Should you run this locally?

Yes, if you operate frontier-tier infrastructure (MI300X cluster, H100 SXM cluster, or 8× DGX) and need open-weight frontier reasoning with commercial licensing. At sustained scale, self-hosting V4 can cost less than the Claude or GPT-5 APIs.

No, if any of these apply:
- You lack frontier hardware (rent cloud capacity or use an API instead).
- Your workload sits in the Llama 3.3 70B capability tier, which is cheaper to serve.
- You need maximum tool-use polish, where closed-source frontier APIs win.
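Whether self-hosting wins is mostly a throughput question. This minimal sketch assumes you rent a node at a fixed hourly price and can sustain a given output rate. The node price, throughput, utilization, and API price in the example are placeholders, not measured figures. Staff and engineering costs are ignored.

```python
SECONDS_PER_HOUR = 3600


def self_host_usd_per_million(node_usd_per_hour: float, tokens_per_second: float,
                              utilization: float = 1.0) -> float:
    """Cost per million generated tokens on a rented node.

    utilization is the fraction of paid hours spent serving at full throughput.
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("tokens_per_second must be > 0 and utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR * utilization
    return node_usd_per_hour / tokens_per_hour * 1_000_000


def break_even_tokens_per_second(node_usd_per_hour: float, api_usd_per_million: float,
                                 utilization: float = 1.0) -> float:
    """Sustained throughput at which self-hosting costs the same as the API."""
    return node_usd_per_hour * 1_000_000 / (api_usd_per_million * SECONDS_PER_HOUR * utilization)


if __name__ == "__main__":
    node_price = 40.0   # hypothetical $/hr for one node
    api_price = 10.0    # hypothetical $ per million output tokens
    for tps in (500, 1000, 2000):  # hypothetical aggregate throughputs
        print(tps, "tok/s ->", round(self_host_usd_per_million(node_price, tps, 0.6), 2), "$/M")
    print("break-even:", round(break_even_tokens_per_second(node_price, api_price, 0.6)), "tok/s")
```

Below the break-even throughput the API is cheaper. Above it, every extra token per second and every point of utilization pushes the self-hosted cost per token down.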

How it compares

vs DeepSeek V3: V4 is a reasoning-capability upgrade with a similar MoE structure.
vs DeepSeek V3 Lite: V3 Lite is the smaller-MoE variant for cheaper serving.
vs Qwen 3 235B-A22B: Different MoE design. V4 wins reasoning; Qwen 3 wins multilingual.
vs Anthropic Claude 3.7 Sonnet: V4 wins on cost at scale; Claude wins on tool-use polish.

Run this yourself

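Single-device workstation: Mac Studio M3 Ultra at Q3, the slowest but functional path. Q3 weights for a roughly trillion-parameter model come to about 375 GB, so a 192 GB configuration has to offload the remainder.
Single-device AMD: MI300X (192 GB) at Q3-Q4, which likewise needs offloading. An 8× MI300X node holds the weights outright.
Datacenter: multi-GPU H100 with vLLM MoE routing, sized from the weights. 4× H100 PCIe (320 GB) falls short of even the 480 GB AWQ-INT4 requirement. An 8× H100 80 GB node (640 GB) fits INT4. FP8 (~1 TB) needs multiple nodes or higher-memory GPUs.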
Frontier: 8× B200 SXM cluster for production multi-tenant serving.
Cloud rental: Runpod / Lambda B200 cluster ~$30-60/hr per node, or H100 SXM cluster ~$25-40/hr.
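A quick way to sanity-check any of these setups is to compare aggregate accelerator memory with the 480 GB the AWQ-INT4 build is listed as needing. The per-device capacities below are nominal figures assumed for illustration; the Mac Studio entry uses a 192 GB configuration. Usable memory is lower once the runtime, KV cache, and parallelism overhead are counted, so a borderline "fits" should be treated as "maybe".

```python
# Rough fit check against the page's AWQ-INT4 requirement (480 GB).
# Capacities are nominal per-device figures assumed for illustration.
REQUIRED_GB_INT4 = 480

SETUPS = {
    "Mac Studio M3 Ultra (192 GB config)": 192,
    "1x MI300X (192 GB)": 192,
    "2x H100 NVL (94 GB each)": 2 * 94,
    "4x H100 PCIe (80 GB each)": 4 * 80,
    "8x H100 SXM (80 GB each)": 8 * 80,
    "8x B200 SXM (192 GB each)": 8 * 192,
}


def fits(total_gb: float, required_gb: float = REQUIRED_GB_INT4) -> bool:
    """True if aggregate memory covers the requirement (overheads ignored)."""
    return total_gb >= required_gb


if __name__ == "__main__":
    for name, total_gb in SETUPS.items():
        verdict = "fits" if fits(total_gb) else "needs offloading or more devices"
        print(f"{name:38s} {total_gb:5d} GB  {verdict}")
```

By this measure, none of the single-device options holds the INT4 build outright. That is why they depend on lower bit widths plus offloading.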

Quantization	File size	VRAM required
AWQ-INT4	425.0 GB	480 GB

Frequently asked

What's the minimum VRAM to run DeepSeek V4?

480 GB of VRAM is enough to run DeepSeek V4 at AWQ-INT4 quantization (425.0 GB file). Higher-precision quantizations need more.
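Weight memory scales roughly linearly with bits per weight. This minimal sketch assumes a round 1,000B parameters. It ignores quantization metadata (AWQ's per-group scales add a little), the KV cache, and activations.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


def implied_params_billion(file_gb: float, bits_per_weight: float) -> float:
    """Invert weight_gb: parameter count implied by a file size at a given bit width."""
    return file_gb * 8 / bits_per_weight


if __name__ == "__main__":
    for label, bits in (("FP16", 16), ("FP8", 8), ("INT4", 4), ("Q3", 3)):
        print(f"{label:5s} ~{weight_gb(1000, bits):,.0f} GB for 1,000B params")
    # The page's AWQ-INT4 row: 425 GB file, 480 GB VRAM.
    print("params implied by 425 GB at 4 bits:", implied_params_billion(425, 4), "B")
    print("headroom for KV cache and runtime:", 480 - 425, "GB")
```

At exactly 4 bits, the 425 GB file implies about 850B parameters, somewhat fewer once AWQ metadata is counted. The 55 GB gap to the 480 GB figure presumably covers the KV cache and runtime overhead.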

Can I use DeepSeek V4 commercially?

Yes — DeepSeek V4 ships under the DeepSeek License, which permits commercial use. Always read the license text before deployment.

What's the context length of DeepSeek V4?

DeepSeek V4 supports a context window of 131,072 tokens (128K, i.e. 128 × 1,024).
