DeepSeek V2 Lite Chat
DeepSeek-V2-Lite-Chat is a mixture-of-experts chat model with 15.7B total parameters, 2.4B of them active per token, built on DeepSeek's Multi-head Latent Attention (MLA) architecture for KV-cache efficiency. Native context is 32K with YaRN scaling published up to 160K, and it ships under the DeepSeek model license, which permits commercial use.
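By total parameter count this is not a small language model, but its 2.4B active parameters give it the per-token compute profile of one, which makes it a fair entry in this tier for inference latency on a workstation. It delivers quality close to a 7B dense model while activating roughly a third of the parameters, and so roughly a third of the FLOPs, per token.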
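The arithmetic behind the "third of the FLOPs" comparison can be sketched as follows. This is a rough illustration that assumes per-token compute scales with the number of active parameters and ignores attention cost; the 7B figure is the dense baseline named above, not a specific model.

```python
# Parameter counts in billions, taken from the description above.
TOTAL_PARAMS_B = 15.7    # all experts plus shared weights
ACTIVE_PARAMS_B = 2.4    # parameters actually used for each token
DENSE_BASELINE_B = 7.0   # dense model used as the quality comparison


def fraction(part: float, whole: float) -> float:
    """Return part / whole as a plain ratio."""
    return part / whole


# Share of the model's own weights that participate in each token.
active_share = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

# Active parameters relative to a 7B dense model (assumed proxy for FLOPs).
vs_dense = fraction(ACTIVE_PARAMS_B, DENSE_BASELINE_B)

print(f"active share of total weights: {active_share:.1%}")
print(f"active params vs 7B dense:     {vs_dense:.1%}")
```

All 15.7B parameters still have to be resident in memory, so the saving is in compute per token, not in VRAM.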
Strengths
- MoE design delivers ~7B-class quality at 2.4B active compute cost
- MLA attention dramatically reduces KV-cache memory at long context
- DeepSeek license permits commercial use (with conditions)
- Native 32K context with documented YaRN extension to 160K
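To give the MLA claim a rough scale, the sketch below compares the per-token KV-cache footprint of standard multi-head attention with an MLA cache that stores one compressed latent plus a shared RoPE key per layer. The layer count and dimensions are assumptions (believed to match the model's published config, but check config.json before relying on them), and the saving only materializes in engines that actually cache the compressed form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnShape:
    # Assumed values for DeepSeek-V2-Lite; verify against config.json.
    layers: int = 27
    heads: int = 16
    head_dim: int = 128
    kv_lora_rank: int = 512   # width of the compressed KV latent
    rope_head_dim: int = 64   # shared decoupled RoPE key


def mha_cache_bytes(s: AttnShape, tokens: int, bytes_per_elem: int = 2) -> int:
    """Standard attention: full K and V for every head, layer and token."""
    return 2 * s.heads * s.head_dim * s.layers * tokens * bytes_per_elem


def mla_cache_bytes(s: AttnShape, tokens: int, bytes_per_elem: int = 2) -> int:
    """MLA: one compressed latent plus one RoPE key per layer and token."""
    return (s.kv_lora_rank + s.rope_head_dim) * s.layers * tokens * bytes_per_elem


shape = AttnShape()
tokens = 32 * 1024  # native 32K context

mha = mha_cache_bytes(shape, tokens)
mla = mla_cache_bytes(shape, tokens)

GIB = 1024 ** 3
print(f"MHA-style cache: {mha / GIB:.2f} GiB")
print(f"MLA cache:       {mla / GIB:.2f} GiB")
print(f"reduction:       {mha / mla:.1f}x")
```

Under these assumed dimensions the fp16 cache at 32K tokens shrinks by about 7x for a single sequence.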
Weaknesses
- Weights alone need ~30GB of VRAM at fp16, so the model does not fit on consumer 12GB GPUs at full precision
- MoE routing means inference engines without expert support fall back to slow paths
- DeepSeek license has acceptable-use restrictions; not a true OSI license
- Custom-code repo requires trust_remote_code=True, which some pipelines block
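The fp16 figure in the first weakness follows from the parameter count. A minimal sketch, assuming 2 bytes per weight and decimal gigabytes, and counting weights only (no KV cache or activations):

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the weights alone at a given precision."""
    return params_billion * 1e9 * bits_per_weight / 8


# 15.7B total parameters at 16 bits each; every expert must be resident
# even though only 2.4B parameters are active per token.
fp16_gb = weight_bytes(15.7, 16) / 1e9

CARD_VRAM_GB = 12  # the consumer GPU size named in the list above
fits_on_card = fp16_gb <= CARD_VRAM_GB

print(f"fp16 weights: {fp16_gb:.1f} GB, fits on {CARD_VRAM_GB} GB card: {fits_on_card}")
```

The result lands close to the ~30GB quoted above; the exact number depends on whether gigabytes are counted as 10^9 or 2^30 bytes.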
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 8.6 GB | 11 GB |
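The table row can be read as data. The sketch below derives the effective bits per weight and the headroom left for KV cache and runtime buffers, then filters quantizations by card size. It assumes decimal gigabytes and that the file covers all 15.7B parameters; the `Quant` type and helper names are illustrative only.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    file_gb: float   # file size from the table
    vram_gb: float   # VRAM required from the table


# Only the row listed in the table above.
QUANTS = [Quant("Q4_K_M", 8.6, 11.0)]
TOTAL_PARAMS = 15.7e9


def bits_per_weight(q: Quant) -> float:
    """Average bits stored per parameter, including quantization metadata."""
    return q.file_gb * 1e9 * 8 / TOTAL_PARAMS


def runtime_headroom_gb(q: Quant) -> float:
    """VRAM beyond the file size, left for KV cache and buffers."""
    return q.vram_gb - q.file_gb


def fitting(card_vram_gb: float) -> list[str]:
    """Names of quantizations whose stated VRAM requirement fits the card."""
    return [q.name for q in QUANTS if q.vram_gb <= card_vram_gb]


q4 = QUANTS[0]
print(f"{q4.name}: {bits_per_weight(q4):.2f} bits/weight, "
      f"{runtime_headroom_gb(q4):.1f} GB headroom")
print("fits on 12 GB:", fitting(12))
print("fits on 8 GB: ", fitting(8))
```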
Get the model
HuggingFace
Original weights
Source repository. It hosts the original full-precision weights only, so you need to quantize them yourself.
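For reference, loading the original weights with Hugging Face `transformers` might look like the sketch below. It assumes `torch`, `transformers` and `accelerate` are installed and that roughly 30GB of memory is available; `load_model` and its `allow_remote_code` opt-in flag are hypothetical helpers, not part of any library. The explicit opt-in mirrors the `trust_remote_code=True` requirement noted under Weaknesses.

```python
REPO_ID = "deepseek-ai/DeepSeek-V2-Lite-Chat"


def load_model(repo_id: str = REPO_ID, allow_remote_code: bool = False):
    """Load tokenizer and model; refuses unless remote code is explicitly allowed."""
    if not allow_remote_code:
        # The repository ships custom modeling code, so loading it means
        # executing Python from the model repo on your machine.
        raise PermissionError(
            f"{repo_id} needs trust_remote_code=True; pass allow_remote_code=True to opt in"
        )

    # Imported lazily so the guard above works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,  # 2 bytes per weight, same footprint as fp16
        device_map="auto",           # requires accelerate; spreads layers across devices
    )
    return tokenizer, model
```

Calling `load_model(allow_remote_code=True)` downloads the full checkpoint, so review the repository's custom code first if your pipeline normally blocks remote code.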
Hardware that runs this
Cards with enough VRAM for at least one quantization of DeepSeek V2 Lite Chat.
Frequently asked
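What's the minimum VRAM to run DeepSeek V2 Lite Chat?
About 11 GB, using the Q4_K_M quantization (8.6 GB file). At fp16 the weights need ~30GB.
Can I use DeepSeek V2 Lite Chat commercially?
Yes. The DeepSeek model license permits commercial use, subject to its acceptable-use restrictions. It is not an OSI-approved license.
What's the context length of DeepSeek V2 Lite Chat?
32K tokens natively, with YaRN scaling published up to 160K.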
Source: huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.