The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Filtered results (20)
DeepSeek V4 Pro (1.6T MoE)
frontier-tier coding + reasoning serving — currently the open-weight ceiling
DeepSeek V4 Flash (284B MoE)
datacenter MoE — V4 efficiency variant
DeepSeek V4
frontier-tier reasoning on multi-machine clusters
DeepSeek Coder V3
workstation coding alternative to Qwen 2.5 Coder
DeepSeek V3 Lite (16B MoE)
consumer-tier MoE inference
DeepSeek R1 Distill Qwen 3 32B
workstation reasoning with Qwen 3 base improvements
DeepSeek R1 Distill Mistral 24B
consumer-tier reasoning with Mistral instruction lineage
DeepSeek R1 (671B reasoning)
frontier-tier reasoning research; cluster-only deployment
DeepSeek R1 Distill Llama 70B
datacenter-tier reasoning
DeepSeek R1 Distill Qwen 32B
single-machine reasoning — the canonical local R1 deployment
DeepSeek R1 Distill Qwen 14B
consumer-tier reasoning at 14B
DeepSeek R1 Distill Llama 8B
consumer-tier reasoning on 8GB+ GPUs
DeepSeek R1 Distill Qwen 7B
consumer-tier reasoning at 7B
DeepSeek R1 Distill Qwen 1.5B
edge-tier reasoning
DeepSeek V3 (671B MoE)
frontier-tier MoE serving — pre-V4 baseline
DeepSeek V2.5 236B
DeepSeek lineage reference — pre-V3
DeepSeek Coder V2 236B
datacenter-tier MoE coding
DeepSeek Coder V2 Lite (16B)
consumer-tier coding MoE
DeepSeek MoE 16B Base
research / lineage reference
DeepSeek V2 Lite Chat
workstation chat where MoE active-param efficiency matters more than total VRAM
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.