The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Filtered results (48)
Ring-2.6-1T
frontier reasoning at MoE serving cost
Qwen 3.6 35B-A3B (MTP)
high-throughput MoE inference at workstation tier
Qwen 3.6 27B (MTP)
dense workstation model with throughput acceleration
Qwen 3.5 235B-A17B (MoE)
frontier-tier reasoning + multilingual serving on multi-machine clusters
Mistral Medium 3.5 (675B MoE)
frontier MoE — Mistral's response to the open MoE wave
Mistral Medium 3 24B (dense)
research / non-commercial workstation deployments
DeepSeek V4 Pro (1.6T MoE)
frontier-tier coding + reasoning serving — currently the open-weight ceiling
DeepSeek V4 Flash (284B MoE)
datacenter MoE — V4 efficiency variant
OLMo 2 32B
fully-open AI2 OLMo 2 — research provenance flagship
Phi-4 Reasoning Mini 4B
edge-tier reasoning
Llama 4 Scout
production multimodal serving — image + text at workstation-cluster scale
DeepSeek V4
frontier-tier reasoning on multi-machine clusters
Granite 3.3 8B
enterprise tool-calling on IBM stacks
Kimi K2.6
Moonshot frontier MoE — long-context specialist
Mistral Small 3.2 24B
consumer-tier multilingual instruction-following
Phi-4 Mini 4B
edge / embedded reasoning
GLM-5 Pro
Chinese-language enterprise serving
Nemotron 3 Super (120B-A12B)
NVIDIA-tuned datacenter-tier reasoning
Llama 4 405B
frontier-tier serving on cluster hardware
Llama 4 70B
production self-hosted serving on 2x A100 / H100
DeepSeek Coder V3
workstation coding alternative to Qwen 2.5 Coder
GLM-5
Zhipu GLM-5 frontier MoE
Nemotron 3 Super 49B
32GB-VRAM enterprise deployments
Nemotron 3 Nano 9B
NVIDIA-stack tool-calling agents
Nemotron 3 Nano (30B-A3B)
NVIDIA-tuned consumer-tier general
DeepSeek V3 Lite (16B MoE)
consumer-tier MoE inference
Hermes 4 Llama 3.3 70B
datacenter-tier instruction-tuned alternative to base Llama 3.3
Magistral 32B
research / non-commercial reasoning at 32B scale
Kimi K1.5
deep math + reasoning research
Qwen 3 Coder 32B
coding-specialized agent workloads
DeepSeek R1 Distill Qwen 3 32B
workstation reasoning with Qwen 3 base improvements
EXAONE 3.5 32B
Korean / Japanese / CJK workloads
EXAONE 3.5 8B
consumer-tier Korean workloads
SmolLM 3 3B
edge-tier reasoning
InternLM 3 8B
Chinese-language consumer workloads
Step-3
frontier-research workloads
Dolphin 3 Llama 3.3 70B
datacenter creative / less-restricted generation
Devstral Small 2 24B
Apache 2.0 coding alternative to Qwen 2.5 Coder
Yi Coder 9B
8GB-VRAM coding
Qwen 3 7B
consumer-tier reasoning on 8GB+ GPUs
EVA Llama 3.3 70B
datacenter-tier creative / narrative generation
Qwen 3 Embedding 8B
permissively licensed embeddings at 8B
Phi-4 Reasoning 14B
consumer-tier reasoning via Phi-4 lineage
Qwen 3 235B-A22B
Qwen 3 MoE flagship — pre-3.5 baseline
Qwen 3 32B
general-purpose reasoning + chat with toggle-style reasoning emission
Qwen 3 30B-A3B
workstation MoE — 3B active, 30B total
Qwen 3 14B
16GB-VRAM reasoning workloads with thinking-mode toggle
Qwen 3 8B
consumer-tier reasoning toggle
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.