The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Recent releases (12 newest)
Catalog entries with the most recent release dates. Use the authority badges to spot which have full editorial coverage (L1.25 enriched + benchmark) and which are catalog-only.
Ring-2.6-1T
frontier reasoning at MoE serving cost
Qwen 3.6 35B-A3B (MTP)
high-throughput MoE inference at workstation tier
Qwen 3.6 27B (MTP)
dense workstation model with throughput-acceleration
Qwen 3.5 235B-A17B (MoE)
frontier-tier reasoning + multilingual serving on multi-machine clusters
Mistral Medium 3.5 (675B MoE)
frontier MoE — Mistral's response to the open MoE wave
Mistral Medium 3 24B (dense)
research / non-commercial workstation deployments
DeepSeek V4 Pro (1.6T MoE)
frontier-tier coding + reasoning serving — currently the open-weight ceiling
DeepSeek V4 Flash (284B MoE)
datacenter MoE — V4 efficiency variant
OLMo 2 32B
fully-open AI2 OLMo 2 — research provenance flagship
Phi-4 Reasoning Mini 4B
edge-tier reasoning
Llama 4 Maverick
frontier-tier multimodal serving on multi-machine clusters
Llama 4 Scout
production multimodal serving — image + text at workstation-cluster scale
New reasoning models
Models with explicit thinking-block emission: DeepSeek R1 family, QwQ, Kimi, Magistral, Qwen 3 reasoning-mode. See /stacks/local-reasoning-model for the canonical deployment recipe.
Kimi K2.6
Moonshot frontier MoE — long-context specialist
Magistral 32B
research / non-commercial reasoning at 32B scale
Kimi K1.5
deep math + reasoning research
Qwen 3 Coder 32B
coding-specialized agent workloads
DeepSeek R1 Distill Qwen 3 32B
workstation reasoning with Qwen 3 base improvements
Qwen 3 235B-A22B
Qwen 3 MoE flagship — pre-3.5 baseline
New coding models
Coding-specialized fine-tunes. The Qwen Coder lineage is the current open-weight benchmark leader; DeepSeek Coder V3, Codestral, Devstral, and OpenCoder are the credible alternatives. See /stacks/local-coding-agent for the canonical deployment recipe.
DeepSeek Coder V3
workstation coding alternative to Qwen 2.5 Coder
Devstral Small 2 24B
Apache 2.0 coding alternative to Qwen 2.5 Coder
Yi Coder 9B
8GB-VRAM coding
Qwen 2.5 Coder 32B Instruct
single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware
Qwen 2.5 Coder 14B Instruct
16GB-VRAM coding
Qwen 2.5 Coder 7B Instruct
consumer-tier coding at 8GB VRAM
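The VRAM tiers on these cards can be sanity-checked with a rough weight-size estimate. The sketch below is illustrative only: the 4-bit quantization level, the 20% VRAM reserve, and the helper names weight_gib and fits_in_vram are assumptions, not values published in the catalog, and the card's "GB" figures are treated as GiB for simplicity.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float, reserve_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving a share of VRAM.

    reserve_fraction is an assumed allowance for KV cache, activations
    and runtime overhead; real requirements vary with context length.
    """
    usable = vram_gib * (1.0 - reserve_fraction)
    return weight_gib(params_billion, bits_per_weight) <= usable


# Parameter counts and VRAM tiers are read from the card names and taglines;
# 4 bits per weight is an assumed quantization level.
CARDS = {
    "Qwen 2.5 Coder 7B Instruct": (7, 8),
    "Yi Coder 9B": (9, 8),
    "Qwen 2.5 Coder 14B Instruct": (14, 16),
}

for name, (params_b, vram) in CARDS.items():
    size = weight_gib(params_b, 4)
    print(f"{name}: ~{size:.1f} GiB weights, fits {vram} GiB tier: "
          f"{fits_in_vram(params_b, 4, vram)}")
```

The estimate covers weights only, so treat a "fits" result as a lower bound on what the deployment needs rather than a guarantee.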
New multimodal models
Vision-language models. The 2025-2026 wave: Llama 4 Scout / Maverick, Qwen 2.5-VL, Pixtral, Janus-Pro, Phi-4 Multimodal. See /stacks/local-vision-model for the canonical deployment recipe.
Llama 4 Maverick
frontier-tier multimodal serving on multi-machine clusters
Gemma 4 31B Dense
workstation-tier multilingual chat with permissive license
Gemma 4 26B MoE
Gemma 4 MoE — workstation efficiency variant
Gemma 4 E4B (Effective 4B)
edge-tier Gemma 4 — laptop friendly
Gemma 4 E2B (Effective 2B)
phone-tier Gemma 4
Phi-4 Multimodal
16GB-consumer multimodal Q&A
New MoE models
Mixture-of-Experts releases. Active-parameter efficiency shapes the deployment economics. See /systems/distributed-inference for the architectural depth.
Ring-2.6-1T
frontier reasoning at MoE serving cost
Qwen 3.6 35B-A3B (MTP)
high-throughput MoE inference at workstation tier
Qwen 3.5 235B-A17B (MoE)
frontier-tier reasoning + multilingual serving on multi-machine clusters
Mistral Medium 3.5 (675B MoE)
frontier MoE — Mistral's response to the open MoE wave
DeepSeek V4 Pro (1.6T MoE)
frontier-tier coding + reasoning serving — currently the open-weight ceiling
DeepSeek V4 Flash (284B MoE)
datacenter MoE — V4 efficiency variant
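To make the active-parameter point concrete, the sketch below computes the share of parameters used per token for the cards whose names encode both figures. It assumes the common naming convention in which "35B-A3B" means 35B total parameters with 3B active per token; the MoEModel class is hypothetical and not part of the catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_b: float   # total parameters, in billions (drives memory footprint)
    active_b: float  # parameters active per token, in billions (drives compute)

    @property
    def active_fraction(self) -> float:
        """Share of the total parameters used for each token."""
        return self.active_b / self.total_b


# Figures are read from the card names, under the assumed naming convention.
MODELS = [
    MoEModel("Qwen 3.6 35B-A3B", 35, 3),
    MoEModel("Qwen 3.5 235B-A17B", 235, 17),
    MoEModel("Qwen 3 235B-A22B", 235, 22),
]

for m in MODELS:
    print(f"{m.name}: {m.active_fraction:.1%} of parameters active per token")
```

As a rough rule, all experts must still be held in memory, so the total count sets the hardware tier while the active count sets per-token compute. That is why a model with a small active fraction can be cheap to serve per token yet still need cluster-scale memory.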
New edge / phone-tier models
Models at 4B parameters and below for phone / Pi / embedded deployment: Phi-4 Mini, Gemma 3 1B, MiniCPM 3 4B, SmolLM 3, Hermes 3 3B, Dolphin 3 3B, RWKV 7 Goose 1.5B.
Phi-4 Reasoning Mini 4B
edge-tier reasoning
Gemma 4 E4B (Effective 4B)
edge-tier Gemma 4 — laptop friendly
Gemma 4 E2B (Effective 2B)
phone-tier Gemma 4
Phi-4 Mini 4B
edge / embedded reasoning
SmolLM 3 3B
edge-tier reasoning
Qwen 3 4B
edge-tier Qwen 3 — Apple Silicon laptop friendly
Enrichment gaps — OPERATOR queue
High-relevance catalog entries (7B-100B) that lack L1.25 enrichment, verdict, AND benchmark. These pages currently render as noindex and make up the next sprint's editorial queue. Surfacing them here keeps the gap visible.
Turkish Gemma 9B T1
Turkish Llama 8B Instruct v0.1
Trendyol LLM 7B Chat v0.1
Omni 31B Turkish Reasoning
Cosmos Llama 3 8B Turkish
Gemma 4 Turkish 26B (4B active)
Turkish Mistral 7B Instruct v0.2
Mihenk LLM v2 35B (Turkish Financial)
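The queue rule described above (7B-100B, with no enrichment, no verdict, and no benchmark) can be sketched as a simple filter. The CatalogEntry fields and the two "example-" entries are hypothetical and exist only for illustration; the real catalog schema is not shown on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    params_b: float      # total parameters, in billions
    enriched: bool       # has L1.25 enrichment
    has_verdict: bool
    has_benchmark: bool


def is_enrichment_gap(entry: CatalogEntry,
                      min_b: float = 7.0, max_b: float = 100.0) -> bool:
    """True when an entry is in the size band and lacks all three signals."""
    in_band = min_b <= entry.params_b <= max_b
    lacks_all = not (entry.enriched or entry.has_verdict or entry.has_benchmark)
    return in_band and lacks_all


entries = [
    # Listed in the queue above, so all three signals are missing.
    CatalogEntry("Turkish Gemma 9B T1", 9, False, False, False),
    # Hypothetical entries, included to show what the filter excludes.
    CatalogEntry("example-enriched-13b", 13, True, False, False),
    CatalogEntry("example-tiny-3b", 3, False, False, False),
]

queue = [e.name for e in entries if is_enrichment_gap(e)]
print(queue)
```

An entry with even one of the three signals drops out of the queue, which matches the "AND" in the rule: only entries missing all three are surfaced.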
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.