The frontier of open-weight model releases
RunLocalAI tracks open-weight model releases: recent additions, rising families, distill chains, and multimodal and reasoning waves. Each card links into the catalog and carries authority badges (L1.25 enriched · benchmark-backed · verdict), so you can scan editorial coverage at a glance.
Filtered results (27)
- Phi-4 Reasoning Mini 4B: edge-tier reasoning
- Phi-4 Mini 4B: edge / embedded reasoning
- SmolLM 3 3B: edge-tier reasoning
- Qwen 3 4B: edge-tier Qwen 3, Apple Silicon laptop friendly
- Gemma 3 1B: phone-tier Gemma, smallest practical Gemma 3
- RWKV 7 'Goose' 1.5B: long-context edge inference where memory matters more than quality
- DeepSeek R1 Distill Qwen 1.5B: edge-tier reasoning
- Dolphin 3.0 Llama 3.2 3B: creative / less-restricted generation at edge tier
- EXAONE 3.5 2.4B: edge-tier Korean chat
- Qwen 2.5 Coder 3B: Apple Silicon laptop coding autocomplete
- Qwen 2.5 Coder 1.5B: IDE autocomplete on integrated GPUs
- SmolLM 2 1.7B Instruct: edge-tier Apache 2.0 baseline
- SmolLM 2 360M Instruct: phone / Pi-class chat
- Granite 3.0 2B Instruct: edge-tier IBM Granite
- Ministral 3B Instruct: edge-tier long-context (research only)
- Hermes 3 Llama 3.2 3B: edge-tier instruction following
- Llama 3.2 3B Instruct: battery-powered laptop chat tier
- Llama 3.2 1B Instruct: edge / phone-tier chat, smallest practical Llama
- Qwen 2.5 3B Instruct: edge-tier Qwen 2.5 chat
- Qwen 2.5 1.5B Instruct: edge-tier Apache 2.0 chat
- Qwen 2.5 0.5B Instruct: phone-tier Qwen baseline
- Nemotron Mini 4B Instruct: edge-tier role-play / chat
- MiniCPM 3 4B: phone / embedded inference
- Phi-3.5 Mini Instruct: edge-tier Phi
- BGE Reranker v2 M3: RAG reranker
- StarCoder 2 3B: edge-tier code completion
- BGE M3: multilingual RAG embeddings
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.