The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI: recent additions, rising families, distillation chains, and multimodal and reasoning waves. Each card links into the catalog and carries authority badges (L1.25 enriched · benchmark-backed · verdict), so you can scan editorial coverage at a glance.
Filtered results (48)
Qwen 3.6 35B-A3B (MTP)
high-throughput MoE inference at workstation tier
Qwen 3.6 27B (MTP)
dense workstation model with throughput acceleration
Mistral Medium 3 24B (dense)
research / non-commercial workstation deployments
OLMo 2 32B
fully-open AI2 OLMo 2 — research provenance flagship
Granite 3.3 8B
enterprise tool-calling on IBM stacks
Mistral Small 3.2 24B
consumer-tier multilingual instruction-following
Llama 4 70B
production self-hosted serving on 2x A100 / H100
DeepSeek Coder V3
workstation coding alternative to Qwen 2.5 Coder
Nemotron 3 Super 49B
32GB-VRAM enterprise deployments
Nemotron 3 Nano 9B
NVIDIA-stack tool-calling agents
Nemotron 3 Nano (30B-A3B)
NVIDIA-tuned consumer-tier general
DeepSeek V3 Lite (16B MoE)
consumer-tier MoE inference
Hermes 4 Llama 3.3 70B
datacenter-tier instruction-tuned alternative to base Llama 3.3
Magistral 32B
research / non-commercial reasoning at 32B scale
Qwen 3 Coder 32B
coding-specialized agent workloads
DeepSeek R1 Distill Qwen 3 32B
workstation reasoning with Qwen 3 base improvements
EXAONE 3.5 32B
Korean / Japanese / CJK workloads
EXAONE 3.5 8B
consumer-tier Korean workloads
InternLM 3 8B
Chinese-language consumer workloads
Dolphin 3 Llama 3.3 70B
datacenter creative / less-restricted generation
Devstral Small 2 24B
Apache 2.0 coding alternative to Qwen 2.5 Coder
Yi Coder 9B
8GB-VRAM coding
Qwen 3 7B
consumer-tier reasoning on 8GB+ GPUs
EVA Llama 3.3 70B
datacenter-tier creative / narrative generation
Qwen 3 Embedding 8B
permissively-licensed embeddings at 8B
Phi-4 Reasoning 14B
consumer-tier reasoning via Phi-4 lineage
Qwen 3 30B-A3B
workstation MoE — 3B active, 30B total
Qwen 3 8B
consumer-tier reasoning toggle
Granite 3 MoE (3B active)
consumer-tier enterprise MoE
Llama 3.3 8B Instruct
consumer-tier chat — drop-in 3.1 8B replacement
Llama 3.1 Nemotron Nano 8B
consumer-tier Nemotron-Llama
DeepSeek R1 Distill Mistral 24B
consumer-tier reasoning with Mistral instruction lineage
Granite 3.2 8B
enterprise tool-calling on IBM stacks
Mistral Saba 24B
Arabic / South-Asian multilingual
Dolphin 3.0 Mistral 24B
consumer-tier creative / less-restricted generation
DeepSeek R1 Distill Llama 70B
datacenter-tier reasoning
DeepSeek R1 Distill Qwen 14B
consumer-tier reasoning at 14B
DeepSeek R1 Distill Llama 8B
consumer-tier reasoning on 8GB+ GPUs
DeepSeek R1 Distill Qwen 7B
consumer-tier reasoning at 7B
Falcon 3 10B
Arabic-language workloads
Falcon 3 7B Instruct
consumer-tier multilingual
QwQ 32B Preview
workstation-tier reasoning — Qwen team alternative to R1
OLMo 2 13B
reproducibility / academic research
Tulu 3 70B
datacenter-tier open-recipe instruct
Tulu 3 8B
fully-open instruction-following research baseline
Qwen 2.5 Coder 14B Instruct
16GB-VRAM coding
Qwen 2.5 Coder 7B Instruct
consumer-tier coding at 8GB VRAM
OpenCoder 8B
academic / reproducibility-sensitive coding research
Going deeper
- Ecosystem maps: structured landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks: recipes that combine models with runtimes and hardware.
- Frontier index: a broader view of ecosystem momentum across coding agents, inference runtimes, memory systems, and MCP.
- Benchmarks: measured tokens per second and topology fields across hardware/model/runtime triples.