The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI: recent additions, rising families, distill chains, and multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Filtered results (32)
DeepSeek V4 Flash (284B MoE)
datacenter MoE — V4 efficiency variant
Llama 4 Scout
production multimodal serving — image + text at workstation-cluster scale
GLM-5 Pro
Chinese-language enterprise serving
Nemotron 3 Super (120B-A12B)
NVIDIA-tuned datacenter-tier reasoning
Llama 4 70B
production self-hosted serving on 2x A100 / H100
Hermes 4 Llama 3.3 70B
datacenter-tier instruction-tuned alternative to base Llama 3.3
Kimi K1.5
deep math + reasoning research
Dolphin 3 Llama 3.3 70B
datacenter creative / less-restricted generation
EVA Llama 3.3 70B
datacenter-tier creative / narrative generation
Qwen 2.5-VL 72B
frontier-tier multimodal serving
DeepSeek R1 Distill Llama 70B
datacenter-tier reasoning
Llama 3.3 70B Instruct
production self-hosted serving at the 70B class — when you need general-purpose capability above 32B but don't need frontier-tier
InternVL 2.5 78B
datacenter-tier permissive VLM
Tulu 3 70B
datacenter-tier open-recipe instruct
Llama 3.1 Nemotron 70B Instruct
NVIDIA-fine-tuned Llama 3.1 70B
Llama 3.2 90B Vision
datacenter-tier multimodal serving
Llama 3.2 90B Vision Instruct
datacenter vision-language Llama at 90B-class
Molmo 72B
datacenter-tier open VLM for agent UI
Qwen 2.5 Math 72B
datacenter-tier math specialist
Qwen 2.5 72B Instruct
production multilingual at 70B-class
DeepSeek V2.5 236B
DeepSeek lineage reference — pre-V3
Command R+ (Aug 2024)
research / non-commercial RAG workflows
Command R+ 104B
datacenter RAG-tuned at 100B class
Hermes 3 Llama 3.1 70B
datacenter-tier Hermes — instruction following
Mistral Large 2 (123B)
datacenter dense Mistral flagship — pre-Medium-3.5
Llama 3.1 70B Instruct
production self-hosted serving at the 70B class
DeepSeek Coder V2 236B
datacenter-tier MoE coding
OpenBioLLM Llama 3 70B
medical / clinical NLP
Mixtral 8x22B Instruct
datacenter MoE — 39B active, 141B total
WizardLM-2 8x22B
Mixtral 8x22B fine-tune — reasoning-tuned
DBRX Instruct
Databricks-native enterprise inference
DBRX Base
MoE fine-tuning base
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.
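
Several cards above quote parameter counts (for example "39B active, 141B total" for Mixtral 8x22B) or a hardware target ("2x A100 / H100" for a 70B model). The sketch below shows one rough way to turn those figures into an active-parameter fraction and a weights-only memory estimate. It is an illustration under stated assumptions, not how RunLocalAI computes anything: the `ModelSize` class, the `fits` helper, the bytes-per-parameter table, and the 80 GB per-GPU figure are all hypothetical choices made here, and the estimate ignores KV cache, activations, and runtime overhead.

```python
from dataclasses import dataclass

# Assumed storage cost per parameter for common weight precisions (bytes).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass(frozen=True)
class ModelSize:
    """Hypothetical record holding the sizes quoted on a catalog card."""
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per token, in billions

    def active_fraction(self) -> float:
        # Share of the weights that participate in each forward pass.
        return self.active_b / self.total_b

    def weight_gb(self, precision: str = "fp16") -> float:
        # Billions of parameters times bytes per parameter gives decimal GB.
        return self.total_b * BYTES_PER_PARAM[precision]


def fits(model: ModelSize, gpus: int, gb_per_gpu: float,
         precision: str = "fp16") -> bool:
    """Weights-only check: ignores KV cache, activations, and overhead."""
    return model.weight_gb(precision) <= gpus * gb_per_gpu


# Sizes taken from the cards above; a dense model has active == total.
mixtral = ModelSize("Mixtral 8x22B Instruct", total_b=141, active_b=39)
llama = ModelSize("Llama 3.3 70B Instruct", total_b=70, active_b=70)

if __name__ == "__main__":
    for m in (mixtral, llama):
        print(m.name,
              f"active fraction {m.active_fraction():.3f}",
              f"fp16 weights {m.weight_gb('fp16'):.1f} GB",
              f"fits 2x80GB at fp16: {fits(m, 2, 80.0)}")
```

Because an MoE model still has to hold all of its experts in memory, the check uses total rather than active parameters. The active fraction is more relevant to per-token compute than to whether the model loads at all.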