The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Filtered results (39)
Qwen 3.6 35B-A3B (MTP)
high-throughput MoE inference at workstation tier
Qwen 3.6 27B (MTP)
dense workstation model with throughput acceleration
Qwen 3.5 235B-A17B (MoE)
frontier-tier reasoning + multilingual serving on multi-machine clusters
Qwen 3 Coder 32B
coding-specialized agent workloads
Qwen 3 7B
consumer-tier reasoning on 8GB+ GPUs
Qwen 3 Embedding 8B
permissively licensed embeddings at 8B
Qwen 3 235B-A22B
Qwen 3 MoE flagship — pre-3.5 baseline
Qwen 3 32B
general-purpose reasoning + chat with toggle-style reasoning emission
Qwen 3 30B-A3B
workstation MoE — 3B active, 30B total
Qwen 3 14B
16GB-VRAM reasoning workloads with thinking-mode toggle
Qwen 3 8B
consumer-tier reasoning toggle
Qwen 3 4B
edge-tier Qwen 3 — Apple Silicon laptop friendly
Qwen 2.5-VL 72B
frontier-tier multimodal serving
Qwen 2.5-VL 7B
consumer-tier OCR + image Q&A
Qwen 2.5-VL 3B
edge-tier multimodal
QwQ 32B Preview
workstation-tier reasoning — Qwen team alternative to R1
Qwen 2.5 Coder 32B Instruct
single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware
Qwen 2.5 Coder 14B Instruct
16GB-VRAM coding
Qwen 2.5 Coder 7B Instruct
consumer-tier coding at 8GB VRAM
Qwen 2.5 Coder 3B
Apple Silicon laptop coding autocomplete
Qwen 2.5 Coder 1.5B
IDE autocomplete on integrated GPUs
Qwen 2.5 Math 72B
datacenter-tier math specialist
Qwen 2.5 72B Instruct
production multilingual at 70B-class
Qwen 2.5 32B Instruct
workstation-tier multilingual general chat
Qwen 2.5 14B Instruct
16GB-VRAM general chat with multilingual depth
Qwen 2.5 7B Instruct
consumer-tier multilingual chat
Qwen 2.5 Math 7B
consumer-tier math problem solving
Qwen 2.5 3B Instruct
edge-tier Qwen 2.5 chat
Qwen 2.5 1.5B Instruct
edge-tier Apache 2.0 chat
Qwen 2.5 0.5B Instruct
phone-tier Qwen baseline
Qwen 2-VL 7B
consumer-tier multimodal — pre-2.5-VL baseline
CodeQwen 1.5 7B
historical reference — Qwen 2.5 Coder 7B is the modern pick
Qwen3 Swallow 32B RL v0.2
Japanese-English reasoning tasks: math, coding, structured analysis
Qwen3.5 9B Thai Law Base
Foundation for fine-tuning Thai legal NLP tools
Qwen2-VL 2B Instruct
Lightweight document and chart understanding on a consumer GPU
Qwen 3.5 2B Turkish SFT
Qwen 3 1.7B
Edge laptop assistant with reasoning that fits in 2GB VRAM
Qwen3 0.6B Hindi Instruct v1 GGUF
Simple Hindi instruction following on CPU-only devices
Qwen 3 0.6B
Sub-1B on-device chat and tool-calling agent on phones
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.