The frontier of open-weight model releases
Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.
Filtered results (48)
Qwen 3.5 235B-A17B (MoE)
frontier-tier reasoning + multilingual serving on multi-machine clusters
DeepSeek V4 Pro (1.6T MoE)
frontier-tier coding + reasoning serving — currently the open-weight ceiling
Llama 4 Scout
production multimodal serving — image + text at workstation-cluster scale
Qwen 3 32B
general-purpose reasoning + chat with toggle-style reasoning emission
Qwen 3 14B
16 GB VRAM reasoning workloads with a thinking-mode toggle
Mistral Small 3 24B
consumer-tier multilingual instruction-following — Mistral's instruction-tuned baseline at 24B
DeepSeek R1 (671B reasoning)
frontier-tier reasoning research; cluster-only deployment
DeepSeek R1 Distill Qwen 32B
single-machine reasoning — the canonical local R1 deployment
Phi-4 14B
16 GB VRAM tier reasoning + chat — the right pick when 32B-class doesn't fit
Llama 3.3 70B Instruct
production self-hosted serving at the 70B class — when you need general-purpose capability above 32B but don't need frontier-tier
Qwen 2.5 Coder 32B Instruct
single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware
Orpheus 3B 0.1 FT
Expressive, emotion-rich English TTS for agents, NPCs, and audiobooks on a consumer GPU
OpenELM 3B Instruct
Academic study of layer-wise scaled transformer architectures
Falcon 3 3B Instruct
Multilingual European chat where the Falcon license is acceptable
ColPali v1.3
Visual-document retrieval for multi-page PDFs with charts, tables, and scans where OCR pipelines fail
SDXL Turbo
Real-time interactive text-to-image (~50-100 ms/frame) on a consumer GPU for research and demos
Stable Diffusion 3.5 Medium
Permissively-licensed text-to-image for small-business and indie commercial products on a 12-16 GB consumer GPU
Kumru 2B
fast Turkish edge chat
EXAONE 3.5 2.4B Instruct
Korean/English bilingual research prototyping on edge hardware
Salamandra 2B
Fine-tuning base for Spanish or Catalan/Galician/Basque NLP tasks
SmolVLM Instruct
Lowest-VRAM open VLM for image captioning on a consumer GPU
Qwen 3.5 2B Turkish SFT
Granite 3.1 2B Instruct
Enterprise RAG and tool-use with vendor indemnification
Qwen2-VL 2B Instruct
Lightweight document and chart understanding on a consumer GPU
Gemma 2 2B Instruct
Consumer-GPU local chat with strong safety defaults
Salamandra 2B Instruct
Spanish and European multilingual instruction following on low-VRAM hardware
Kanarya 2B
Qwen 3 1.7B
Edge laptop assistant with reasoning that fits in 2 GB VRAM
mxbai-rerank-large-v2
High-accuracy reranking for English+multilingual RAG when GPU budget allows a 1.5B decoder pass
mGPT 1.3B Uzbek
Uzbek-language text generation and corpus experimentation
mGPT 1.3B Mongol
Mongolian-language text generation and basic NLP prototyping
TinyLlama 1.1B Chat v1.0
Reproducible SLM research baseline and legacy llama.cpp deployments
TinyLlama 1.1B Chat v0.3 AWQ
Low-resource English chatbot prototyping
TinyLlama 1.1B Chat v0.3 GPTQ
Lightweight English chatbot on severely VRAM-constrained hardware
OLMo 2 1B Instruct
Research baseline where full training reproducibility is required
Florence-2 Large
Edge-tier unified caption / OCR / detection / grounding pipeline where you want one model instead of four
Distil-Whisper Large v3
High-throughput English transcription pipelines (podcasts, call center, batch ASR) on a single consumer GPU
Kanarya 750M
Turkish GPT-2 Large
Parakeet TDT 0.6B v2
Best-in-class English transcription throughput on NVIDIA GPUs with long-form support
Qwen3 0.6B Hindi Instruct v1 GGUF
Simple Hindi instruction following on CPU-only devices
Qwen 3 0.6B
Sub-1B on-device chat and tool-calling agent on phones
GOT-OCR 2.0
Self-hosted OCR for printed formulas, tables, and dense scientific PDFs to LaTeX/Markdown
Jina Embeddings v3
Multilingual RAG with task-switched LoRA adapters — research and non-commercial deployments only
Snowflake Arctic Embed L v2.0
Commercial multilingual RAG where an Apache-2.0 license is required and jina-v3's CC-BY-NC is a blocker
Multilingual E5 Large Instruct
Short-passage multilingual RAG with an MIT license requirement and a chunking pipeline already in place
Vikhr Qwen 2.5 0.5B Instruct
Russian-language mobile chatbot or on-device assistant
XTTS v2
Multilingual voice cloning from a short reference clip for personal or research use
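The VRAM tiers quoted in the cards above (for example 2 GB for Qwen 3 1.7B and 16 GB for the 14B models) can be sanity-checked with a back-of-envelope estimate of weight memory. The sketch below is an illustration only, not how RunLocalAI derives its tiers. It assumes nominal parameter counts read off the model names and an assumed 4-bit quantization, and it ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal sizes taken from the model names; the 4-bit width is an assumption.
# Real quantization formats add per-block metadata on top of this figure.
cases = [
    ("Qwen 3 1.7B", 1.7, 4),
    ("Qwen 3 14B", 14, 4),
    ("Qwen 3 32B", 32, 4),
    ("Llama 3.3 70B Instruct", 70, 4),
]

for name, params_b, bits in cases:
    gb = weight_memory_gb(params_b, bits)
    print(f"{name}: ~{gb:.2f} GB of weights at {bits}-bit")
```

Re-running with `bits_per_weight=16` gives the unquantized footprint, which shows why the larger entries are described as cluster or multi-GPU deployments.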
Going deeper
- Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
- Execution stacks — recipes that combine models with runtimes + hardware.
- Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
- Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.