RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
Frontier zone · Model releases

The frontier of open-weight model releases

Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.

By Fredoline Eruo · Refreshed continuously from catalog seed
Filter
Family: Any, Qwen, Llama, DeepSeek, Mistral, Gemma, Phi, GLM, OLMo
Deployment: Any, Edge, Consumer, Workstation, Datacenter, Frontier
Modality: Any, Multimodal, Text-only
Coverage: Any, L1.25 enriched, Needs L1.25, Needs benchmark

Filtered results (48)

Models matching your filters. Clear filters by clicking “Any” on each row above, or remove individual filters via the URL.
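The filter rows can be read as a conjunction of per-field tests, with "Any" meaning no constraint on that row. The sketch below illustrates that reading only; it is not the site's implementation. The `Card` record, its field names, and the `filter_cards` helper are hypothetical, and the three sample entries are taken from cards on this page. The "Needs L1.25" and "Needs benchmark" coverage options (badge absence) are not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    # Hypothetical record shape; the catalog's real schema is not shown on this page.
    name: str
    family: str
    deployment: str
    multimodal: bool
    badges: frozenset

def matches(card, family=None, deployment=None, multimodal=None, badge=None):
    """Return True when the card passes every filter; None stands in for 'Any'."""
    return (
        (family is None or card.family == family)
        and (deployment is None or card.deployment == deployment)
        and (multimodal is None or card.multimodal == multimodal)
        and (badge is None or badge in card.badges)
    )

def filter_cards(cards, **filters):
    """Keep cards that satisfy all active filters, preserving listing order."""
    return [c for c in cards if matches(c, **filters)]

# Illustrative sample drawn from the listing on this page.
CARDS = [
    Card("Llama 4 Scout", "Llama", "datacenter", True,
         frozenset({"L1.25 enriched", "Verdict"})),
    Card("Qwen 3 32B", "Qwen", "workstation", False,
         frozenset({"L1.25 enriched", "Verdict"})),
    Card("Qwen 3 14B", "Qwen", "consumer", False,
         frozenset({"L1.25 enriched", "Benchmark", "Verdict"})),
]

if __name__ == "__main__":
    for card in filter_cards(CARDS, family="Qwen", deployment="consumer"):
        print(card.name)
```

Calling `filter_cards(CARDS)` with no arguments corresponds to every row set to "Any" and returns the full list.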

Qwen 3.5 235B-A17B (MoE)

Alibaba · 2026-05-01
397B/17B-A · frontier

frontier-tier reasoning + multilingual serving on multi-machine clusters

L1.25 enriched · Verdict

DeepSeek V4 Pro (1.6T MoE)

DeepSeek · 2026-04-24
1600B/49B-A · frontier

frontier-tier coding + reasoning serving — currently the open-weight ceiling

L1.25 enriched · Verdict

Llama 4 Scout

Meta · 2026-04-05
109B · datacenter

production multimodal serving — image + text at workstation-cluster scale

L1.25 enriched · Verdict

Qwen 3 32B

Alibaba · 2025-04-29
32B · workstation

general-purpose reasoning + chat with toggle-style reasoning emission

L1.25 enriched · Verdict

Qwen 3 14B

Alibaba · 2025-04-29
14B · consumer

16GB-VRAM reasoning workloads with thinking-mode toggle

L1.25 enriched · Benchmark · Verdict

Mistral Small 3 24B

Mistral AI · 2025-01-30
24B · consumer

consumer-tier multilingual instruction-following — Mistral's instruction-tuned baseline at 24B

L1.25 enriched · Verdict

DeepSeek R1 (671B reasoning)

DeepSeek · 2025-01-20
671B · frontier

frontier-tier reasoning research; cluster-only deployment

L1.25 enriched · Verdict

DeepSeek R1 Distill Qwen 32B

DeepSeek · 2025-01-20
32B · workstation

single-machine reasoning — the canonical local R1 deployment

L1.25 enriched · Verdict

Phi-4 14B

Microsoft · 2024-12-12
14B · consumer

16 GB VRAM tier reasoning + chat — the right pick when 32B-class doesn't fit

L1.25 enriched · Verdict

Llama 3.3 70B Instruct

Meta · 2024-12-06
70B · datacenter

production self-hosted serving at the 70B class — when you need general-purpose capability above 32B but don't need frontier-tier

L1.25 enriched · Verdict

Qwen 2.5 Coder 32B Instruct

Alibaba · 2024-11-12
32B · workstation

single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware

L1.25 enriched · Verdict

Orpheus 3B 0.1 FT

Canopy Labs
3B

Expressive, emotion-rich English TTS for agents, NPCs, and audiobooks on a consumer GPU

L1.25 enriched · Verdict

OpenELM 3B Instruct

Apple
3B

Academic study of layer-wise scaled transformer architectures

L1.25 enriched · Verdict

Falcon 3 3B Instruct

TII
3B

Multilingual European chat where Falcon license is acceptable

L1.25 enriched · Verdict

ColPali v1.3

ColPali team (Illuin Technology)
3B

Visual-document retrieval for multi-page PDFs with charts, tables, and scans where OCR pipelines fail

L1.25 enriched · Verdict

SDXL Turbo

Stability AI
2.6B

Real-time interactive text-to-image (~50-100ms/frame) on a consumer GPU for research and demos

L1.25 enriched · Verdict

Stable Diffusion 3.5 Medium

Stability AI
2.5B

Permissively-licensed text-to-image for small-business and indie commercial products on a 12-16GB consumer GPU

L1.25 enriched · Verdict

Kumru 2B

VNGRS
2.4B · consumer

fast Turkish edge chat

L1.25 enriched · Benchmark

EXAONE 3.5 2.4B Instruct

LG AI Research
2.4B

Korean/English bilingual research prototyping on edge hardware

L1.25 enriched · Verdict

Salamandra 2B

BSC-LT
2.25B

Fine-tuning base for Spanish or Catalan/Galician/Basque NLP tasks

L1.25 enriched · Verdict

SmolVLM Instruct

Hugging Face
2.25B

Lowest-VRAM open VLM for image captioning on consumer GPU

L1.25 enriched · Verdict

Qwen 3.5 2B Turkish SFT

Tuguberk
2B
L1.25 enriched

Granite 3.1 2B Instruct

IBM
2B

Enterprise RAG and tool-use with vendor indemnification

L1.25 enriched · Verdict

Qwen2-VL 2B Instruct

Alibaba
2B

Lightweight document and chart understanding on a consumer GPU

L1.25 enriched · Verdict

Gemma 2 2B Instruct

Google
2B

Consumer-GPU local chat with strong safety defaults

L1.25 enriched · Verdict

Salamandra 2B Instruct

BSC
2B

Spanish and European multilingual instruction following on low-VRAM hardware

L1.25 enriched · Verdict

Kanarya 2B

asafaya
2B
L1.25 enriched

Qwen 3 1.7B

Alibaba
1.7B

Edge laptop assistant with reasoning that fits in 2GB VRAM

L1.25 enriched · Verdict

mxbai-rerank-large-v2

Mixedbread AI
1.54B

High-accuracy reranking for English+multilingual RAG when GPU budget allows a 1.5B decoder pass

L1.25 enriched · Verdict

mGPT 1.3B Uzbek

ai-forever
1.3B

Uzbek-language text generation and corpus experimentation

L1.25 enriched · Verdict

mGPT 1.3B Mongol

ai-forever
1.3B

Mongolian-language text generation and basic NLP prototyping

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v1.0

TinyLlama
1.1B

Reproducible SLM research baseline and legacy llama.cpp deployments

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v0.3 AWQ

Zhang Peiyuan
1.1B

Low-resource English chatbot prototyping

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v0.3 GPTQ

TheBloke
1.1B

Lightweight English chatbot on severely VRAM-constrained hardware

L1.25 enriched · Verdict

OLMo 2 1B Instruct

AllenAI
1B

Research baseline where full training reproducibility is required

L1.25 enriched · Verdict

Florence-2 Large

Microsoft
0.77B

Edge-tier unified caption / OCR / detection / grounding pipeline where you want one model instead of four

L1.25 enriched · Verdict

Distil-Whisper Large v3

Hugging Face / Distil-Whisper
0.756B

High-throughput English transcription pipelines (podcasts, call center, batch ASR) on a single consumer GPU

L1.25 enriched · Verdict

Kanarya 750M

asafaya
0.75B
L1.25 enriched

Turkish GPT-2 Large

ytu-ce-cosmos
0.7B
L1.25 enriched

Parakeet TDT 0.6B v2

NVIDIA
0.6B

Best-in-class English transcription throughput on NVIDIA GPUs with long-form support

L1.25 enriched · Verdict

Qwen3 0.6B Hindi Instruct v1 GGUF

pankajpandey-dev
0.6B

Simple Hindi instruction following on CPU-only devices

L1.25 enriched · Verdict

Qwen 3 0.6B

Alibaba
0.6B

Sub-1B on-device chat and tool-calling agent on phones

L1.25 enriched · Verdict

GOT-OCR 2.0

StepFun AI
0.58B

Self-hosted OCR for printed formulas, tables, and dense scientific PDFs to LaTeX/Markdown

L1.25 enriched · Verdict

Jina Embeddings v3

Jina AI
0.572B

Multilingual RAG with task-switched LoRA adapters — research and non-commercial deployments only

L1.25 enriched · Verdict

Snowflake Arctic Embed L v2.0

Snowflake
0.568B

Commercial multilingual RAG where Apache-2.0 license is required and jina-v3's CC-BY-NC is a blocker

L1.25 enriched · Verdict

Multilingual E5 Large Instruct

Microsoft (intfloat)
0.56B

Short-passage multilingual RAG with MIT license requirement and chunking pipeline already in place

L1.25 enriched · Verdict

Vikhr Qwen 2.5 0.5B Instruct

Vikhrmodels
0.5B

Russian-language mobile chatbot or on-device assistant

L1.25 enriched · Verdict

XTTS v2

Coqui
0.46B

Multilingual voice cloning from a short reference clip for personal or research use

L1.25 enriched · Verdict

Going deeper

  • Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
  • Execution stacks — recipes that combine models with runtimes + hardware.
  • Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
  • Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.
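
Several cards above tie a model to a VRAM tier, for example Phi-4 14B as the 16 GB pick "when 32B-class doesn't fit". A rough way to sanity-check that kind of claim is weight-memory arithmetic: parameters times bits per weight, divided by 8. The sketch below is an approximation under stated assumptions, not the catalog's method: the function names are hypothetical, the 2 GiB `overhead_gib` default is an assumed placeholder for KV cache and runtime buffers, and real quantization formats often average more than their nominal bits per weight.

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

def fits(params_billions: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weight_gib(params_billions, bits_per_weight) + overhead_gib <= vram_gib

if __name__ == "__main__":
    # Compare a 14B and a 32B dense model at a nominal 4 bits per weight on 16 GiB.
    for size in (14, 32):
        print(f"{size}B @ 4-bit: {weight_gib(size, 4):.1f} GiB weights, "
              f"fits in 16 GiB: {fits(size, 4, 16)}")
```

Under these assumptions a 14B model at 4 bits needs about 6.5 GiB for weights and fits a 16 GiB card with room to spare, while a 32B model needs about 14.9 GiB for weights alone and does not fit once any overhead is counted.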