Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated

RUNLOCALAI · v38

Frontier zone · Model releases

The frontier of open-weight model releases

Open-weight model releases tracked by RunLocalAI — recent additions, rising families, distill chains, multimodal and reasoning waves. Each card links into the catalog with authority badges (L1.25 enriched · benchmark-backed · verdict) so you can scan editorial coverage at a glance.

By Fredoline Eruo · Refreshed continuously from catalog seed

Filtered results (48)

Qwen 3.5 235B-A17B (MoE)

Alibaba · 2026-05-01

397B/17B-A · frontier

frontier-tier reasoning + multilingual serving on multi-machine clusters

L1.25 enriched · Verdict

DeepSeek V4 Pro (1.6T MoE)

DeepSeek · 2026-04-24

1600B/49B-A · frontier

frontier-tier coding + reasoning serving — currently the open-weight ceiling

L1.25 enriched · Verdict

Llama 4 Scout

Meta · 2026-04-05

production multimodal serving — image + text at workstation-cluster scale

L1.25 enriched · Verdict

Qwen 3 32B

Alibaba · 2025-04-29

general-purpose reasoning + chat with toggle-style reasoning emission

L1.25 enriched · Verdict

Qwen 3 14B

Alibaba · 2025-04-29

16GB-VRAM reasoning workloads with thinking-mode toggle

L1.25 enriched · Benchmark · Verdict

Mistral Small 3 24B

Mistral AI · 2025-01-30

consumer-tier multilingual instruction-following — Mistral's instruction-tuned baseline at 24B

L1.25 enriched · Verdict

DeepSeek R1 (671B reasoning)

DeepSeek · 2025-01-20

frontier-tier reasoning research; cluster-only deployment

L1.25 enriched · Verdict

DeepSeek R1 Distill Qwen 32B

DeepSeek · 2025-01-20

single-machine reasoning — the canonical local R1 deployment

L1.25 enriched · Verdict

Phi-4 14B

Microsoft · 2024-12-12

16 GB VRAM tier reasoning + chat — the right pick when 32B-class doesn't fit

L1.25 enriched · Verdict

Llama 3.3 70B Instruct

Meta · 2024-12-06

production self-hosted serving at the 70B class — when you need general-purpose capability above 32B but don't need frontier-tier

L1.25 enriched · Verdict

Qwen 2.5 Coder 32B Instruct

Alibaba · 2024-11-12

single-user autonomous coding agents on RTX 4090 / 5090 / dual-A100 hardware

L1.25 enriched · Verdict

Orpheus 3B 0.1 FT

Expressive, emotion-rich English TTS for agents, NPCs, and audiobooks on a consumer GPU

L1.25 enriched · Verdict

OpenELM 3B Instruct

Academic study of layer-wise scaled transformer architectures

L1.25 enriched · Verdict

Falcon 3 3B Instruct

Multilingual European chat where Falcon license is acceptable

L1.25 enriched · Verdict

ColPali v1.3

ColPali team (Illuin Technology)

Visual-document retrieval for multi-page PDFs with charts, tables, and scans where OCR pipelines fail

L1.25 enriched · Verdict

SDXL Turbo

Real-time interactive text-to-image (~50-100 ms/frame) on a consumer GPU for research and demos

L1.25 enriched · Verdict

Stable Diffusion 3.5 Medium

Permissively licensed text-to-image for small-business and indie commercial products on a 12-16 GB consumer GPU

L1.25 enriched · Verdict

Kumru 2B

fast Turkish edge chat

L1.25 enriched · Benchmark

EXAONE 3.5 2.4B Instruct

Korean/English bilingual research prototyping on edge hardware

L1.25 enriched · Verdict

Salamandra 2B

Fine-tuning base for Spanish or Catalan/Galician/Basque NLP tasks

L1.25 enriched · Verdict

SmolVLM Instruct

Lowest-VRAM open VLM for image captioning on consumer GPU

L1.25 enriched · Verdict

Qwen 3.5 2B Turkish SFT

Granite 3.1 2B Instruct

Enterprise RAG and tool-use with vendor indemnification

L1.25 enriched · Verdict

Qwen2-VL 2B Instruct

Lightweight document and chart understanding on a consumer GPU

L1.25 enriched · Verdict

Gemma 2 2B Instruct

Consumer-GPU local chat with strong safety defaults

L1.25 enriched · Verdict

Salamandra 2B Instruct

Spanish and European multilingual instruction following on low-VRAM hardware

L1.25 enriched · Verdict

Kanarya 2B

Qwen 3 1.7B

Edge laptop assistant with reasoning that fits in 2GB VRAM

L1.25 enriched · Verdict

mxbai-rerank-large-v2

High-accuracy reranking for English+multilingual RAG when GPU budget allows a 1.5B decoder pass

L1.25 enriched · Verdict

mGPT 1.3B Uzbek

Uzbek-language text generation and corpus experimentation

L1.25 enriched · Verdict

mGPT 1.3B Mongol

Mongolian-language text generation and basic NLP prototyping

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v1.0

Reproducible SLM research baseline and legacy llama.cpp deployments

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v0.3 AWQ

Low-resource English chatbot prototyping

L1.25 enriched · Verdict

TinyLlama 1.1B Chat v0.3 GPTQ

Lightweight English chatbot on severely VRAM-constrained hardware

L1.25 enriched · Verdict

OLMo 2 1B Instruct

Research baseline where full training reproducibility is required

L1.25 enriched · Verdict

Florence-2 Large

Edge-tier unified caption / OCR / detection / grounding pipeline where you want one model instead of four

L1.25 enriched · Verdict

Distil-Whisper Large v3

Hugging Face / Distil-Whisper

High-throughput English transcription pipelines (podcasts, call center, batch ASR) on a single consumer GPU

L1.25 enriched · Verdict

Kanarya 750M

Turkish GPT-2 Large

Parakeet TDT 0.6B v2

Best-in-class English transcription throughput on NVIDIA GPUs with long-form support

L1.25 enriched · Verdict

Qwen3 0.6B Hindi Instruct v1 GGUF

pankajpandey-dev

Simple Hindi instruction following on CPU-only devices

L1.25 enriched · Verdict

Qwen 3 0.6B

Sub-1B on-device chat and tool-calling agent on phones

L1.25 enriched · Verdict

GOT-OCR 2.0

Self-hosted OCR for printed formulas, tables, and dense scientific PDFs to LaTeX/Markdown

L1.25 enriched · Verdict

Jina Embeddings v3

Multilingual RAG with task-switched LoRA adapters — research and non-commercial deployments only

L1.25 enriched · Verdict

Snowflake Arctic Embed L v2.0

Commercial multilingual RAG where Apache-2.0 license is required and jina-v3's CC-BY-NC is a blocker

L1.25 enriched · Verdict

Multilingual E5 Large Instruct

Microsoft (intfloat)

Short-passage multilingual RAG with MIT license requirement and chunking pipeline already in place

L1.25 enriched · Verdict

Vikhr Qwen 2.5 0.5B Instruct

Russian-language mobile chatbot or on-device assistant

L1.25 enriched · Verdict

XTTS v2

Multilingual voice cloning from a short reference clip for personal or research use

L1.25 enriched · Verdict

Going deeper

Ecosystem maps — structured-landscape views (memory frameworks, inference runtimes, MCP, coding agents).
Execution stacks — recipes that combine models with runtimes + hardware.
Frontier index — broader ecosystem-momentum view across coding agents, inference runtimes, memory systems, MCP.
Benchmarks — measured tokens-per-second + topology fields across hardware/model/runtime triples.