315 models tracked. Hardware requirements, license terms, and quantization sizes for each.
Meta's small flagship. Strong general reasoning, 128K context, broad multilingual support. The default first try for most local-AI use cases on consumer hardware.
Meta's 2025 flagship MoE model. 109B total parameters with only 17B active per forward pass and a record 10-million-token context window, unmatched in…
Late-2024 refresh of the 70B Llama line. Roughly matches Llama 3.1 405B on most benchmarks at roughly one-sixth the parameter count. The default high-end model for…
Lightweight 3B for edge and laptop deployment. Runs comfortably on 8GB VRAM at 30+ tok/s on Apple Silicon.
The 70B sibling of Llama 3.1 8B. Strong generalist reasoning with 128K context, popular base for agentic fine-tunes (Hermes 3, Nemotron). Mostly superseded by…
NVIDIA's HelpSteer2-tuned Llama 3.1 70B. Topped Arena Hard at release. NVIDIA's reference open-weight model before Nemotron 3.
First-party multimodal Llama. Accepts images alongside text for VQA, document understanding, and chart reading. Runs on 12GB+ VRAM.
TinyLlama-1.1B-Chat-v1.0 is a 1.1B Llama-2-architecture model pretrained on 3 trillion tokens and chat-tuned on UltraChat and UltraFeedback. It was one of the…
Meta's high-end Llama 4 sibling — 128 experts MoE built for performance over efficiency. Multilingual strength is its standout. Effectively a server-tier…
NVIDIA's top open reasoning model in the Llama 3.1 lineage. Server-tier; trained for groundbreaking reasoning accuracy on agentic workloads.
Smallest of the Nemotron reasoning trio. NAS-optimized for inference efficiency on RTX hardware.
True edge-tier Llama. Runs on a phone or Raspberry Pi. Useful for classification, simple summarization, and on-device agents.
Turkish-tuned chat model released by Trendyol, Turkey's largest e-commerce platform. Built on Llama 2 7B, fine-tuned on Turkish customer-service style…
Llama 3 8B with continued pre-training on Turkish corpora, then instruction-tuned for Turkish chat. YTU CE COSMOS group's most-downloaded Llama variant. GGUF builds…
The 90B vision Llama. Best-in-class first-party multimodal open weight at the time of release. Workstation-class only.
YTU CE COSMOS's Llama 3 8B Turkish instruction-tuned variant. Follow-up to the original Turkish-Llama-8b that uses the Llama 3 base instead of Llama 2 — better…
Salamandra 7B Instruct is an Apache 2.0 instruction-tuned model from Barcelona Supercomputing Center, pretrained from scratch on 12.875 trillion tokens across…
Base (non-chat) variant of Trendyol's 7B Turkish LLM. The chat sibling is the more popular pick; this base version is for operators building their own…
LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning.…
SOLAR 10.7B is a base pretrained model from Upstage built by applying depth up-scaling (DUS) to Mistral 7B, pushing parameters to 10.7B without a traditional…
BSC-LT's 40B instruction-tuned model with first-class support for Spanish, Catalan, Basque, and Galician alongside English. Pretrained on 9.83 trillion tokens…
An 8B bilingual model from Japan's National Institute of Informatics, instruction-tuned via SFT on a Japanese/English corpus of 11.7T tokens. Supports up to…
Hermes 4 is a 70B reasoning model from NousResearch, built on Llama-3.1-70B with FP8 quantization to cut memory overhead. It supports explicit `<think>`…
RefinedNeuro RN TR R2 is an Apache-2.0 Llama-family 8B model distributed on Hugging Face and Ollama. It is measured alongside R1 to compare same-size…
RefinedNeuro RN TR R1 is an Apache-2.0 Llama-family 8B reasoning model distributed on Hugging Face and Ollama. It is included in the local sweep as a compact…
Swallow 7B is a Japanese-English base model built by continual pre-training on top of Llama 2 7B with additional Japanese text. TokyoTech-LLM also expanded the…
Salamandra 2B Instruct is a transformer model from BSC pretrained from scratch on 12.875 trillion tokens across 35 European languages and code. The instruct…
An FP8-quantized build of Bielik-11B v3.0 Instruct, designed to run on vLLM or SGLang with roughly 50% less GPU memory than the BF16 original. Weights and…
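The "roughly 50% less" figure follows from bytes per weight: BF16 stores each parameter in 16 bits, FP8 in 8. A minimal sketch of that arithmetic, assuming the 11B parameter count from the model name and counting weights only (KV cache, activations, and runtime overhead are ignored); `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed parameter count: 11B, taken from the model name.
bf16_gb = weight_memory_gb(11, 16)  # 16 bits per weight
fp8_gb = weight_memory_gb(11, 8)    # 8 bits per weight
saving = 1 - fp8_gb / bf16_gb       # fraction of weight memory saved

print(f"BF16: {bf16_gb:.1f} GB, FP8: {fp8_gb:.1f} GB, saving: {saving:.0%}")
```

The same helper gives a first-pass estimate for other quantization levels, for example `weight_memory_gb(11, 4)` for a 4-bit build, though real quantized files carry extra metadata and mixed-precision layers.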
Salamandra 2B is a base-only transformer trained from scratch by Barcelona Supercomputing Center on 12.875 trillion tokens across 35 European languages and…
A 7B Thai-language chat model built on LLaMA 2, pretrained on 65B+ Thai words and instruction-tuned on 1M+ Thai examples. Adds 10,000 common Thai vocabulary…
Bielik 11B v3.0 is SpeakLeash's instruction-tuned model built around Polish, with coverage across 32 European languages. It runs at 11B parameters with a 32K…
Saiga Llama3 8B is a Russian-language fine-tune of Meta's Llama 3 8B, trained on the Saiga Scored dataset and packaged in GGUF format for llama.cpp. It targets…
Salamandra 7B is a base language model from Barcelona Supercomputing Center, pretrained on 12.875 trillion tokens across 35 European languages and code. It is…
OpenThaiGPT 1.0.0 Beta is a 13B LLaMA v2 Chat fine-tune trained on translated Thai instructions. Vocabulary was expanded by 10,000+ Thai tokens to speed up…
Gervásio 8B PTPT is a LLaMA 3.1 8B Instruct fine-tune from PORTULAN/University of Lisbon, trained on Portuguese-specific datasets including extraGLUE-Instruct…
Meta's Llama 3.3 at 8B. Drop-in upgrade from Llama 3.1 8B; same hardware envelope, better instruction following.
Meta's dense flagship in the Llama 4 line. 405B params; comparable footprint to Llama 3.1 405B with the Llama 4 reasoning improvements.
Llama 3.2 multimodal at 11B. Consumer-tier multimodal predecessor to Llama 4 Scout.
Llama 4 dense at 70B. Drop-in successor to Llama 3.3 70B; same hardware envelope, better on reasoning benchmarks.
Phind's CodeLlama-derived coder at 34B. Older release; retained for historical / continuity value. Newer Qwen Coder lineage has surpassed it.
EVA community's storytelling-focused fine-tune of Llama 3.3 70B. Popular in the creative-writing / roleplay community.
Llama 3.2 multimodal at 90B. Datacenter-tier predecessor to Llama 4 Maverick. Strong visual reasoning.
Alibaba's May 2026 flagship. 397B total / 17B active MoE with hybrid thinking-mode toggle inherited from Qwen 3. Strongest open scientific reasoner per GPQA…
Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference…
Qwen3-0.6B is the smallest dense model in Alibaba's Qwen3 generation, supporting a 40K-token context and dual-mode operation that toggles between explicit…
Mid-tier Qwen 3 MoE. 30B total / 3B active means 70B-class quality at 7B-class inference speed on a single 24GB card. The sweet spot of the Qwen 3 lineup for…
Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.
Dense Qwen 3 32B. Best dense open-weight model in its size class at release; pairs nicely with a single RTX 5090 or 4090.
Qwen 3 at the 8B scale. Direct head-to-head against Llama 3.1 8B on most benchmarks; usually wins on coding and structured output.
Qwen3-1.7B is the mid-tier dense model in Qwen3, sharing the same hybrid thinking architecture and 40K context as the 0.6B but with ~3x the parameters for…
14B Qwen 3. Fits on 12GB cards at Q4. Strong default for users with a single mid-range GPU.
Qwen2-VL 2B Instruct is Alibaba's compact vision-language model with native dynamic-resolution image handling and multimodal RoPE (M-RoPE) for video and…
The community-default small Qwen prior to Qwen 3. Still widely used because of mature ecosystem support.
14B Qwen 2.5. Sweet spot for 16GB VRAM. Many production deployments still on this version.
Qwen 3.6 35B-A3B with Multi-Token Prediction (MTP). The "A3B" suffix means ~3B activated parameters per token via Mixture-of-Experts — inference cost stays…
Dense 32B Qwen 2.5. Strong daily-driver on 24GB cards prior to Qwen 3 32B.
Qwen team's reasoning-focused experimental release. Visible chain-of-thought in `<think>` tags. Precursor to Qwen 3's thinking mode.
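When a model emits its chain-of-thought inside think tags, client code usually separates the reasoning from the final answer before display. A minimal sketch, assuming the output uses literal `<think>` and `</think>` delimiters and that every block is closed; `split_reasoning` is a hypothetical helper name.

```python
import re

# Non-greedy match so multiple think blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

An unterminated block (for example, output cut off mid-thought) is left in the answer untouched by this sketch, so production code would need to handle that case explicitly.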
Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.
The flagship of Qwen 2.5. Workstation-tier; needs 48GB+ VRAM for usable inference.
Qwen 3.6 27B dense (not MoE) with Multi-Token Prediction. Sits between the 14B and 35B-A3B as a "single dense model with MTP throughput acceleration." Targets…
Qwen 3.5 2B base with supervised fine-tuning on Turkish instruction-following data. Recent community fine-tune (early 2026) that bridges Qwen 3.5's strong…
Continued pre-training of Qwen3.5-9B-Base on 68M+ tokens of Thai legal text — acts, decrees, and court rulings. This is a raw base model, not an assistant; you…
A 32B Japanese-English model built on Qwen3, trained with continual pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards.…
A 0.6B Qwen3 model fine-tuned on English-to-Hindi instruction pairs and quantized to GGUF. Fits in 370MB and runs on CPU-only hardware. Trained on 2,000…
Coding-specialized fine-tune of Qwen 3 32B. Curated coding corpus; outperforms Qwen 2.5 Coder 32B on SWE-Bench by ~6 points. Apache 2.0.
Qwen 2.5 vision-language flagship at 72B. Strong on document understanding + multi-image queries. Apache 2.0.
Qwen 2.5 fine-tuned for math problem-solving with chain-of-thought and tool-integrated reasoning.
Qwen 3 mid-tier. Same reasoning-mode toggle as Qwen 3 32B/14B/8B. Hits the consumer-laptop sweet spot.
Smallest Qwen 2.5. Apache 2.0; phone / Pi-class deployment target.
CodeQwen 1.5 — Qwen Coder predecessor. Superseded by Qwen 2.5 Coder for new deployments.
Coding-specialized Qwen 2.5 at 14B. The 16GB-VRAM tier coding model — fits comfortably with 8K context.
Qwen 2 vision-language predecessor to Qwen 2.5-VL. Apache 2.0 with strong document Q&A.
Compact Qwen 2.5 Coder. Sweet spot for laptop autocomplete and small refactor agents.
Smallest Qwen 2.5 Coder. Targets edge / autocomplete on integrated GPUs and Apple Silicon laptops.
Smallest Qwen 2.5-VL. Edge-deployable VLM with strong document Q&A.
Consumer-tier Qwen 2.5 VL. 7B + vision. Fits 8GB cards; the smallest practical multimodal Qwen.
Largest Qwen 2.5 Math. Datacenter-tier math specialist; eclipsed by R1 distills for general reasoning.
Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.
Compact Qwen 2.5. The 1.5B Apache-2.0 baseline.
Mid-edge Qwen 2.5. Note: 3B variant uses Qwen License (not Apache 2.0).
Qwen 3 family embedding model. Apache 2.0 with strong multilingual coverage.
Mistral's April 2026 frontier MoE. 675B total / 41B active. Strong European-multilingual lineage carries through; the new release competes head-to-head with…
Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.
Joint Mistral/NVIDIA release with native 128K context and a new Tekken tokenizer. Strong multilingual; popular fine-tune base.
Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.
Mistral's coding-specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.
Mistral's flagship dense model. Open weights but restricted commercial license — research and non-commercial only.
Kumru 2B is a compact Turkish text-generation model from VNGRS. The Hugging Face config reports a Mistral-family architecture with an 8K context window, and…
Mistral 7B Instruct v0.2 is a 7-billion-parameter instruction-tuned model from Mistral AI with a 32,768-token context window. It uses `[INST]` prompt tags and…
Mistral AI's second instruct revision of their 7B model, bumping context from 8k to 32k tokens and updating the tokenizer to `mistral_common`. It's an…
Mistral 7B Instruct v0.1 is the instruction-tuned version of Mistral's first public 7B base model, fine-tuned on publicly available conversation datasets. It…
Turkcell LLM 7B v1 is an Apache-2.0 Turkish text-generation model built on a Mistral architecture. The measured Ollama artifact uses a RefinedNeuro GGUF…
Mistral 7B v0.2 with continued pretraining on Turkish data, then instruction-tuned. The 32K context window makes it the best Turkish open-weight model for long-document…
Bielik 11B v2.3 Instruct is SpeakLeash's Polish-language instruction-tuned model, built on the Bielik-11B-v2 base and released under Apache 2.0. It targets…
An 11B Polish-language instruction model from SpeakLeash and ACK Cyfronet AGH, built as a linear merge of three instruct-tuned Bielik-11B-v2 variants. Uses…
Mistral Turkish v2 is a public Ollama-distributed Turkish Mistral variant. The upstream Hugging Face repository was not publicly accessible during intake, so…
Malhajar Mistral 7B Turkish is an Apache-2.0 Mistral 7B Instruct v0.2 Turkish fine-tune. The benchmarked Ollama tag is a koezgen quantized distribution of the…
Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode…
Bielik 7B Instruct v0.1 is a Polish-language instruction-tuned model from speakleash, fine-tuned from Bielik-7B-v0.1 and distributed in GGUF format for…
Mistral 7B fine-tuned on the OpenOrca instruction dataset, distributed by TheBloke in GGUF format for local CPU and GPU inference. Uses ChatML prompt…
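For reference, ChatML wraps each turn in `<|im_start|>` and `<|im_end|>` markers and leaves an open assistant turn for the model to complete. A minimal sketch of that layout; `to_chatml` is a hypothetical helper, and the chat template shipped with a given model remains the authority on exact whitespace and special tokens.

```python
def to_chatml(messages: list[dict[str, str]]) -> str:
    """Render role/content messages as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # Open the assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```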
A 7B instruction-tuned model from Stability AI built specifically for Japanese, using the Mistral architecture. Quantized to GGUF by TheBloke, so it runs on…
Bielik-7B v0.1 is a 7B-parameter base model built by continued pretraining of Mistral-7B on 70B+ tokens of Polish text, with data quality filtered via an…
Bielik 11B v2.2 Instruct is a Polish-language instruction-tuned model from speakleash, available in GGUF format for local inference. It supports 32,768-token…
Mistral's coding-specialized Mistral Small 2 successor. Apache 2.0 — the rare commercial-OK Mistral coder.
Iterative refresh of Mistral Small 3 24B. Same architecture; improved instruction following and tool-call reliability. Apache 2.0.
Mistral's Arabic and South Asian language specialist at 24B. Research license.
Mistral edge model at 3B. Designed for on-device inference with extended 128k context. Research license only.
Mistral 8B with sliding-window attention and 128k context. Research license — Mistral 7B v0.3 is the commercial alternative.
Mistral's Mamba (state-space) architecture coding model. Linear inference cost — the architectural alternative to attention-based coding models. Apache 2.0.
Dense variant in the Mistral Medium 3.5 family. Research license — non-commercial open. Same training data as the MoE flagship but in a smaller dense package.
Mistral's reasoning-specialized fine-tune of a Mistral Small base. Reasoning-token emission similar to Qwen 3 / DeepSeek R1 in a smaller footprint. Research…
DeepSeek's April 2026 frontier flagship. 1.6T total / 49B active MoE with hybrid Compressed Sparse Attention + Heavily Compressed Attention. 1M context window.…
Open reasoning model that closed the gap with frontier proprietary reasoners. Visible chain-of-thought, MIT license, and a family of distilled smaller variants.
The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (about 4.6%) makes it…
Reasoning distillation onto Llama 3.3 70B. Best-in-class open-weight reasoner you can actually fit on a workstation.
32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.
DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.
Smallest practical R1 distill. Reasoning on a 6GB GPU.
14B reasoning distill. Fits on 12GB cards.
MoE coding specialist — 16B total / 2.4B active. Fast on 12GB cards.
DeepSeek-V2-Lite-Chat is a 15.7B-total, 2.4B-active mixture-of-experts chat model using DeepSeek's Multi-head Latent Attention (MLA) architecture for KV-cache…
Full DeepSeek Coder V2. 236B total / 21B active MoE coder.
R1 reasoning distilled into a Llama 3.1 8B base. Smaller R1 distill; useful when 32B is too heavy. Reasoning quality is meaningfully below the 32B distill but…
Smallest R1 distill. Surprisingly capable reasoning for a 1.5B model; the right pick when you need both reasoning and edge deployment.
Newer R1 distill on a Qwen 3 base. Combines R1 reasoning with Qwen 3's reasoning-toggle architecture. Apache 2.0.
DeepSeek V2.5 — merged V2 chat + Coder. Pre-V3 baseline; 21B active MoE.
DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source…
Distillation of DeepSeek V3 to a smaller MoE. 16B total / 2.4B active. Captures most of V3's reasoning at consumer-card-friendly memory.
Community R1 distill onto a Mistral Small 3 base. Apache 2.0; combines R1 reasoning with Mistral instruction polish.
DeepSeek's first MoE — 16B / 2.4B active. Older model retained for ecosystem-context value as the base of the V2/V3 lineage.
DeepSeek's coder line successor. Dense 33B; competitive with Qwen 2.5 Coder 32B on SWE-Bench.
Google's flagship dense Gemma 4. Beats some 400B-class proprietary models on benchmarks. Targets the 24GB single-GPU sweet spot.
MoE variant of Gemma 4. Faster per-token than the 31B dense at similar quality on most tasks.
Gemma 3 270M is the smallest member of Google's Gemma 3 family, a 270-million-parameter text-only model designed for on-device deployment and task-specific…
Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.
Edge-class Gemma 4. The 'Effective 4B' branding signals it punches above its parameter count via training-data quality.
12B Gemma 3. Fits on 12GB consumer cards. Multimodal.
Trendyol LLM Asure 12B is a Gemma 3 based multimodal instruct model for Turkish and English business workflows. The public Ollama build used in local testing…
YTU's Turkish-tuned Gemma 2 9B model. The highest community-rated Turkish-language LLM on Hugging Face by likes-to-downloads ratio as of May 2026. Continued…
Gemma 2 2B Instruct is Google's instruction-tuned 2B model from the Gemma 2 generation, trained with knowledge distillation from larger Gemma models. It…
4B Gemma 3 for edge. Multimodal.
Mid-size Gemma 2. Strong chat quality with a different training mix from Llama family.
Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.
Gemma 4 26B MoE (4B active params) pruned and Turkish-tuned. The largest Turkish-tuned open-weight model on HF as of May 2026. MoE architecture means it loads…
YTU Turkish Gemma 9B v0.1 is a Gemma 2 based Turkish instruction model from the YTU CE COSMOS ecosystem. The benchmarked Ollama tag is an alibayram GGUF…
Smallest text-only Gemma 3 for phones and IoT.
Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.
Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.
3B-parameter visual document retriever built on PaliGemma-3B using a ColBERT-style late-interaction objective. Encodes a PDF page as a grid of patch…
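For readers unfamiliar with late interaction, the scoring step can be sketched as follows: each query vector is matched to its best-scoring document vector, and those maxima are summed. This is an illustrative toy, not the retriever's actual code; `maxsim_score` is a hypothetical name, and the two-dimensional vectors stand in for real query-token and page-patch embeddings.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """ColBERT-style MaxSim: sum over query vectors of the best match in the document."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data: two query vectors, two candidate pages with patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
print(score_a, score_b)
```

Because documents are encoded independently of the query, their vectors can be precomputed and stored, leaving only this cheap matching step at query time.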
Mid-tier PaliGemma 2 fine-tuning base. Better baseline for complex vision tasks.
PaliGemma 2 — Gemma 2 base + SigLIP vision encoder. Designed for fine-tuning on specific vision tasks.
Microsoft's Phi-4 14B trained on synthetic textbook-quality data. Punches above weight on reasoning and math; MIT licensed.
Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.
Compact 3.8B Phi for edge deployment. 128K context. Strong reasoning per parameter.
Multimodal Phi 3.5. Document and chart understanding at edge size. MIT licensed.
Multimodal variant of Phi-4 14B. Vision + text. Smaller than Llama 4 Scout but covers most image-Q&A workflows; right-sized for 16GB consumer cards.
Microsoft's edge-tier Phi-4 variant. 3.8B params; designed for phone / tablet / Pi deployment. Strong reasoning per parameter — Phi family's traditional…
Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.
EXAONE 3.5 2.4B Instruct is LG AI Research's bilingual English/Korean model built for low-resource devices. It handles up to 32K context tokens and shows…
K-EXAONE is LG AI Research's 236B Mixture-of-Experts model with 23B active parameters per forward pass. It covers Korean, English, Spanish, German, Japanese,…
EXAONE 4.0.1 is a 32B model from LG AI Research with a 131K context window and a hybrid sliding-window/full-attention architecture. It runs in either standard…
Smaller EXAONE for consumer-tier Korean / CJK workloads.
LG AI Research's flagship Korean-ecosystem model. Strong on Korean/Japanese language tasks; competitive on English. License blocks commercial use without LG…
LG AI's edge-tier EXAONE. Strong Korean / English. Research-only license.
Granite 3.1 2B Instruct is IBM's 2B-parameter dense instruct model with a 128K context window, post-trained for enterprise tasks including RAG, function…
IBM Granite 3.3. Iterative refresh of 3.2 — same architecture; improved instruction following and tool-call reliability. Apache 2.0.
IBM Granite at 2B. Apache 2.0 enterprise-friendly small model with safety tuning.
Granite 3.0 8B — IBM's enterprise-tier baseline. Apache 2.0.
IBM's enterprise-tuned 8B. Apache 2.0. Strong on enterprise-shaped tool-calling and structured output. Watson + RHEL ecosystem alignment.
Granite in MoE form. 16B total / 3B active. Workstation-deployable; the IBM enterprise alternative to Qwen / DeepSeek small MoEs.
Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.
Cohere's mid-tier — RAG and tool use. Non-commercial license.
Command R7B (December 2024) is Cohere's smallest model in the Command R family, an 8B-parameter dense transformer with 128K context, trained for…
Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.
Cohere's multilingual Aya at 32B. Covers 23 languages; strongest open-weight multilingual model in late 2024 — Apache-2.0 alternative is Qwen 2.5 32B but Aya…
Falcon-40B-Instruct is a 40B parameter instruction-tuned model from TII (UAE), fine-tuned on Baize chat data for conversation and instruction-following. It…
Falcon 3 3B Instruct is TII's 3-billion-parameter instruct model from the Falcon 3 family, supporting English, French, Spanish, and Portuguese with a 32K…
Falcon 3 mid-size from TII. Permissive Falcon license; multilingual focus.
TII's Falcon 3 at the 10B tier. Strong on Arabic-language tasks; competitive on English.
TII's Mamba (state-space) architecture model. Linear inference cost; the architectural alternative to attention-based models.
NousResearch's Hermes fine-tune of Llama 3.1 8B. Stronger system-prompt adherence, JSON output, role-play, and agent steering than the base Llama.
Hermes 3 at 70B. Workstation-tier agent-tuned model.
Nous Research's Hermes 3 fine-tune of Llama 3.2 3B. Strong general-instruction following at the 3B tier.
Nous Research's Hermes 4 fine-tune of Llama 3.3 70B. Strong on instruction following and creative tasks; community-favored alternative to base Llama.
Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.
Eric Hartford's Dolphin fine-tune at 3B. Less censored than the base Llama; popular for unconstrained-generation use cases.
Eric Hartford's Dolphin 3 at 70B Llama 3.3 base. Less-restricted alternative for creative / unconstrained workflows.
The MoE model that introduced the 8-experts pattern to the open-weight world. 47B params total, 13B active. Still a viable workhorse on 36GB+ setups.
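The total/active split matters because the two numbers drive different costs: every expert must be resident in memory, so the footprint tracks total parameters, while per-token compute tracks only the active ones. A rough sketch using the rounded figures from the Mixtral entries in this listing; `active_fraction` is a hypothetical helper, and the dictionary labels are descriptive only.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_b / total_b


# (total, active) in billions, as quoted in the listing.
mixtral_models = {
    "8-expert Mixtral (47B total)": (47, 13),
    "bigger Mixtral (141B total)": (141, 39),
}

for name, (total_b, active_b) in mixtral_models.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} active per token")
```

Both entries work out to the same share, a little over a quarter of the weights per token, which is why such models need the memory of a large dense model but decode closer to the speed of a much smaller one.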
The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.
GPTQ 4-bit quantized build of Mistral AI's Mixtral 8x7B Instruct, a sparse mixture-of-experts model with 46.7B total parameters. Natively handles German,…
GLM-4 with vision encoder. Strong on Chinese document Q&A; restricted commercial license.
Zhipu's GLM-4 at 9B. Strong on Chinese-language tasks; tool-calling format slightly different from OpenAI convention.
Zhipu's GLM-5 flagship. 144B total / 16B active MoE. Strong on Chinese-language tasks; competitive on English at the workstation-cluster tier.
MiniCPM-V successor. Multimodal at 8B with stronger document Q&A than 2.6.
OpenBMB's edge-optimized 4B. MIT license; designed for phone deployment. Strong reasoning per parameter.
Multimodal MiniCPM at 8B. Vision + text; strong on document Q&A for the size class.
580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX…
StepFun's 1T-parameter MoE. 38B active. One of the largest open-weight models; cluster-only at any quant. Restricted license.
OLMo 2 1B Instruct is AllenAI's 1-billion-parameter instruct model from the April 2025 OLMo 2 release, post-trained with RLVR on math. It is fully open:…
AI2's fully-open 13B. Apache 2.0; full training data + checkpoints + recipes published. The reproducibility-first model in the 13B class.
all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM…
12B-parameter rectified-flow transformer for text-to-image, guidance-distilled from the FLUX.1 [pro] teacher. Currently the most-liked model on Hugging Face…
Nomic Embed Text v1.5 is a 137M-parameter English embedding model with an 8192-token context window, trained with Matryoshka Representation Learning so the…
82M-parameter StyleTTS2-derived TTS that went viral in early 2025 for matching billion-parameter TTS quality at ~1% the size. Apache-2.0 weights, dozens of…
BGE Large EN v1.5 is the 335M-parameter English flagship from BAAI's FlagEmbedding family, producing 1024-dim embeddings with a 512-token context window.…
BGE M3 reranker. Cross-encoder for re-ranking RAG candidates; multilingual.
all-mpnet-base-v2 is a 109M-parameter sentence-transformers embedder based on Microsoft's MPNet, producing 768-dim vectors with a 384-token context. Trained on…
Coqui's flagship multilingual voice-cloning TTS — clones a speaker from a 6-second reference clip and synthesizes in 17 languages with cross-lingual transfer.…
74M-parameter Whisper variant — roughly 2x the params of tiny for ~25-30% relative WER reduction. The standard pick for CPU realtime transcription with…
244M-parameter Whisper. The smallest Whisper checkpoint considered 'production grade' for non-English audio. Sweet spot for laptops with iGPU/Metal or modest…
Smallest member of the Whisper encoder-decoder ASR family (39M params). Trained on 680k hours of weakly supervised multilingual audio. Targets sub-realtime…
paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing…
Zhipu's GLM-5 currently leads the Open LLM Leaderboard 2026. Strong reasoning and bilingual EN/ZH capability.
mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation…
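Matryoshka-trained embeddings are meant to stay useful when only their leading dimensions are kept. A minimal sketch of that truncation step, assuming cosine similarity is used downstream (hence the re-normalization); `truncate_embedding` is a hypothetical helper and the three-dimensional vector is toy data, not real model output.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
print(short)
```

Smaller vectors cut index size and search cost roughly in proportion to the dimensions dropped, at some cost in retrieval quality that depends on the model.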
Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage,…
Multilingual E5 Large Instruct is a 560M-parameter XLM-RoBERTa-large encoder fine-tuned by Microsoft's intfloat team with task instructions appended to…
12B rectified-flow transformer, timestep-distilled to 1-4 sampling steps, released under Apache-2.0. Same architecture as FLUX.1 [dev] but trades a bit of…
NVIDIA's hybrid Mamba-2 + Transformer MoE for on-device agents. 30B total / 3B active. 1M-token context window with reasoning ON/OFF modes and 4× faster…
Moonshot's long-context, agent-oriented MoE. Optimized for stability under tool use and multi-step coding/planning workflows.
428M-parameter Shape-Optimized vision-language encoder trained with the sigmoid (not softmax) contrastive loss on WebLI. Hits ~83% zero-shot ImageNet-1k top-1…
Workstation-tier Nemotron 3. 120B total / 12B active. 5× higher throughput than the prior Super, 1M context, designed for multi-agent applications.
756M-param distilled Whisper-large-v3 with the decoder shrunk from 32 to 2 layers. ~6.3x faster than the teacher at near-parity WER on long-form English (1%…
Arctic Embed L v2.0 is a 568M-parameter multilingual embedder from Snowflake based on XLM-RoBERTa, producing 1024-dim Matryoshka vectors with an 8192-token…
2.6B SDXL backbone trained with Adversarial Diffusion Distillation (ADD), producing photorealistic 512px images in a single forward pass. Designed for…
Jina Reranker v2 Base Multilingual is a 278M-parameter cross-encoder from Jina AI with a 1024-token context, trained on 100+ languages plus code and structured…
SmolLM2-135M-Instruct is the smallest instruction-tuned model in Hugging Face's SmolLM2 family, a 135M-parameter Llama-architecture model trained for on-device…
Fully-open OLMo 2. AI2 publishes the full training data, code, and weights — the most reproducible 32B model.
770M-parameter unified vision foundation model with a DaViT image encoder and BART-style seq2seq decoder. One model, one set of weights — handles captioning,…
GTE ModernBERT Base is a 149M-parameter English embedder built on AnswerDotAI's ModernBERT backbone, producing 768-dim vectors with native 8192-token context…
E5-Mistral-7B-Instruct is a 7.11B-parameter decoder-based embedder fine-tuned from Mistral-7B by Microsoft's intfloat team, producing 4096-dim embeddings with…
31B-parameter Turkish-tuned reasoning model with i1-imatrix quantizations by mradermacher. Designed for step-by-step problem solving in Turkish. Highest…
2.5B MMDiT-X with improved Querying Key Normalization and dual attention blocks at lower resolutions. Trained for 0.25-2MP output. Positioned as the mid-tier…
EXAONE Deep 7.8B is LG AI Research's reasoning-focused model, fine-tuned from EXAONE-3.5-7.8B-Instruct for math and coding tasks. It claims benchmark wins over…
TinyLlama 1.1B Chat v0.3 is a 1.1B-parameter chat model quantized to 4-bit AWQ by TheBloke. It uses the ChatML prompt format and fits comfortably in very low…
GPTQ-quantized build of TinyLlama 1.1B Chat v0.3, trained on SlimPajama, StarCoder, and OpenAssistant data. Runs in roughly 0.8 GB VRAM thanks to 4-bit…
VITS-based neural TTS optimized for Raspberry Pi-class hardware. Ships as ONNX checkpoints with ~100 voices across 30+ languages. Powers Home Assistant's local…
InclusionAI's Ring-2.6-1T is a 1 trillion parameter Mixture-of-Experts model with ~32B activated parameters per token, released by Ant Group's AI research arm.…
mxbai-rerank-large-v2 is a 1.54B-parameter listwise reranker from Mixedbread AI built on Qwen2.5-1.5B, supporting 100+ languages and a 32K-token context with…
GPT-NeoX-20B is a 20B-parameter English autoregressive model from EleutherAI, trained on the 825 GiB Pile dataset. It uses a GPT-3-style transformer…
A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both…
EXAONE 3.5 7.8B is LG AI Research's instruction-tuned bilingual model for English and Korean, with a 32K token context window. It succeeds EXAONE 3.0 with…
35B MoE (3B active) tuned specifically for Turkish financial-services text — bank statements, investment research, accounting terminology. Niche-cluster model…
600M-parameter FastConformer-TDT transducer ASR from NVIDIA NeMo. Topped the Hugging Face Open ASR Leaderboard in 2025 for English, with WER ~6.05% averaged…
Turkish-from-scratch language model trained by Ali Safaya (Koç University researcher). Named after the kanarya (Turkish for 'canary'). Trained on 250+ GB of…
SmolLM2-360M-Instruct is the middle tier of the SmolLM2 instruct family, a 360M-parameter Llama-architecture model with an 8K context. It ships with ONNX…
Flow-matching non-autoregressive TTS built on a Diffusion Transformer (DiT) backbone with ConvNeXt text refinement. Trained on the 100K-hour Emilia dataset;…
Turkish BART-style sequence-to-sequence model fine-tuned specifically for summarization. Not a chat model — purpose-built for input-document → Turkish-summary…
Smaller Kanarya variant at 750M parameters. Runs comfortably on a CPU or a 4GB GPU. Useful for low-resource Turkish text classification, embeddings, or completion…
GPT-2 Large architecture trained from scratch on Turkish. Reference baseline for measuring how much modern instruction-tuned models actually improve on the…
SmolVLM-Instruct is Hugging Face's compact vision-language model built on the Idefics3 architecture, pairing SmolLM2-1.7B-Instruct with a SigLIP-SO400M vision…
Sarvam-30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters but only 2.4B active at inference time, making it cheaper to run than its size…
LLaMA-architecture 3B model fine-tuned as a TTS that emits SNAC audio tokens. Designed for highly expressive, emotion-controllable speech with laughter, sighs,…
EXAONE 3.5 32B Instruct is LG AI Research's 32B bilingual model, trained for instruction-following in English and Korean. It supports a 32,768-token context…
A 12B GPT-NeoX model from Merlyn Mind, fine-tuned specifically to refuse or soften unsafe content in K-12 and higher-education contexts. Delivered in AWQ 4-bit…
EXAONE 3.5 32B Instruct is LG AI Research's bilingual English/Korean instruction model, quantized to 4-bit AWQ for lower VRAM overhead. It supports a 32K…
Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks…
A 20B bilingual model from TokyoTech built on GPT-OSS via continual pre-training, SFT, and reinforcement learning with verifiable rewards (RLVR). Targets…
A 32B MoE model from Japan's National Institute of Informatics, with only 3B parameters active per forward pass. Trained on 11.7T tokens across four stages:…
A 124M-parameter GPT-2 base model trained on French Wikipedia (wiki40b/fr) and a CC-100/fr subset, with a 50,000-token BPE vocabulary. It generates French text…
GPT-2 Spanish is a 124M-parameter model trained from scratch on 11.5GB of Spanish text (Wikipedia, books) with a custom Spanish BPE tokenizer. It generates…
mGPT-13B is a 13B-parameter GPT-3-style model pretrained on 600 GB of deduplicated text spanning 61 languages across 25 language families, sourced from mC4 and…
PhoGPT-4B-Chat is VinAI's 3.7B-parameter Vietnamese chat model, fine-tuned from a base trained on 102B Vietnamese tokens. It handles up to 8192-token contexts…
A 32B judge model built to score other LLMs' Russian-language outputs. Give it an instruction, a response, and a rubric — it returns a numeric score plus a…
A 355M-parameter GPT-2 Medium trained from scratch on 11.5 GB of Spanish text (Wikipedia and books), with a BPE tokenizer built specifically for Spanish.…
OpenELM-3B-Instruct is Apple's 3-billion-parameter instruct model using a layer-wise scaled transformer with varying FFN multipliers and KV-head counts across…
A 1.3B-parameter GPT-2-style model fine-tuned on Uzbek text for 50,000 steps on a single A100. Covers Uzbek, Russian, and English generation. It is a base…
PhoGPT-4B is a 3.7B-parameter model pre-trained from scratch on 102B Vietnamese tokens, making it one of the few Vietnamese-first generative models available.…
A 175M-parameter GPT-2 model fine-tuned on Dostoevsky's digitized works, built on top of ruGPT3-small. Trained for five epochs, it generates Russian prose in a…
A 1.3B-parameter GPT model fine-tuned from ai-forever's mGPT base for Mongolian, with English and Russian also supported. Fine-tuning ran for 50,000 steps on…
A 0.5B Russian-language instruct model fine-tuned from Qwen2.5-0.5B on the GrandMaster-PRO-MAX dataset (~150k instructions). Vikhrmodels claims 4x efficiency…
OpenThaiGPT 1.5 7B is a Thai-language chat model fine-tuned from Qwen2.5 on over 2 million Thai instruction pairs. It targets Thai academic benchmarks and…
An instruction-tuned 8B Thai language model from typhoon-ai, built on ThaiLLM using supervised fine-tuning and on-policy distillation. Training ran on a single…
Sarvam-105B is a Mixture-of-Experts model with 10.3B active parameters built for Indian-language tasks, reasoning, and coding. It covers 22 Indian languages…
SmolLM 2 flagship. Open data + open weights at the edge tier.
NVIDIA's edge-tier Nemotron. Distilled from Minitron lineage with role-play tuning.
Mid-size StarCoder 2. The 8GB-VRAM autocomplete pick.
OpenAI's flagship open speech-to-text model. 99 languages, MIT license. The de-facto open ASR baseline.
Molmo flagship. Apache 2.0 VLM rivaling proprietary models on UI pointing and visual reasoning.
LLaVA 1.6 on Mistral 7B base. Apache 2.0 vision-language with strong OCR.
StarCoder 2 flagship. The largest BigCode coder; 16k context with strong fill-in-middle.
BigCode's StarCoder 2 at 3B. Trained on 17 programming languages drawn from The Stack v2.
Cohere's multilingual research model covering 23 languages. CC-BY-NC — research only.
AI21's hybrid Mamba-Transformer MoE. 256k context with the SSM throughput advantage.
Tulu 3 at 70B. AI2's fully-open instruct fine-tune — research transparency at scale.
Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.
BAAI's multilingual embedding flagship. Dense + sparse + ColBERT-style multi-vector. The de-facto open multilingual embedding pick.
NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.
LLaVA-OneVision unified single-image / multi-image / video VLM on Qwen 2 base.
Distilled Whisper Large v3. ~8x faster decode at near-equivalent accuracy on most languages.
Hugging Face's small-model line at 3B. Apache 2.0. Designed for edge / educational deployments.
Tiny vision-language model. ~1.9B; designed for edge / embedded multimodal use cases. Apache 2.0.
Jamba flagship at 398B total / 94B active. Frontier hybrid-architecture model with 256k context.
NVIDIA's Nemotron 3 at 9B. Tuned for NVIDIA-stack deployment patterns; strong tool-calling reliability.
AI2's fully-open VLM. Trained on PixMo dataset; pointing capability for UI grounding.
AI2's fully-open post-training recipe applied to Llama 3.1 8B. Open data, open code, open weights.
Stability AI's 12B. Stable LM line; commercial use requires paid membership. Solid baseline at 12B class.
InternVL 2.5 flagship. Approaches frontier proprietary VLMs on document and OCR tasks.
Aya 23 at 35B. Built on Cohere's Command-R lineage. Non-commercial.
Nemotron 3 mid-tier. 49B dense; fits 32GB cards with AWQ. NVIDIA stack alignment carries through.
InternVL 2.5 mid-tier — Shanghai AI Lab vision-language model with strong document and chart understanding.