RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
BLK · MODEL REGISTRY

Open-weight models

315 models tracked. Hardware requirements, license terms, and quantization sizes for each.

FAM · LLAMA

Llama

42 models
Llama 3.1 8B Instruct
8B

Meta's small flagship. Strong general reasoning, 128K context, broad multilingual. The default first try for most local-AI use cases on consumer hardware.

COMMERCIAL OK·131K CTX
Llama 4 Scout
109B

Meta's 2025 flagship MoE model. 109B total parameters with only 17B active per forward pass and a record 10-million-token context window, unmatched in…

COMMERCIAL OK·10M CTX
Llama 3.3 70B Instruct
70B

Late-2024 refresh of the 70B Llama line. Roughly matches Llama 3.1 405B on most benchmarks at about one-sixth the parameter count. The default high-end model for…

COMMERCIAL OK·131K CTX
Llama 3.2 3B Instruct
3B

Lightweight 3B for edge and laptop deployment. Runs comfortably in 8GB of memory, at 30+ tok/s on Apple Silicon.

COMMERCIAL OK·131K CTX
Llama 3.1 70B Instruct
70B

The 70B sibling of Llama 3.1 8B. Strong generalist reasoning with 128K context, popular base for agentic fine-tunes (Hermes 3, Nemotron). Mostly superseded by…

COMMERCIAL OK·131K CTX
Llama 3.1 Nemotron 70B Instruct
70B

NVIDIA's HelpSteer2-tuned Llama 3.1 70B. Topped Arena Hard at release. The pre-Nemotron-3 NVIDIA reference open weights.

COMMERCIAL OK·131K CTX
Llama 3.2 11B Vision Instruct
11B

First-party multimodal Llama. Accepts images alongside text for VQA, document understanding, and chart reading. Runs on 12GB+ VRAM.

COMMERCIAL OK·MULTIMODAL·131K CTX
TinyLlama 1.1B Chat v1.0
1.1B

TinyLlama-1.1B-Chat-v1.0 is a 1.1B Llama-2-architecture model pretrained on 3 trillion tokens and chat-tuned on UltraChat and UltraFeedback. It was one of the…

COMMERCIAL OK·2K CTX
Llama 4 Maverick
400B

Meta's high-end Llama 4 sibling — 128 experts MoE built for performance over efficiency. Multilingual strength is its standout. Effectively a server-tier…

COMMERCIAL OK·MULTIMODAL·1M CTX
Llama 3.1 Nemotron Ultra 253B
253B

NVIDIA's top open reasoning model in the Llama 3.1 lineage. Server-tier; tuned for reasoning accuracy on agentic workloads.

COMMERCIAL OK·131K CTX
Llama 3.1 Nemotron Nano 8B
8B

Smallest of the Nemotron reasoning trio. NAS-optimized for inference efficiency on RTX hardware.

COMMERCIAL OK·131K CTX
Llama 3.2 1B Instruct
1B

True edge-tier Llama. Runs on a phone or Raspberry Pi. Useful for classification, simple summarization, and on-device agents.

COMMERCIAL OK·131K CTX
Trendyol LLM 7B Chat v0.1
7B

Turkish-tuned chat model released by Trendyol, Turkey's largest e-commerce platform. Built on Llama 2 7B, fine-tuned on Turkish customer-service style…

RESTRICTED·4K CTX
Turkish Llama 8B Instruct v0.1
8B

Llama 3 8B continued pre-trained on Turkish corpora, then instruction-tuned for Turkish chat. YTU CE COSMOS group's most-downloaded Llama variant. GGUF builds…

COMMERCIAL OK·8K CTX
Llama 3.2 90B Vision Instruct
90B

The 90B vision Llama. Best-in-class first-party multimodal open weight at the time of release. Workstation-class only.

COMMERCIAL OK·MULTIMODAL·131K CTX
Cosmos Llama 3 8B Turkish
8B

YTU CE COSMOS's Llama 3 8B Turkish instruction-tuned variant. Follow-up to the original Turkish-Llama-8b that uses the Llama 3 base instead of Llama 2 — better…

COMMERCIAL OK·8K CTX
Salamandra 7B Instruct
7B

Salamandra 7B Instruct is an Apache 2.0 instruction-tuned model from Barcelona Supercomputing Center, pretrained from scratch on 12.875 trillion tokens across…

COMMERCIAL OK·8K CTX
Trendyol LLM 7B Base v0.1
7B

Base (non-chat) variant of Trendyol's 7B Turkish LLM. The chat sibling is the more popular pick; this base version is for operators building their own…

RESTRICTED·4K CTX
LLM-jp 4 8B Thinking
8B

LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning.…

COMMERCIAL OK·66K CTX
SOLAR 10.7B v1.0
10.7B

SOLAR 10.7B is a base pretrained model from Upstage built by applying depth up-scaling (DUS) to Mistral 7B, pushing parameters to 10.7B without a traditional…

COMMERCIAL OK·4K CTX
ALIA 40b instruct 2601
40B

BSC-LT's 40B instruction-tuned model with first-class support for Spanish, Catalan, Basque, and Galician alongside English. Pretrained on 9.83 trillion tokens…

COMMERCIAL OK·164K CTX
LLM-jp 4 8B Instruct
8B

An 8B bilingual model from Japan's National Institute of Informatics, instruction-tuned via SFT on a Japanese/English corpus of 11.7T tokens. Supports up to…

COMMERCIAL OK·66K CTX
Hermes 4 70B FP8
70B

Hermes 4 is a 70B reasoning model from NousResearch, built on Llama-3.1-70B with FP8 quantization to cut memory overhead. It supports explicit `<think>`…

COMMERCIAL OK·128K CTX
RefinedNeuro RN TR R2
8B

RefinedNeuro RN TR R2 is an Apache-2.0 Llama-family 8B model distributed on Hugging Face and Ollama. It is measured alongside R1 to compare same-size…

COMMERCIAL OK·8K CTX
RefinedNeuro RN TR R1
8B

RefinedNeuro RN TR R1 is an Apache-2.0 Llama-family 8B reasoning model distributed on Hugging Face and Ollama. It is included in the local sweep as a compact…

COMMERCIAL OK·8K CTX
Swallow 7B
7B

Swallow 7B is a Japanese-English base model built by continual pre-training on top of Llama 2 7B with additional Japanese text. TokyoTech-LLM also expanded the…

RESTRICTED·4K CTX
Salamandra 2B Instruct
2B

Salamandra 2B Instruct is a transformer model from BSC pretrained from scratch on 12.875 trillion tokens across 35 European languages and code. The instruct…

COMMERCIAL OK·8K CTX
Bielik-11B v3.0 Instruct FP8 Dynamic
11B

An FP8-quantized build of Bielik-11B v3.0 Instruct, designed to run on vLLM or SGLang with roughly 50% less GPU memory than the BF16 original. Weights and…

COMMERCIAL OK·4K CTX
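
The "roughly 50% less GPU memory" figure follows from bytes per weight: BF16 stores 2 bytes per parameter, FP8 stores 1. A minimal sketch of that arithmetic, assuming a weight-only estimate (KV cache, activations, and runtime overhead are ignored) and 1 GB = 1e9 bytes. `weight_gb` is a hypothetical helper, not part of vLLM or SGLang.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * bytes / 1e9 simplifies to this product.
    return params_billion * BYTES_PER_PARAM[dtype]


bf16 = weight_gb(11, "bf16")  # 11B parameters, as listed for Bielik-11B
fp8 = weight_gb(11, "fp8")
print(f"BF16 {bf16:.0f} GB, FP8 {fp8:.0f} GB, saving {1 - fp8 / bf16:.0%}")
```

Real usage will be higher than the weight-only number, so treat it as a lower bound when matching a model to a card.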
Salamandra 2B
2.25B

Salamandra 2B is a base-only transformer trained from scratch by Barcelona Supercomputing Center on 12.875 trillion tokens across 35 European languages and…

COMMERCIAL OK·8K CTX
OpenThaiGPT 7B 1.0.0 Chat
7B

A 7B Thai-language chat model built on LLaMA 2, pretrained on 65B+ Thai words and instruction-tuned on 1M+ Thai examples. Adds 10,000 common Thai vocabulary…

COMMERCIAL OK·4K CTX
Bielik 11B v3.0 Instruct GGUF
11B

Bielik 11B v3.0 is SpeakLeash's instruction-tuned model built around Polish, with coverage across 32 European languages. It runs at 11B parameters with a 32K…

COMMERCIAL OK·33K CTX
Saiga Llama3 8B GGUF
8B

Saiga Llama3 8B is a Russian-language fine-tune of Meta's Llama 3 8B, trained on the Saiga Scored dataset and packaged in GGUF format for llama.cpp. It targets…

RESTRICTED·8K CTX
Salamandra 7B
7B

Salamandra 7B is a base language model from Barcelona Supercomputing Center, pretrained on 12.875 trillion tokens across 35 European languages and code. It is…

COMMERCIAL OK·8K CTX
OpenThaiGPT 1.0.0 Beta 13B Chat
13B

OpenThaiGPT 1.0.0 Beta is a 13B LLaMA v2 Chat fine-tune trained on translated Thai instructions. Vocabulary was expanded by 10,000+ Thai tokens to speed up…

COMMERCIAL OK·4K CTX
Gervásio 8B PTPT
8B

Gervásio 8B PTPT is a LLaMA 3.1 8B Instruct fine-tune from PORTULAN/University of Lisbon, trained on Portuguese-specific datasets including extraGLUE-Instruct…

COMMERCIAL OK·4K CTX
Llama 3.3 8B Instruct
8B

Meta's Llama 3.3 at 8B. Drop-in upgrade from Llama 3.1 8B; same hardware envelope, better instruction following.

COMMERCIAL OK·131K CTX
Llama 4 405B
405B

Meta's dense flagship in the Llama 4 line. 405B params; comparable footprint to Llama 3.1 405B with the Llama 4 reasoning improvements.

COMMERCIAL OK·131K CTX
Llama 3.2 11B Vision
11B

Llama 3.2 multimodal at 11B. Consumer-tier multimodal predecessor to Llama 4 Scout.

COMMERCIAL OK·MULTIMODAL·131K CTX
Llama 4 70B
70B

Llama 4 dense at 70B. Drop-in successor to Llama 3.3 70B; same hardware envelope, better on reasoning benchmarks.

COMMERCIAL OK·131K CTX
Phind CodeLlama 34B v2
34B

Phind's CodeLlama-derived coder at 34B. Older release; retained for historical / continuity value. Newer Qwen Coder lineage has surpassed it.

COMMERCIAL OK·16K CTX
EVA Llama 3.3 70B
70B

EVA community's storytelling-focused fine-tune of Llama 3.3 70B. Popular in the creative-writing / roleplay community.

COMMERCIAL OK·131K CTX
Llama 3.2 90B Vision
90B

Llama 3.2 multimodal at 90B. Datacenter-tier predecessor to Llama 4 Maverick. Strong visual reasoning.

COMMERCIAL OK·MULTIMODAL·131K CTX
FAM · QWEN

Qwen

39 models
Qwen 3.5 397B-A17B (MoE)
397B

Alibaba's May 2026 flagship. 397B total / 17B active MoE with hybrid thinking-mode toggle inherited from Qwen 3. Strongest open scientific reasoner per GPQA…

COMMERCIAL OK·262K CTX
Qwen 3 235B-A22B
235B

Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference…

COMMERCIAL OK·131K CTX
Qwen 3 0.6B
0.6B

Qwen3-0.6B is the smallest dense model in Alibaba's Qwen3 generation, supporting a 40K-token context and dual-mode operation that toggles between explicit…

COMMERCIAL OK·41K CTX
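
The description says "40K" while the tag says "41K" because the two use different units. A minimal sketch, assuming the CTX tags in this listing are the raw token count rounded to decimal thousands, and that "40K" means 40 × 1024 = 40,960 tokens. `ctx_label` is a hypothetical helper that only covers the K range.

```python
def ctx_label(tokens: int) -> str:
    """Format a token count as a decimal-thousands CTX tag (K range only)."""
    return f"{round(tokens / 1000)}K CTX"


# Power-of-two style context sizes and the tags they map to under this assumption.
for n in (2_048, 8_192, 32_768, 40_960, 131_072, 262_144):
    print(f"{n:>7} tokens -> {ctx_label(n)}")
```

The same rounding explains "128K context" in a description next to a "131K CTX" tag: 128 × 1024 = 131,072.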
Qwen 3 30B-A3B
30B

Mid-tier Qwen 3 MoE. 30B total / 3B active means 70B-class quality at 7B-class inference speed on a single 24GB card. The sweet spot of the Qwen 3 lineup for…

COMMERCIAL OK·131K CTX
Qwen 2.5 Coder 32B Instruct
32B

Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.

COMMERCIAL OK·131K CTX
Qwen 3 32B
32B

Dense Qwen 3 32B. Best dense open-weight model in its size class at release; pairs nicely with a single RTX 5090 or 4090.

COMMERCIAL OK·131K CTX
Qwen 3 8B
8B

Qwen 3 at the 8B scale. Direct head-to-head against Llama 3.1 8B on most benchmarks; usually wins on coding and structured output.

COMMERCIAL OK·131K CTX
Qwen 3 1.7B
1.7B

Qwen3-1.7B is the mid-tier dense model in Qwen3, sharing the same hybrid thinking architecture and 40K context as the 0.6B but with ~3x the parameters for…

COMMERCIAL OK·41K CTX
Qwen 3 14B
14B

14B Qwen 3. Fits on 12GB cards at Q4. Strong default for users with a single mid-range GPU.

COMMERCIAL OK·131K CTX
Qwen2-VL 2B Instruct
2B

Qwen2-VL 2B Instruct is Alibaba's compact vision-language model with native dynamic-resolution image handling and multimodal RoPE (M-RoPE) for video and…

COMMERCIAL OK·33K CTX
Qwen 2.5 7B Instruct
7B

The community-default small Qwen prior to Qwen 3. Still widely used because of mature ecosystem support.

COMMERCIAL OK·131K CTX
Qwen 2.5 14B Instruct
14B

14B Qwen 2.5. Sweet spot for 16GB VRAM. Many production deployments still on this version.

COMMERCIAL OK·131K CTX
Qwen 3.6 35B-A3B (MTP)
35B

Qwen 3.6 35B-A3B with Multi-Token Prediction (MTP). The "A3B" suffix means ~3B activated parameters per token via Mixture-of-Experts — inference cost stays…

COMMERCIAL OK·262K CTX
Qwen 2.5 32B Instruct
32B

Dense 32B Qwen 2.5. Strong daily-driver on 24GB cards prior to Qwen 3 32B.

COMMERCIAL OK·131K CTX
QwQ 32B Preview
32B

Qwen team's reasoning-focused experimental release. Visible chain-of-thought in <think> tags. Precursor to Qwen 3's thinking mode.

COMMERCIAL OK·33K CTX
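
When a model emits visible chain-of-thought in `<think>` tags, client code usually wants the reasoning and the final answer separately. A minimal sketch, assuming the reasoning arrives as well-formed `<think>...</think>` blocks in the raw completion text. `split_reasoning` is a hypothetical helper, not part of any inference library.

```python
import re

# Non-greedy match so several <think> blocks in one reply are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completion with <think> blocks."""
    blocks = [block.strip() for block in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(blocks), answer


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

An unclosed `<think>` tag (for example, output cut off by a token limit) is left in the answer untouched, which makes truncation easy to spot.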
Qwen 3 4B
4B

Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.

COMMERCIAL OK·131K CTX
Qwen 2.5 72B Instruct
72B

The flagship of Qwen 2.5. Workstation-tier; needs 48GB+ VRAM for usable inference.

COMMERCIAL OK·131K CTX
Qwen 3.6 27B (MTP)
27B

Qwen 3.6 27B dense (not MoE) with Multi-Token Prediction. Sits between the 14B and 35B-A3B as a "single dense model with MTP throughput acceleration." Targets…

COMMERCIAL OK·131K CTX
Qwen 3.5 2B Turkish SFT
2B

Qwen 3.5 2B base with supervised fine-tuning on Turkish instruction-following data. Recent community fine-tune (early 2026) that bridges Qwen 3.5's strong…

COMMERCIAL OK·33K CTX
Qwen3.5 9B Thai Law Base
8.95B

Continued pre-training of Qwen3.5-9B-Base on 68M+ tokens of Thai legal text — acts, decrees, and court rulings. This is a raw base model, not an assistant; you…

COMMERCIAL OK·4K CTX
Qwen3 Swallow 32B RL v0.2
32B

A 32B Japanese-English model built on Qwen3, trained with continual pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards.…

COMMERCIAL OK·33K CTX
Qwen3 0.6B Hindi Instruct v1 GGUF
0.6B

A 0.6B Qwen3 model fine-tuned on English-to-Hindi instruction pairs and quantized to GGUF. Fits in 370MB and runs on CPU-only hardware. Trained on 2,000…

COMMERCIAL OK·2K CTX
Qwen 3 Coder 32B
32B

Coding-specialized fine-tune of Qwen 3 32B. Curated coding corpus; outperforms Qwen 2.5 Coder 32B on SWE-Bench by ~6 points. Apache 2.0.

COMMERCIAL OK·131K CTX
Qwen 2.5-VL 72B
72B

Qwen 2.5 vision-language flagship at 72B. Strong on document understanding + multi-image queries. Apache 2.0.

COMMERCIAL OK·MULTIMODAL·33K CTX
Qwen 2.5 Math 7B
7B

Qwen 2.5 fine-tuned for math problem-solving with chain-of-thought and tool-integrated reasoning.

COMMERCIAL OK·4K CTX
Qwen 3 7B
7B

Qwen 3 mid-tier. Same reasoning-mode toggle as Qwen 3 32B/14B/8B. Hits the consumer-laptop sweet spot.

COMMERCIAL OK·131K CTX
Qwen 2.5 0.5B Instruct
0.5B

Smallest Qwen 2.5. Apache 2.0; phone / Pi-class deployment target.

COMMERCIAL OK·33K CTX
CodeQwen 1.5 7B
7B

CodeQwen 1.5 — Qwen Coder predecessor. Superseded by Qwen 2.5 Coder for new deployments.

COMMERCIAL OK·66K CTX
Qwen 2.5 Coder 14B Instruct
14B

Coding-specialized Qwen 2.5 at 14B. The 16GB-VRAM tier coding model — fits comfortably with 8K context.

COMMERCIAL OK·131K CTX
Qwen 2-VL 7B
7B

Qwen 2 vision-language predecessor to Qwen 2.5-VL. Apache 2.0 with strong document Q&A.

COMMERCIAL OK·MULTIMODAL·33K CTX
Qwen 2.5 Coder 3B
3B

Compact Qwen 2.5 Coder. Sweet spot for laptop autocomplete and small refactor agents.

COMMERCIAL OK·33K CTX
Qwen 2.5 Coder 1.5B
1.5B

Smallest Qwen 2.5 Coder. Targets edge / autocomplete on integrated GPUs and Apple Silicon laptops.

COMMERCIAL OK·33K CTX
Qwen 2.5-VL 3B
3B

Smallest Qwen 2.5-VL. Edge-deployable VLM with strong document Q&A.

COMMERCIAL OK·MULTIMODAL·33K CTX
Qwen 2.5-VL 7B
7B

Consumer-tier Qwen 2.5 VL. 7B + vision. Fits 8GB cards; the smallest practical multimodal Qwen.

COMMERCIAL OK·MULTIMODAL·33K CTX
Qwen 2.5 Math 72B
72B

Largest Qwen 2.5 Math. Datacenter-tier math specialist; eclipsed by R1 distills for general reasoning.

COMMERCIAL OK·4K CTX
Qwen 2.5 Coder 7B Instruct
7B

Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.

COMMERCIAL OK·131K CTX
Qwen 2.5 1.5B Instruct
1.5B

Compact Qwen 2.5. The 1.5B Apache-2.0 baseline.

COMMERCIAL OK·33K CTX
Qwen 2.5 3B Instruct
3B

Mid-edge Qwen 2.5. Note: 3B variant uses Qwen License (not Apache 2.0).

COMMERCIAL OK·33K CTX
Qwen 3 Embedding 8B
8B

Qwen 3 family embedding model. Apache 2.0 with strong multilingual coverage.

COMMERCIAL OK·33K CTX
FAM · MISTRAL

Mistral

31 models
Mistral Medium 3.5 (675B MoE)
675B

Mistral's April 2026 frontier MoE. 675B total / 41B active. Strong European-multilingual lineage carries through; the new release competes head-to-head with…

RESTRICTED·262K CTX
Mistral Small 3 24B
24B

Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.

COMMERCIAL OK·33K CTX
Mistral Nemo 12B Instruct
12B

Joint Mistral/NVIDIA release with native 128K context and a new Tekken tokenizer. Strong multilingual; popular fine-tune base.

COMMERCIAL OK·131K CTX
Pixtral 12B
12B

Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.

COMMERCIAL OK·MULTIMODAL·131K CTX
Codestral 22B
22B

Mistral's coding-specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.

RESTRICTED·33K CTX
Mistral 7B Instruct v0.3
7B

The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.

COMMERCIAL OK·33K CTX
Mistral Large 2 (123B)
123B

Mistral's flagship dense model. Open weights but restricted commercial license — research and non-commercial only.

RESTRICTED·131K CTX
Kumru 2B
2.4B

Kumru 2B is a compact Turkish text-generation model from VNGRS. The Hugging Face config reports a Mistral-family architecture with an 8K context window, and…

COMMERCIAL OK·8K CTX
Mistral 7B Instruct v0.2
7B

Mistral 7B Instruct v0.2 is a 7-billion-parameter instruction-tuned model from Mistral AI with a 32,768-token context window. It uses `[INST]` prompt tags and…

COMMERCIAL OK·33K CTX
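
A minimal sketch of the `[INST]` prompt layout, assuming the format shown on the Mistral 7B Instruct model card: a BOS token, `[INST] ... [/INST]` around each user turn, and an EOS token after each assistant reply. `build_inst_prompt` is a hypothetical helper; in practice the tokenizer's chat template handles exact whitespace and special tokens.

```python
from typing import Optional


def build_inst_prompt(
    turns: list[tuple[str, Optional[str]]],
    bos: str = "<s>",
    eos: str = "</s>",
) -> str:
    """Render (user, assistant) turns; use None for the reply still to be generated."""
    parts = [bos]
    for user, assistant in turns:
        parts.append(f"[INST] {user} [/INST]")
        if assistant is not None:
            # Completed assistant replies are closed with the EOS token.
            parts.append(f" {assistant}{eos}")
    return "".join(parts)


print(build_inst_prompt([("Hi", "Hello!"), ("Tell me a joke", None)]))
```

Sending a string like this to a raw completion endpoint is only needed when the runtime does not apply a chat template for you.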
Mistral 7B Instruct v0.2
7B

Mistral AI's second instruct revision of their 7B model, bumping context from 8k to 32k tokens and updating the tokenizer to `mistral_common`. It's an…

COMMERCIAL OK·33K CTX
Mistral 7B Instruct v0.1
7B

Mistral 7B Instruct v0.1 is the instruction-tuned version of Mistral's first public 7B base model, fine-tuned on publicly available conversation datasets. It…

COMMERCIAL OK·4K CTX
Turkcell LLM 7B v1
7.4B

Turkcell LLM 7B v1 is an Apache-2.0 Turkish text-generation model built on a Mistral architecture. The measured Ollama artifact uses a RefinedNeuro GGUF…

COMMERCIAL OK·33K CTX
Turkish Mistral 7B Instruct v0.2
7B

Mistral 7B v0.2 continued-pretrained on Turkish data + instruction-tuned. The 32K context window makes it the best Turkish open-weight model for long-document…

COMMERCIAL OK·33K CTX
Bielik 11B v2.3 Instruct
11B

Bielik 11B v2.3 Instruct is SpeakLeash's Polish-language instruction-tuned model, built on the Bielik-11B-v2 base and released under Apache 2.0. It targets…

COMMERCIAL OK·4K CTX
Bielik 11B v2.3 Instruct
11B

An 11B Polish-language instruction model from SpeakLeash and ACK Cyfronet AGH, built as a linear merge of three instruct-tuned Bielik-11B-v2 variants. Uses…

COMMERCIAL OK·4K CTX
Mistral Turkish v2 (brooqs)
7.2B

Mistral Turkish v2 is a public Ollama-distributed Turkish Mistral variant. The upstream Hugging Face repository was not publicly accessible during intake, so…

RESTRICTED·8K CTX
Malhajar Mistral 7B Turkish
7.2B

Malhajar Mistral 7B Turkish is an Apache-2.0 Mistral 7B Instruct v0.2 Turkish fine-tune. The benchmarked Ollama tag is a koezgen quantized distribution of the…

COMMERCIAL OK·33K CTX
Sarvam M
24B

Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode…

COMMERCIAL OK·4K CTX
Bielik 7B Instruct v0.1 GGUF
7B

Bielik 7B Instruct v0.1 is a Polish-language instruction-tuned model from speakleash, fine-tuned from Bielik-7B-v0.1 and distributed in GGUF format for…

RESTRICTED·4K CTX
Mistral 7B OpenOrca GGUF
7B

Mistral 7B fine-tuned on the OpenOrca instruction dataset, distributed by TheBloke in GGUF format for local CPU and GPU inference. Uses ChatML prompt…

COMMERCIAL OK·33K CTX
Japanese StableLM Instruct Gamma 7B
7B

A 7B instruction-tuned model from Stability AI built specifically for Japanese, using the Mistral architecture. Quantized to GGUF by TheBloke, so it runs on…

COMMERCIAL OK·33K CTX
Bielik 7B v0.1
7B

Bielik-7B v0.1 is a 7B-parameter base model built by continuously pretraining Mistral-7B on 70B+ tokens of Polish text, with data quality filtered via an…

COMMERCIAL OK·4K CTX
Bielik 11B v2.2 Instruct GGUF
11B

Bielik 11B v2.2 Instruct is a Polish-language instruction-tuned model from speakleash, available in GGUF format for local inference. It supports 32,768-token…

COMMERCIAL OK·33K CTX
Devstral Small 2 24B
24B

Mistral's coding-specialized Mistral Small 2 successor. Apache 2.0 — the rare commercial-OK Mistral coder.

COMMERCIAL OK·131K CTX
Mistral Small 3.2 24B
24B

Iterative refresh of Mistral Small 3 24B. Same architecture; improved instruction following and tool-call reliability. Apache 2.0.

COMMERCIAL OK·131K CTX
Mistral Saba 24B
24B

Mistral's Arabic and South Asian language specialist at 24B. Research license.

RESTRICTED·33K CTX
Ministral 3B Instruct
3B

Mistral edge model at 3B. Designed for on-device inference with extended 128k context. Research license only.

RESTRICTED·131K CTX
Ministral 8B Instruct
8B

Mistral 8B with sliding-window attention and 128k context. Research license — Mistral 7B v0.3 is the commercial alternative.

RESTRICTED·131K CTX
Codestral Mamba 7B
7B

Mistral's Mamba (state-space) architecture coding model. Linear inference cost — the architectural alternative to attention-based coding models. Apache 2.0.

COMMERCIAL OK·256K CTX
Mistral Medium 3 24B (dense)
24B

Dense variant in the Mistral Medium 3.5 family. Research license — non-commercial open. Same training data as the MoE flagship but in a smaller dense package.

RESTRICTED·262K CTX
Magistral 32B
32B

Mistral's reasoning-specialized fine-tune of a Mistral Small base. Reasoning-token emission similar to Qwen 3 / DeepSeek R1 in a smaller footprint. Research…

RESTRICTED·131K CTX
FAM · DEEPSEEK

DeepSeek

20 models
DeepSeek V4 Pro (1.6T MoE)
1600B

DeepSeek's April 2026 frontier flagship. 1.6T total / 49B active MoE with hybrid Compressed Sparse Attention + Heavily Compressed Attention. 1M context window.…

COMMERCIAL OK·1.0M CTX
DeepSeek R1 (671B reasoning)
671B

Open reasoning model that closed the gap with frontier proprietary reasoners. Visible chain-of-thought, MIT license, and a family of distilled smaller variants.

COMMERCIAL OK·131K CTX
DeepSeek V4 Flash (284B MoE)
284B

The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (about 4.6%) makes it…

COMMERCIAL OK·1.0M CTX
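
The active-parameter ratio of an MoE model is simply active parameters divided by total parameters. A minimal sketch using the total/active figures quoted in this listing; `active_ratio` is a hypothetical helper name.

```python
def active_ratio(total_b: float, active_b: float) -> float:
    """Fraction of parameters used per token (both arguments in billions)."""
    return active_b / total_b


# (total, active) in billions, as quoted in the DeepSeek entries of this listing.
models = {
    "DeepSeek V4 Pro": (1600, 49),
    "DeepSeek V4 Flash": (284, 13),
    "DeepSeek V3": (671, 37),
}

for name, (total, active) in models.items():
    print(f"{name}: {active_ratio(total, active):.1%} active")
```

The ratio tracks per-token compute, not memory: all experts still have to be loaded, so the total parameter count is what decides whether a model fits.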
DeepSeek R1 Distill Llama 70B
70B

Reasoning distillation onto Llama 3.3 70B. Best-in-class open-weight reasoner you can actually fit on a workstation.

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Qwen 32B
32B

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

COMMERCIAL OK·131K CTX
DeepSeek V3 (671B MoE)
671B

DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.

COMMERCIAL OK·66K CTX
DeepSeek R1 Distill Qwen 7B
7B

Smallest practical R1 distill. Reasoning on a 6GB GPU.

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Qwen 14B
14B

14B reasoning distill. Fits on 12GB cards.

COMMERCIAL OK·131K CTX
DeepSeek Coder V2 Lite (16B)
16B

MoE coding specialist — 16B total / 2.4B active. Fast on 12GB cards.

COMMERCIAL OK·131K CTX
DeepSeek V2 Lite Chat
15.7B

DeepSeek-V2-Lite-Chat is a 15.7B-total, 2.4B-active mixture-of-experts chat model using DeepSeek's Multi-head Latent Attention (MLA) architecture for KV-cache…

COMMERCIAL OK·33K CTX
DeepSeek Coder V2 236B
236B

Full DeepSeek Coder V2. 236B total / 21B active MoE coder.

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Llama 8B
8B

R1 reasoning distilled into a Llama 3.1 8B base. Smaller R1 distill; useful when 32B is too heavy. Reasoning quality is meaningfully below the 32B distill but…

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Qwen 1.5B
1.5B

Smallest R1 distill. Surprisingly capable reasoning for its size class; the right pick when you need both reasoning and edge deployment.

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Qwen 3 32B
32B

Newer R1 distill on a Qwen 3 base. Combines R1 reasoning with Qwen 3's reasoning-toggle architecture. Apache 2.0.

COMMERCIAL OK·131K CTX
DeepSeek V2.5 236B
236B

DeepSeek V2.5 — merged V2 chat + Coder. Pre-V3 baseline; 21B active MoE.

COMMERCIAL OK·131K CTX
DeepSeek V4
745B

DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source…

COMMERCIAL OK·131K CTX
DeepSeek V3 Lite (16B MoE)
16B

Distillation of DeepSeek V3 to a smaller MoE. 16B total / 2.4B active. Captures most of V3's reasoning at consumer-card-friendly memory.

COMMERCIAL OK·131K CTX
DeepSeek R1 Distill Mistral 24B
24B

Community R1 distill onto a Mistral Small 3 base. Apache 2.0; combines R1 reasoning with Mistral instruction polish.

COMMERCIAL OK·33K CTX
DeepSeek MoE 16B Base
16B

DeepSeek's first MoE — 16B / 2.4B active. Older model retained for ecosystem-context value as the base of the V2/V3 lineage.

COMMERCIAL OK·4K CTX
DeepSeek Coder V3
33B

DeepSeek's coder line successor. Dense 33B; competitive with Qwen 2.5 Coder 32B on SWE-Bench.

COMMERCIAL OK·131K CTX
FAM · GEMMA

Gemma

20 models
Gemma 4 31B Dense
31B

Google's flagship dense Gemma 4. Beats some 400B-class proprietary models on benchmarks. Targets the 24GB single-GPU sweet spot.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 4 26B MoE
26B

MoE variant of Gemma 4. Faster per-token than the 31B dense at similar quality on most tasks.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 3 270M
0.27B

Gemma 3 270M is the smallest member of Google's Gemma 3 family, a 270-million-parameter text-only model designed for on-device deployment and task-specific…

COMMERCIAL OK·33K CTX
Gemma 3 27B
27B

Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 4 E4B (Effective 4B)
4B

Edge-class Gemma 4. The 'Effective 4B' branding signals it punches above its parameter count via training-data quality.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 3 12B
12B

12B Gemma 3. Fits on 12GB consumer cards. Multimodal.

COMMERCIAL OK·MULTIMODAL·131K CTX
Trendyol LLM Asure 12B
11.8B

Trendyol LLM Asure 12B is a Gemma 3 based multimodal instruct model for Turkish and English business workflows. The public Ollama build used in local testing…

COMMERCIAL OK·MULTIMODAL·131K CTX
Turkish Gemma 9B T1
9B

YTU's Turkish-tuned Gemma 2 9B model. The highest community-rated Turkish-language LLM on Hugging Face by likes-to-downloads ratio as of May 2026. Continued…

COMMERCIAL OK·8K CTX
Gemma 2 2B Instruct
2B

Gemma 2 2B Instruct is Google's instruction-tuned 2B model from the Gemma 2 generation, trained with knowledge distillation from larger Gemma models. It…

COMMERCIAL OK·8K CTX
Gemma 3 4B
4B

4B Gemma 3 for edge. Multimodal.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 2 9B Instruct
9B

Mid-size Gemma 2. Strong chat quality with a different training mix from Llama family.

COMMERCIAL OK·8K CTX
Gemma 4 E2B (Effective 2B)
2B

Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.

COMMERCIAL OK·MULTIMODAL·131K CTX
Gemma 4 Turkish 26B (4B active)
26B

Gemma 4 26B MoE (4B active params) pruned and Turkish-tuned. The largest Turkish-tuned open-weight model on HF as of May 2026. MoE architecture means it loads…

COMMERCIAL OK·131K CTX
YTU Turkish Gemma 9B v0.1
9.2B

YTU Turkish Gemma 9B v0.1 is a Gemma 2 based Turkish instruction model from the YTU CE COSMOS ecosystem. The benchmarked Ollama tag is an alibayram GGUF…

COMMERCIAL OK·8K CTX
Gemma 3 1B
1B

Smallest text-only Gemma 3 for phones and IoT.

COMMERCIAL OK·33K CTX
MedGemma 27B
27B

Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.

RESTRICTED·MULTIMODAL·131K CTX
CodeGemma 7B
7B

Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.

COMMERCIAL OK·8K CTX
ColPali v1.3
3B

3B-parameter visual document retriever built on PaliGemma-3B using a ColBERT-style late-interaction objective. Encodes a PDF page as a grid of patch…

COMMERCIAL OK
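
ColBERT-style late interaction scores a page by matching each query vector to its best page-patch vector and summing those matches (often called MaxSim). A minimal sketch with toy 2-dimensional vectors; real ColPali embeddings are produced by the model and are much larger. `maxsim_score` is a hypothetical helper, not the ColPali API.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: list[list[float]], patch_vecs: list[list[float]]) -> float:
    """Sum, over query vectors, of the best-matching patch similarity."""
    return sum(max(dot(q, p) for p in patch_vecs) for q in query_vecs)


# Toy data: two query-token vectors and two candidate pages of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5], [0.4, 0.4]]

ranked = sorted(
    {"page_a": page_a, "page_b": page_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because every patch keeps its own vector, the index is larger than a single-vector retriever's, which is the usual trade-off for this design.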
PaliGemma 2 10B
10B

Mid-tier PaliGemma 2 fine-tuning base. Better baseline for complex vision tasks.

COMMERCIAL OK·MULTIMODAL·8K CTX
PaliGemma 2 3B
3B

PaliGemma 2 — Gemma 2 base + SigLIP vision encoder. Designed for fine-tuning on specific vision tasks.

COMMERCIAL OK·MULTIMODAL·8K CTX
FAM · PHI

Phi

7 models
Phi-4 14B
14B

Microsoft's Phi-4 14B, trained on synthetic textbook-quality data. Punches above its weight on reasoning and math; MIT licensed.

COMMERCIAL OK·16K CTX
Phi-4 Reasoning 14B
14B

Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.

COMMERCIAL OK·33K CTX
Phi-3.5 Mini Instruct
3.8B

Compact 3.8B Phi for edge deployment. 128K context. Strong reasoning per parameter.

COMMERCIAL OK·131K CTX
Phi-3.5 Vision
4.2B

Multimodal Phi 3.5. Document and chart understanding at edge size. MIT licensed.

COMMERCIAL OK·MULTIMODAL·131K CTX
Phi-4 Multimodal
14B

Multimodal variant of Phi-4 14B. Vision + text. Smaller than Llama 4 Scout but covers most image-Q&A workflows; right-sized for 16GB consumer cards.

COMMERCIAL OK·MULTIMODAL·131K CTX
Phi-4 Mini 4B
3.8B

Microsoft's edge-tier Phi-4 variant. 3.8B params; designed for phone / tablet / Pi deployment. Strong reasoning per parameter — Phi family's traditional…

COMMERCIAL OK·131K CTX
Phi-4 Reasoning Mini 4B
3.8B

Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.

COMMERCIAL OK·131K CTX
FAM · EXAONE

EXAONE

6 models
EXAONE 3.5 2.4B Instruct
2.4B

EXAONE 3.5 2.4B Instruct is LG AI Research's bilingual English/Korean model built for low-resource devices. It handles up to 32K context tokens and shows…

RESTRICTED·33K CTX
K-EXAONE 236B A23B
236B

K-EXAONE is LG AI Research's 236B Mixture-of-Experts model with 23B active parameters per forward pass. It covers Korean, English, Spanish, German, Japanese,…

RESTRICTED·262K CTX
EXAONE 4.0.1 32B
32B

EXAONE 4.0.1 is a 32B model from LG AI Research with a 131K context window and a hybrid sliding-window/full-attention architecture. It runs in either standard…

RESTRICTED·131K CTX
EXAONE 3.5 8B
7.8B

Smaller EXAONE for consumer-tier Korean / CJK workloads.

RESTRICTED·33K CTX
EXAONE 3.5 32B
32B

LG AI Research's flagship Korean-ecosystem model. Strong on Korean/Japanese language tasks; competitive on English. License blocks commercial use without LG…

RESTRICTED·33K CTX
EXAONE 3.5 2.4B
2.4B

LG AI's edge-tier EXAONE. Strong Korean / English. Research-only license.

RESTRICTED·33K CTX
FAM · GRANITE

Granite

6 models
Granite 3.1 2B Instruct
2B

Granite 3.1 2B Instruct is IBM's 2B-parameter dense instruct model with a 128K context window, post-trained for enterprise tasks including RAG, function…

COMMERCIAL OK·131K CTX
Granite 3.3 8B
8B

IBM Granite 3.3. Iterative refresh of 3.2 — same architecture; improved instruction following and tool-call reliability. Apache 2.0.

COMMERCIAL OK·131K CTX
Granite 3.0 2B Instruct
2B

IBM Granite at 2B. Apache 2.0 enterprise-friendly small model with safety tuning.

COMMERCIAL OK·4K CTX
Granite 3.0 8B Instruct
8B

Granite 3.0 8B — IBM's enterprise-tier baseline. Apache 2.0.

COMMERCIAL OK·4K CTX
Granite 3.2 8B
8B

IBM's enterprise-tuned 8B. Apache 2.0. Strong on enterprise-shaped tool-calling and structured output. Watson + RHEL ecosystem alignment.

COMMERCIAL OK·131K CTX
Granite 3 MoE (3B active)
16B

Granite MoE shape. 16B total / 3B active. Workstation-deployable; the IBM enterprise alternative to Qwen / DeepSeek small MoEs.

COMMERCIAL OK·131K CTX
FAM · COMMAND-R

Command R

5 models
Command R+ 104B
104B

Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.

RESTRICTED·131K CTX
Command R 35B
35B

Cohere's mid-tier — RAG and tool use. Non-commercial license.

RESTRICTED·131K CTX
Command R7B (12-2024)
8B

Command R7B (December 2024) is Cohere's smallest model in the Command R family, an 8B-parameter dense transformer with 128K context, trained for…

RESTRICTED·131K CTX
Command R+ (Aug 2024)
104B

Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.

RESTRICTED·131K CTX
Aya Expanse 32B
32B

Cohere's multilingual Aya at 32B. Covers 23 languages; strongest open-weight multilingual model in late 2024 — Apache-2.0 alternative is Qwen 2.5 32B but Aya…

RESTRICTED·8K CTX
FAM · FALCON

Falcon

5 models
Falcon 40B Instruct
40B

Falcon-40B-Instruct is a 40B parameter instruction-tuned model from TII (UAE), fine-tuned on Baize chat data for conversation and instruction-following. It…

COMMERCIAL OK·2K CTX
Falcon 3 3B Instruct
3B

Falcon 3 3B Instruct is TII's 3-billion-parameter instruct model from the Falcon 3 family, supporting English, French, Spanish, and Portuguese with a 32K…

COMMERCIAL OK·33K CTX
Falcon 3 7B Instruct
7B

Falcon 3 mid-size from TII. Permissive Falcon license; multilingual focus.

COMMERCIAL OK·33K CTX
Falcon 3 10B
10B

TII's Falcon 3 at the 10B tier. Strong on Arabic-language tasks; competitive on English.

COMMERCIAL OK·33K CTX
Falcon Mamba 7B
7B

TII's Mamba (state-space) architecture model. Linear inference cost; the architectural alternative to attention-based models.

COMMERCIAL OK·256K CTX
FAM · HERMES

Hermes

4 models
Hermes 3 Llama 3.1 8B
8B

Nous Research's Hermes fine-tune of Llama 3.1 8B. Stronger system-prompt adherence, JSON output, role-play, and agent steering than the base Llama.

COMMERCIAL OK·131K CTX
Hermes 3 Llama 3.1 70B
70B

Hermes 3 at 70B. Workstation-tier agent-tuned model.

COMMERCIAL OK·131K CTX
Hermes 3 Llama 3.2 3B
3B

Nous Research's Hermes 3 fine-tune of Llama 3.2 3B. Strong general-instruction following at the 3B tier.

COMMERCIAL OK·131K CTX
Hermes 4 Llama 3.3 70B
70B

Nous Research's Hermes 4 fine-tune of Llama 3.3 70B. Strong on instruction following and creative tasks; community-favored alternative to base Llama.

COMMERCIAL OK·131K CTX
FAM · DOLPHIN

Dolphin

3 models
Dolphin 3.0 Mistral 24B
24B

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

COMMERCIAL OK·33K CTX
Dolphin 3.0 Llama 3.2 3B
3B

Eric Hartford's Dolphin fine-tune at 3B. Less censored than the base Llama; popular for unconstrained-generation use cases.

COMMERCIAL OK·131K CTX
Dolphin 3 Llama 3.3 70B
70B

Eric Hartford's Dolphin 3 on a Llama 3.3 70B base. Less-restricted alternative for creative / unconstrained workflows.

COMMERCIAL OK·131K CTX
FAM · MIXTRAL

Mixtral

3 models
Mixtral 8x7B Instruct
47B

The MoE model that introduced the 8-experts pattern to the open-weight world. 47B params total, 13B active. Still a viable workhorse on 36GB+ setups.

COMMERCIAL OK·33K CTX
Mixtral 8x22B Instruct
141B

The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.

COMMERCIAL OK·66K CTX
Mixtral 8x7B Instruct v0.1 GPTQ
46.7B

GPTQ 4-bit quantized build of Mistral AI's Mixtral 8x7B Instruct, a sparse mixture-of-experts model with 46.7B total parameters. Natively handles German,…

COMMERCIAL OK·8K CTX
FAM · GLM

GLM

3 models
GLM-4V 9B
13.9B

GLM-4 with vision encoder. Strong on Chinese document Q&A; restricted commercial license.

RESTRICTED·MULTIMODAL·8K CTX
GLM-4 9B
9B

Zhipu's GLM-4 at 9B. Strong on Chinese-language tasks; its tool-calling format differs slightly from the OpenAI convention.

RESTRICTED·131K CTX
GLM-5 Pro
144B

Zhipu's GLM-5 flagship. 144B total / 16B active MoE. Strong on Chinese-language tasks; competitive on English at the workstation-cluster tier.

RESTRICTED·131K CTX
FAM · MINICPM

MiniCPM

3 models
MiniCPM-V 3 8B
8B

Successor in the MiniCPM-V line. Multimodal at 8B, with stronger document Q&A than MiniCPM-V 2.6.

COMMERCIAL OK·MULTIMODAL·33K CTX
MiniCPM 3 4B
4B

OpenBMB's edge-optimized 4B. MIT license; designed for phone deployment. Strong reasoning per parameter.

COMMERCIAL OK·33K CTX
MiniCPM-V 2.6 8B
8B

Multimodal MiniCPM at 8B. Vision + text; strong on document Q&A for the size class.

COMMERCIAL OK·MULTIMODAL·33K CTX
FAM · YI

Yi

2 models
Yi 1.5 34B
34B

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

COMMERCIAL OK·16K CTX
Yi Coder 9B
9B

01.AI's coding specialization at 9B. Apache 2.0; positioned as a lighter alternative to Qwen 2.5 Coder for the 16GB tier.

COMMERCIAL OK·131K CTX
FAM · STEPFUN

StepFun

2 models
GOT-OCR 2.0
0.58B

580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX…

COMMERCIAL OK
Step-3
1000B

StepFun's 1T-parameter MoE. 38B active. One of the largest open-weight models; cluster-only at any quant. Restricted license.

RESTRICTED·66K CTX
FAM · OLMO

OLMo

2 models
OLMo 2 1B Instruct
1B

OLMo 2 1B Instruct is AllenAI's 1-billion-parameter instruct model from the April 2025 OLMo 2 release, post-trained with RLVR on math. It is fully open:…

COMMERCIAL OK·4K CTX
OLMo 2 13B
13B

AI2's fully-open 13B. Apache 2.0; full training data + checkpoints + recipes published. The reproducibility-first model in the 13B class.

COMMERCIAL OK·4K CTX
FAM · INTERNLM

InternLM

2 models
InternLM 2.5 7B Chat
7B

InternLM 2.5 mid-size chat. Apache 2.0; strong on math and Chinese.

COMMERCIAL OK·1M CTX
InternLM 3 8B
8B

Shanghai AI Lab's open-research line. InternLM 3 at 8B; strong on Chinese-language tasks.

RESTRICTED·33K CTX
FAM · DBRX

DBRX

2 models
DBRX Instruct
132B

Databricks' MoE. 132B total / 36B active. Designed for MosaicML pipelines; strong tool-calling discipline. Multi-GPU only.

COMMERCIAL OK·33K CTX
DBRX Base
132B

DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.

COMMERCIAL OK·33K CTX
FAM · WIZARD

Wizard

1 model
WizardLM-2 8x22B
141B

Microsoft's RLHF-heavy fine-tune of Mixtral 8x22B. Briefly the top open chat model on LMSYS at release.

COMMERCIAL OK·66K CTX
FAM · BAICHUAN

Baichuan

1 model
Baichuan 4 13B
13B

Baichuan AI's 13B. Chinese-language ecosystem alternative to Qwen / GLM. Restricted commercial license.

RESTRICTED·131K CTX
FAM · JANUS

Janus

1 model
Janus-Pro 7B
7B

DeepSeek's multimodal 7B. Decoupled visual encoding for understanding vs generation — different from typical VLM design.

COMMERCIAL OK·MULTIMODAL·4K CTX
FAM · RWKV

RWKV

1 model
RWKV 7 'Goose' 1.5B
1.5B

RWKV 7 'Goose' at 1.5B. Linear-time inference architecture (constant memory regardless of context). Apache 2.0.

COMMERCIAL OK·1M CTX
FAM · OPENCODER

OpenCoder

1 model
OpenCoder 8B
8B

Fully-open coding model — training data + recipes published. Apache 2.0 with verifiable open-data lineage. The right pick for academic /…

COMMERCIAL OK·33K CTX
FAM · HUNYUAN

Hunyuan

1 model
Hunyuan Large 389B MoE
389B

Tencent's frontier MoE. 389B total / 52B active. License permits commercial use with restrictions on companies above MAU thresholds.

COMMERCIAL OK·256K CTX
FAM · MOONSHOT

Moonshot

1 model
Kimi K1.5
200B

Moonshot's reasoning model. Reasoning-token emission with very long thinking-block depth — sometimes 5000+ tokens per query. Strong on math; restricted…

RESTRICTED·200K CTX
FAM · OPENBIOLLM

OpenBioLLM

1 model
OpenBioLLM Llama 3 70B
70B

Medical / biomedical fine-tune of Llama 3 70B. Strong on USMLE and clinical-knowledge benchmarks; right pick when domain-specific medical depth matters more…

COMMERCIAL OK·8K CTX
FAM · OTHER

Other

100 models
all-MiniLM-L6-v2
0.022B

all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM…

COMMERCIAL OK·256 CTX
FLUX.1 [dev]
12B

12B-parameter rectified-flow transformer for text-to-image, guidance-distilled from the FLUX.1 [pro] teacher. Currently the most-liked model on Hugging Face…

RESTRICTED
Nomic Embed Text v1.5
0.137B

Nomic Embed Text v1.5 is a 137M-parameter English embedding model with an 8192-token context window, trained with Matryoshka Representation Learning so the…

COMMERCIAL OK·8K CTX
Kokoro 82M
0.082B

82M-parameter StyleTTS2-derived TTS that went viral in early 2025 for matching billion-parameter TTS quality at ~1% the size. Apache-2.0 weights, dozens of…

COMMERCIAL OK
BGE Large EN v1.5
0.335B

BGE Large EN v1.5 is the 335M-parameter English flagship from BAAI's FlagEmbedding family, producing 1024-dim embeddings with a 512-token context window.…

COMMERCIAL OK·512 CTX
BGE Reranker v2 M3
0.57B

BGE M3 reranker. Cross-encoder for re-ranking RAG candidates; multilingual.

COMMERCIAL OK·8K CTX
all-mpnet-base-v2
0.109B

all-mpnet-base-v2 is a 109M-parameter sentence-transformers embedder based on Microsoft's MPNet, producing 768-dim vectors with a 384-token context. Trained on…

COMMERCIAL OK·384 CTX
XTTS v2
0.46B

Coqui's flagship multilingual voice-cloning TTS — clones a speaker from a 6-second reference clip and synthesizes in 17 languages with cross-lingual transfer.…

RESTRICTED
Whisper Base
0.074B

74M-parameter Whisper variant — roughly 2x the params of tiny for ~25-30% relative WER reduction. The standard pick for CPU realtime transcription with…

COMMERCIAL OK·30 CTX
Whisper Small
0.244B

244M-parameter Whisper. The smallest Whisper checkpoint considered 'production grade' for non-English audio. Sweet spot for laptops with iGPU/Metal or modest…

COMMERCIAL OK·30 CTX
Whisper Tiny
0.039B

Smallest member of the Whisper encoder-decoder ASR family (39M params). Trained on 680k hours of weakly supervised multilingual audio. Targets sub-realtime…

COMMERCIAL OK·30 CTX
paraphrase-multilingual-MiniLM-L12-v2
0.118B

paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing…

COMMERCIAL OK·128 CTX
GLM-5
200B

Zhipu's GLM-5 currently leads the Open LLM Leaderboard 2026. Strong reasoning and bilingual EN/ZH capability.

COMMERCIAL OK·200K CTX
mxbai-embed-large-v1
0.335B

mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation…

COMMERCIAL OK·512 CTX
Jina Embeddings v3
0.572B

Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage,…

RESTRICTED·8K CTX
Multilingual E5 Large Instruct
0.56B

Multilingual E5 Large Instruct is a 560M-parameter XLM-RoBERTa-large encoder fine-tuned by Microsoft's intfloat team with task instructions appended to…

COMMERCIAL OK·514 CTX
FLUX.1 [schnell]
12B

12B rectified-flow transformer, timestep-distilled to 1-4 sampling steps, released under Apache-2.0. Same architecture as FLUX.1 [dev] but trades a bit of…

COMMERCIAL OK
Nemotron 3 Nano (30B-A3B)
30B

NVIDIA's hybrid Mamba-2 + Transformer MoE for on-device agents. 30B total / 3B active. 1M-token context window with reasoning ON/OFF modes and 4× faster…

COMMERCIAL OK·1M CTX
Kimi K2.6
1000B

Moonshot's long-context, agent-oriented MoE. Optimized for stability under tool use and multi-step coding/planning workflows.

COMMERCIAL OK·2M CTX
SigLIP SO400M (patch14-384)
0.428B

428M-parameter Shape-Optimized vision-language encoder trained with the sigmoid (not softmax) contrastive loss on WebLI. Hits ~83% zero-shot ImageNet-1k top-1…

COMMERCIAL OK
Nemotron 3 Super (120B-A12B)
120B

Workstation-tier Nemotron 3. 120B total / 12B active. 5× higher throughput than the prior Super, 1M context, designed for multi-agent applications.

COMMERCIAL OK·1M CTX
Distil-Whisper Large v3
0.756B

756M-param distilled Whisper-large-v3 with the decoder shrunk from 32 to 2 layers. ~6.3x faster than the teacher at near-parity WER on long-form English (1%…

COMMERCIAL OK·30 CTX
Snowflake Arctic Embed L v2.0
0.568B

Arctic Embed L v2.0 is a 568M-parameter multilingual embedder from Snowflake based on XLM-RoBERTa, producing 1024-dim Matryoshka vectors with an 8192-token…

COMMERCIAL OK·8K CTX
SDXL Turbo
2.6B

2.6B SDXL backbone trained with Adversarial Diffusion Distillation (ADD), producing photorealistic 512px images in a single forward pass. Designed for…

RESTRICTED
Jina Reranker v2 Base Multilingual
0.278B

Jina Reranker v2 Base Multilingual is a 278M-parameter cross-encoder from Jina AI with a 1024-token context, trained on 100+ languages plus code and structured…

RESTRICTED·1K CTX
SmolLM2 135M Instruct
0.135B

SmolLM2-135M-Instruct is the smallest instruction-tuned model in Hugging Face's SmolLM2 family, a 135M-parameter Llama-architecture model trained for on-device…

COMMERCIAL OK·8K CTX
OLMo 2 32B
32B

Fully-open OLMo 2. AI2 publishes the full training data, code, and weights — the most reproducible 32B model.

COMMERCIAL OK·33K CTX
Florence-2 Large
0.77B

770M-parameter unified vision foundation model with a DaViT image encoder and BART-style seq2seq decoder. One model, one set of weights — handles captioning,…

COMMERCIAL OK
GTE ModernBERT Base
0.149B

GTE ModernBERT Base is a 149M-parameter English embedder built on AnswerDotAI's ModernBERT backbone, producing 768-dim vectors with native 8192-token context…

COMMERCIAL OK·8K CTX
E5 Mistral 7B Instruct
7.11B

E5-Mistral-7B-Instruct is a 7.11B-parameter decoder-based embedder fine-tuned from Mistral-7B by Microsoft's intfloat team, producing 4096-dim embeddings with…

COMMERCIAL OK·33K CTX
Omni 31B Turkish Reasoning
31B

31B-parameter Turkish-tuned reasoning model with i1-imatrix quantizations by mradermacher. Designed for step-by-step problem solving in Turkish. Highest…

RESTRICTED·33K CTX
Stable Diffusion 3.5 Medium
2.5B

2.5B MMDiT-X with improved Query-Key Normalization and dual attention blocks at lower resolutions. Trained for 0.25-2MP output. Positioned as the mid-tier…

COMMERCIAL OK
EXAONE Deep 7.8B
7.8B

EXAONE Deep 7.8B is LG AI Research's reasoning-focused model, fine-tuned from EXAONE-3.5-7.8B-Instruct for math and coding tasks. It claims benchmark wins over…

RESTRICTED·33K CTX
TinyLlama 1.1B Chat v0.3 AWQ
1.1B

TinyLlama 1.1B Chat v0.3 is a 1.1B-parameter chat model quantized to 4-bit AWQ by TheBloke. It uses the ChatML prompt format and fits comfortably in very low…

COMMERCIAL OK·2K CTX
TinyLlama 1.1B Chat v0.3 GPTQ
1.1B

GPTQ-quantized build of TinyLlama 1.1B Chat v0.3, trained on SlimPajama, StarCoder, and OpenAssistant data. Runs in roughly 0.8 GB VRAM thanks to 4-bit…

COMMERCIAL OK·2K CTX
Piper
0.025B

VITS-based neural TTS optimized for Raspberry Pi-class hardware. Ships as ONNX checkpoints with ~100 voices across 30+ languages. Powers Home Assistant's local…

COMMERCIAL OK
Ring-2.6-1T
1000B

InclusionAI's Ring-2.6-1T is a 1 trillion parameter Mixture-of-Experts model with ~32B activated parameters per token, released by Ant Group's AI research arm.…

COMMERCIAL OK·128K CTX
mxbai-rerank-large-v2
1.54B

mxbai-rerank-large-v2 is a 1.54B-parameter listwise reranker from Mixedbread AI built on Qwen2.5-1.5B, supporting 100+ languages and a 32K-token context with…

COMMERCIAL OK·33K CTX
GPT-NeoX 20B
20B

GPT-NeoX-20B is a 20B-parameter English autoregressive model from EleutherAI, trained on the 825 GiB Pile dataset. It uses a GPT-3-style transformer…

COMMERCIAL OK·2K CTX
NVIDIA Nemotron Nano 9B v2 Japanese
9B

A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both…

COMMERCIAL OK·131K CTX
EXAONE 3.5 7.8B Instruct
7.8B

EXAONE 3.5 7.8B is LG AI Research's instruction-tuned bilingual model for English and Korean, with a 32K token context window. It succeeds EXAONE 3.0 with…

RESTRICTED·33K CTX
Mihenk LLM v2 35B (Turkish Financial)
35B

35B MoE (3B active) tuned specifically for Turkish financial-services text — bank statements, investment research, accounting terminology. Niche-cluster model…

RESTRICTED·33K CTX
Parakeet TDT 0.6B v2
0.6B

600M-parameter FastConformer-TDT transducer ASR from NVIDIA NeMo. Topped the Hugging Face Open ASR Leaderboard in 2025 for English, with WER ~6.05% averaged…

COMMERCIAL OK
Kanarya 2B
2B

Turkish-from-scratch language model trained by Ali Safaya (Koç University researcher). Named after the kanarya (Turkish for 'canary'). Trained on 250+ GB of…

COMMERCIAL OK·2K CTX
SmolLM2 360M Instruct
0.36B

SmolLM2-360M-Instruct is the middle tier of the SmolLM2 instruct family, a 360M-parameter Llama-architecture model with an 8K context. It is shipped with ONNX…

COMMERCIAL OK·8K CTX
F5-TTS
0.336B

Flow-matching non-autoregressive TTS built on a Diffusion Transformer (DiT) backbone with ConvNeXt text refinement. Trained on the 100K-hour Emilia dataset;…

RESTRICTED
VBART Large (Turkish Summarization)
0.4B

Turkish BART-style sequence-to-sequence model fine-tuned specifically for summarization. Not a chat model — purpose-built for input-document → Turkish-summary…

COMMERCIAL OK·1K CTX
Kanarya 750M
0.75B

Smaller Kanarya variant — 750M parameters. Runs on CPU or 4GB GPU comfortably. Useful for low-resource Turkish text classification, embeddings, or completion…

COMMERCIAL OK·2K CTX
Turkish GPT-2 Large
0.7B

GPT-2 Large architecture trained from scratch on Turkish. Reference baseline for measuring how much modern instruction-tuned models actually improve on the…

COMMERCIAL OK·1K CTX
SmolVLM Instruct
2.25B

SmolVLM-Instruct is Hugging Face's compact vision-language model built on the Idefics3 architecture, pairing SmolLM2-1.7B-Instruct with a SigLIP-SO400M vision…

COMMERCIAL OK·8K CTX
Sarvam 30B
30B

Sarvam-30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters but only 2.4B active at inference time, making it cheaper to run than its size…

COMMERCIAL OK·4K CTX
Orpheus 3B 0.1 FT
3B

LLaMA-architecture 3B model fine-tuned as a TTS that emits SNAC audio tokens. Designed for highly expressive, emotion-controllable speech with laughter, sighs,…

COMMERCIAL OK
EXAONE 3.5 32B Instruct
32B

EXAONE 3.5 32B Instruct is LG AI Research's 32B bilingual model, trained for instruction-following in English and Korean. It supports a 32,768-token context…

RESTRICTED·33K CTX
Merlyn Education Safety 12B AWQ
12B

A 12B GPT-NeoX model from Merlyn Mind, fine-tuned specifically to refuse or soften unsafe content in K-12 and higher-education contexts. Delivered in AWQ 4-bit…

COMMERCIAL OK·2K CTX
EXAONE 3.5 32B Instruct AWQ
32B

EXAONE 3.5 32B Instruct is LG AI Research's bilingual English/Korean instruction model, quantized to 4-bit AWQ for lower VRAM overhead. It supports a 32K…

RESTRICTED·33K CTX
Sarvam 105B
105B

Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks…

COMMERCIAL OK·128K CTX
GPT-OSS Swallow 20B RL v0.1
20B

A 20B bilingual model from TokyoTech built on GPT-OSS via continual pre-training, SFT, and reinforcement learning with verifiable rewards (RLVR). Targets…

COMMERCIAL OK·33K CTX
llm-jp 4 32B A3B Thinking
32B

A 32B MoE model from Japan's National Institute of Informatics, with only 3B parameters active per forward pass. Trained on 11.7T tokens across four stages:…

COMMERCIAL OK·66K CTX
gpt2-base-french
0.124B

A 124M-parameter GPT-2 base model trained on French Wikipedia (wiki40b/fr) and a CC-100/fr subset, with a 50,000-token BPE vocabulary. It generates French text…

RESTRICTED·1K CTX
GPT-2 Spanish
0.124B

GPT-2 Spanish is a 124M-parameter model trained from scratch on 11.5GB of Spanish text (Wikipedia, books) with a custom Spanish BPE tokenizer. It generates…

COMMERCIAL OK·1K CTX
mGPT 13B
13B

mGPT-13B is a 13B-parameter GPT-3-style model pretrained on 600 GB of deduplicated text spanning 61 languages across 25 language families, sourced from mC4 and…

COMMERCIAL OK·2K CTX
PhoGPT 4B Chat
3.7B

PhoGPT-4B-Chat is VinAI's 3.7B-parameter Vietnamese chat model, fine-tuned from a base trained on 102B Vietnamese tokens. It handles up to 8192-token contexts…

COMMERCIAL OK·8K CTX
Pollux Judge 32B
32B

A 32B judge model built to score other LLMs' Russian-language outputs. Give it an instruction, a response, and a rubric — it returns a numeric score plus a…

COMMERCIAL OK·4K CTX
GPT-2 Spanish Medium
0.355B

A 355M-parameter GPT-2 Medium trained from scratch on 11.5 GB of Spanish text (Wikipedia and books), with a BPE tokenizer built specifically for Spanish.…

COMMERCIAL OK·1K CTX
OpenELM 3B Instruct
3B

OpenELM-3B-Instruct is Apple's 3-billion-parameter instruct model using a layer-wise scaled transformer with varying FFN multipliers and KV-head counts across…

RESTRICTED·2K CTX
mGPT 1.3B Uzbek
1.3B

A 1.3B-parameter GPT-2-style model fine-tuned on Uzbek text for 50,000 steps on a single A100. Covers Uzbek, Russian, and English generation. It is a base…

COMMERCIAL OK·2K CTX
PhoGPT 4B
3.7B

PhoGPT-4B is a 3.7B-parameter model pre-trained from scratch on 102B Vietnamese tokens, making it one of the few Vietnamese-first generative models available.…

COMMERCIAL OK·8K CTX
Dostoevsky Doesn't Write It GPT2
0.175B

A 175M-parameter GPT-2 model fine-tuned on Dostoevsky's digitized works, built on top of ruGPT3-small. Trained for five epochs, it generates Russian prose in a…

COMMERCIAL OK·1K CTX
mGPT 1.3B Mongol
1.3B

A 1.3B-parameter GPT model fine-tuned from ai-forever's mGPT base for Mongolian, with English and Russian also supported. Fine-tuning ran for 50,000 steps on…

COMMERCIAL OK·2K CTX
Vikhr Qwen 2.5 0.5B Instruct
0.5B

A 0.5B Russian-language instruct model fine-tuned from Qwen2.5-0.5B on the GrandMaster-PRO-MAX dataset (~150k instructions). Vikhrmodels claims 4x efficiency…

COMMERCIAL OK·4K CTX
OpenThaiGPT 1.5 7B Instruct
7B

OpenThaiGPT 1.5 7B is a Thai-language chat model fine-tuned from Qwen2.5 on over 2 million Thai instruction pairs. It targets Thai academic benchmarks and…

RESTRICTED·131K CTX
Typhoon S ThaiLLM 8B Instruct Research Preview
8B

An instruction-tuned 8B Thai language model from typhoon-ai, built on ThaiLLM using supervised fine-tuning and on-policy distillation. Training ran on a single…

COMMERCIAL OK·33K CTX
Sarvam 105B FP8
105B

Sarvam-105B is a Mixture-of-Experts model with 10.3B active parameters built for Indian-language tasks, reasoning, and coding. It covers 22 Indian languages…

COMMERCIAL OK·131K CTX
SmolLM2 1.7B Instruct
1.7B

SmolLM2 flagship. Open data + open weights at the edge tier.

COMMERCIAL OK·8K CTX
Nemotron Mini 4B Instruct
4B

NVIDIA's edge-tier Nemotron. Distilled from the Minitron lineage, with role-play tuning.

COMMERCIAL OK·4K CTX
StarCoder 2 7B
7B

Mid-size StarCoder 2. The 8GB-VRAM autocomplete pick.

COMMERCIAL OK·16K CTX
Whisper Large v3
1.55B

OpenAI's flagship open speech-to-text model. 99 languages, MIT license. The de-facto open ASR baseline.

COMMERCIAL OK·MULTIMODAL
Molmo 72B
72B

Molmo flagship. Apache 2.0 VLM rivaling proprietary models on UI pointing and visual reasoning.

COMMERCIAL OK·MULTIMODAL·4K CTX
LLaVA 1.6 Mistral 7B
7B

LLaVA 1.6 on Mistral 7B base. Apache 2.0 vision-language with strong OCR.

COMMERCIAL OK·MULTIMODAL·33K CTX
StarCoder 2 15B
15B

StarCoder 2 flagship. The largest BigCode coder; 16K context with strong fill-in-the-middle.

COMMERCIAL OK·16K CTX
StarCoder 2 3B
3B

BigCode's StarCoder 2 at 3B. Trained on The Stack v2, a dataset spanning 600+ programming languages.

COMMERCIAL OK·16K CTX
Aya 23 8B
8B

Cohere's multilingual research model covering 23 languages. CC-BY-NC — research only.

RESTRICTED·8K CTX
Jamba 1.5 Mini
52B

AI21's hybrid Mamba-Transformer MoE. 256K context with the SSM throughput advantage.

COMMERCIAL OK·262K CTX
Tulu 3 70B
70B

Tulu 3 at 70B. AI2's fully-open instruct fine-tune — research transparency at scale.

COMMERCIAL OK·131K CTX
SmolLM2 360M Instruct
0.36B

Hugging Face's SmolLM2 at 360M. Apache 2.0; targets phone / Pi-class deployments.

COMMERCIAL OK·8K CTX
BGE M3
0.57B

BAAI's multilingual embedding flagship. Dense + sparse + ColBERT-style multi-vector. The de-facto open multilingual embedding pick.

COMMERCIAL OK·8K CTX
NV-Embed v2
7.85B

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

RESTRICTED·33K CTX
LLaVA-OneVision 7B
7B

LLaVA-OneVision: a unified single-image / multi-image / video VLM on a Qwen 2 base.

COMMERCIAL OK·MULTIMODAL·33K CTX
Whisper Large v3 Turbo
0.81B

Distilled Whisper Large v3. ~8x faster decode at near-equivalent accuracy on most languages.

COMMERCIAL OK·MULTIMODAL
SmolLM3 3B
3B

Hugging Face's small-model line at 3B. Apache 2.0. Designed for edge / educational deployments.

COMMERCIAL OK·33K CTX
Moondream 2
1.9B

Tiny vision-language model. ~1.9B; designed for edge / embedded multimodal use cases. Apache 2.0.

COMMERCIAL OK·MULTIMODAL·2K CTX
Jamba 1.5 Large
398B

Jamba flagship at 398B total / 94B active. Frontier hybrid-architecture model with 256K context.

COMMERCIAL OK·262K CTX
Nemotron 3 Nano 9B
9B

NVIDIA's Nemotron 3 at 9B. Tuned for NVIDIA-stack deployment patterns; strong tool-calling reliability.

COMMERCIAL OK·131K CTX
Molmo 7B-D
8B

AI2's fully-open VLM. Trained on PixMo dataset; pointing capability for UI grounding.

COMMERCIAL OK·MULTIMODAL·4K CTX
Tulu 3 8B
8B

AI2's fully-open post-training recipe applied to Llama 3.1 8B. Open data, open code, open weights.

COMMERCIAL OK·131K CTX
Stable LM 2 12B
12B

Stability AI's 12B in the Stable LM line; commercial use requires a paid membership. Solid baseline in the 12B class.

RESTRICTED·4K CTX
InternVL 2.5 78B
78B

InternVL 2.5 flagship. Approaches frontier proprietary VLMs on document and OCR tasks.

COMMERCIAL OK·MULTIMODAL·33K CTX
Aya 23 35B
35B

Aya 23 at 35B. Built on Cohere's Command-R lineage. Non-commercial.

RESTRICTED·8K CTX
Nemotron 3 Super 49B
49B

Nemotron 3 mid-tier. 49B dense; fits 32GB cards with AWQ. NVIDIA stack alignment carries through.

COMMERCIAL OK·131K CTX
InternVL 2.5 26B
26B

InternVL 2.5 mid-tier — Shanghai AI Lab vision-language model with strong document and chart understanding.

COMMERCIAL OK·MULTIMODAL·33K CTX