BLK · MODEL REGISTRY

Open-weight models

315 models tracked. Hardware requirements, license terms, and quantization sizes for each.

FAM · LLAMA

Llama

42 models

Meta's small flagship. Strong general reasoning, 128K context, broad multilingual support. The default first try for most local-AI use cases on consumer hardware.

COMMERCIAL OK·131K CTX

Llama 4 Scout

109B

Meta's 2026 flagship MoE model. 109B total parameters with only 17B active per forward pass and a record 10-million-token context window — unmatched in…

COMMERCIAL OK·10M CTX

Llama 3.3 70B Instruct

70B

Late-2024 refresh of the 70B Llama line. Roughly matches Llama 3.1 405B on most benchmarks at one-fifth the parameter count. The default high-end model for…

COMMERCIAL OK·131K CTX

Llama 3.2 3B Instruct

Lightweight 3B for edge and laptop deployment. Runs comfortably on 8GB VRAM at 30+ tok/s on Apple Silicon.

COMMERCIAL OK·131K CTX

Llama 3.1 70B Instruct

70B

The 70B sibling of Llama 3.1 8B. Strong generalist reasoning with 128K context, popular base for agentic fine-tunes (Hermes 3, Nemotron). Mostly superseded by…

COMMERCIAL OK·131K CTX

Llama 3.1 Nemotron 70B Instruct

70B

NVIDIA's HelpSteer2-tuned Llama 3.1 70B. Topped Arena Hard at release. The pre-Nemotron-3 NVIDIA reference open weights.

COMMERCIAL OK·131K CTX

Llama 3.2 11B Vision Instruct

11B

First-party multimodal Llama. Accepts images alongside text for VQA, document understanding, and chart reading. Runs on 12GB+ VRAM.

COMMERCIAL OK·MULTIMODAL·131K CTX

TinyLlama 1.1B Chat v1.0

1.1B

TinyLlama-1.1B-Chat-v1.0 is a 1.1B Llama-2-architecture model pretrained on 3 trillion tokens and chat-tuned on UltraChat and UltraFeedback. It was one of the…

COMMERCIAL OK·2K CTX

Llama 4 Maverick

400B

Meta's high-end Llama 4 sibling — 128 experts MoE built for performance over efficiency. Multilingual strength is its standout. Effectively a server-tier…

COMMERCIAL OK·MULTIMODAL·1M CTX

Llama 3.1 Nemotron Ultra 253B

253B

NVIDIA's top open reasoning model in the Llama 3.1 lineage. Server-tier; tuned for reasoning accuracy on agentic workloads.

COMMERCIAL OK·131K CTX

Llama 3.1 Nemotron Nano 8B

Smallest of the Nemotron reasoning trio. NAS-optimized for inference efficiency on RTX hardware.

COMMERCIAL OK·131K CTX

Llama 3.2 1B Instruct

True edge-tier Llama. Runs on a phone or Raspberry Pi. Useful for classification, simple summarization, and on-device agents.

COMMERCIAL OK·131K CTX

Trendyol LLM 7B Chat v0.1

Turkish-tuned chat model released by Trendyol, Turkey's largest e-commerce platform. Built on Llama 2 7B, fine-tuned on Turkish customer-service style…

RESTRICTED·4K CTX

Turkish Llama 8B Instruct v0.1

Llama 3 8B continued pre-trained on Turkish corpora, then instruction-tuned for Turkish chat. YTU CE COSMOS group's most-downloaded Llama variant. GGUF builds…

COMMERCIAL OK·8K CTX

Llama 3.2 90B Vision Instruct

90B

The 90B vision Llama. Best-in-class first-party multimodal open weight at the time of release. Workstation-class only.

COMMERCIAL OK·MULTIMODAL·131K CTX

Cosmos Llama 3 8B Turkish

YTU CE COSMOS's Llama 3 8B Turkish instruction-tuned variant. Follow-up to the original Turkish-Llama-8b that uses the Llama 3 base instead of Llama 2 — better…

COMMERCIAL OK·8K CTX

Salamandra 7B Instruct

Salamandra 7B Instruct is an Apache 2.0 instruction-tuned model from Barcelona Supercomputing Center, pretrained from scratch on 12.875 trillion tokens across…

COMMERCIAL OK·8K CTX

Trendyol LLM 7B Base v0.1

Base (non-chat) variant of Trendyol's 7B Turkish LLM. The chat sibling is the more popular pick; this base version is for operators building their own…

RESTRICTED·4K CTX

LLM-jp 4 8B Thinking

LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning.…

COMMERCIAL OK·66K CTX

SOLAR 10.7B v1.0

10.7B

SOLAR 10.7B is a base pretrained model from Upstage built by applying depth up-scaling (DUS) to Mistral 7B, pushing parameters to 10.7B without a traditional…

COMMERCIAL OK·4K CTX

ALIA 40b instruct 2601

40B

BSC-LT's 40B instruction-tuned model with first-class support for Spanish, Catalan, Basque, and Galician alongside English. Pretrained on 9.83 trillion tokens…

COMMERCIAL OK·164K CTX

LLM-jp 4 8B Instruct

An 8B bilingual model from Japan's National Institute of Informatics, instruction-tuned via SFT on a Japanese/English corpus of 11.7T tokens. Supports up to…

COMMERCIAL OK·66K CTX

Hermes 4 70B FP8

70B

Hermes 4 is a 70B reasoning model from NousResearch, built on Llama-3.1-70B with FP8 quantization to cut memory overhead. It supports explicit `<think>`…

COMMERCIAL OK·128K CTX

RefinedNeuro RN TR R2

RefinedNeuro RN TR R2 is an Apache-2.0 Llama-family 8B model distributed on Hugging Face and Ollama. It is measured alongside R1 to compare same-size…

COMMERCIAL OK·8K CTX

RefinedNeuro RN TR R1

RefinedNeuro RN TR R1 is an Apache-2.0 Llama-family 8B reasoning model distributed on Hugging Face and Ollama. It is included in the local sweep as a compact…

COMMERCIAL OK·8K CTX

Swallow 7B

Swallow 7B is a Japanese-English base model built by continual pre-training on top of Llama 2 7B with additional Japanese text. TokyoTech-LLM also expanded the…

RESTRICTED·4K CTX

Salamandra 2B Instruct

Salamandra 2B Instruct is a transformer model from BSC pretrained from scratch on 12.875 trillion tokens across 35 European languages and code. The instruct…

COMMERCIAL OK·8K CTX

Bielik-11B v3.0 Instruct FP8 Dynamic

11B

An FP8-quantized build of Bielik-11B v3.0 Instruct, designed to run on vLLM or SGLang with roughly 50% less GPU memory than the BF16 original. Weights and…

COMMERCIAL OK·4K CTX
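
The "roughly 50% less GPU memory" figure follows from bytes per weight: BF16 stores each parameter in 2 bytes and FP8 in 1. A minimal sketch of that arithmetic, assuming the nominal 11B parameter count from the listing and counting weights only (KV cache, activations, and runtime overhead are left out, so real usage will be higher). `weight_gb` is a hypothetical helper name.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

def weight_gb(params_billion, dtype):
    """Weight-only footprint in GB (1 GB = 1e9 bytes); hypothetical helper."""
    return params_billion * BYTES_PER_PARAM[dtype]

bf16 = weight_gb(11, "bf16")
fp8 = weight_gb(11, "fp8")
saving = 1 - fp8 / bf16
print(f"BF16 {bf16} GB, FP8 {fp8} GB, saving {saving:.0%}")
# prints: BF16 22 GB, FP8 11 GB, saving 50%
```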

Salamandra 2B

2.25B

Salamandra 2B is a base-only transformer trained from scratch by Barcelona Supercomputing Center on 12.875 trillion tokens across 35 European languages and…

COMMERCIAL OK·8K CTX

OpenThaiGPT 7B 1.0.0 Chat

A 7B Thai-language chat model built on LLaMA 2, pretrained on 65B+ Thai words and instruction-tuned on 1M+ Thai examples. Adds 10,000 common Thai vocabulary…

COMMERCIAL OK·4K CTX

Bielik 11B v3.0 Instruct GGUF

11B

Bielik 11B v3.0 is SpeakLeash's instruction-tuned model built around Polish, with coverage across 32 European languages. It runs at 11B parameters with a 32K…

COMMERCIAL OK·33K CTX

Saiga Llama3 8B GGUF

Saiga Llama3 8B is a Russian-language fine-tune of Meta's Llama 3 8B, trained on the Saiga Scored dataset and packaged in GGUF format for llama.cpp. It targets…

RESTRICTED·8K CTX

Salamandra 7B

Salamandra 7B is a base language model from Barcelona Supercomputing Center, pretrained on 12.875 trillion tokens across 35 European languages and code. It is…

COMMERCIAL OK·8K CTX

OpenThaiGPT 1.0.0 Beta 13B Chat

13B

OpenThaiGPT 1.0.0 Beta is a 13B LLaMA v2 Chat fine-tune trained on translated Thai instructions. Vocabulary was expanded by 10,000+ Thai tokens to speed up…

COMMERCIAL OK·4K CTX

Gervásio 8B PTPT

Gervásio 8B PTPT is a LLaMA 3.1 8B Instruct fine-tune from PORTULAN/University of Lisbon, trained on Portuguese-specific datasets including extraGLUE-Instruct…

COMMERCIAL OK·4K CTX

Llama 3.3 8B Instruct

Meta's Llama 3.3 at 8B. Drop-in upgrade from Llama 3.1 8B; same hardware envelope, better instruction following.

COMMERCIAL OK·131K CTX

Llama 4 405B

405B

Meta's dense flagship in the Llama 4 line. 405B params; comparable footprint to Llama 3.1 405B with the Llama 4 reasoning improvements.

COMMERCIAL OK·131K CTX

Llama 3.2 11B Vision

11B

Llama 3.2 multimodal at 11B. Consumer-tier multimodal predecessor to Llama 4 Scout.

COMMERCIAL OK·MULTIMODAL·131K CTX

Llama 4 70B

70B

Llama 4 dense at 70B. Drop-in successor to Llama 3.3 70B; same hardware envelope, better on reasoning benchmarks.

COMMERCIAL OK·131K CTX

Phind CodeLlama 34B v2

34B

Phind's CodeLlama-derived coder at 34B. Older release; retained for historical / continuity value. Newer Qwen Coder lineage has surpassed it.

COMMERCIAL OK·16K CTX

EVA Llama 3.3 70B

70B

EVA community's storytelling-focused fine-tune of Llama 3.3 70B. Popular in the creative-writing / roleplay community.

COMMERCIAL OK·131K CTX

Llama 3.2 90B Vision

90B

Llama 3.2 multimodal at 90B. Datacenter-tier predecessor to Llama 4 Maverick. Strong visual reasoning.

COMMERCIAL OK·MULTIMODAL·131K CTX

FAM · QWEN

Qwen

39 models

Qwen 3.5 397B-A17B (MoE)

397B

Alibaba's May 2026 flagship. 397B total / 17B active MoE with hybrid thinking-mode toggle inherited from Qwen 3. Strongest open scientific reasoner per GPQA…

COMMERCIAL OK·262K CTX

Qwen 3 235B-A22B

235B

Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference…

COMMERCIAL OK·131K CTX

Qwen 3 0.6B

0.6B

Qwen3-0.6B is the smallest dense model in Alibaba's Qwen3 generation, supporting a 40K-token context and dual-mode operation that toggles between explicit…

COMMERCIAL OK·41K CTX

Qwen 3 30B-A3B

30B

Mid-tier Qwen 3 MoE. 30B total / 3B active means 70B-class quality at 7B-class inference speed on a single 24GB card. The sweet spot of the Qwen 3 lineup for…

COMMERCIAL OK·131K CTX
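
The "total / active" split matters because the two numbers drive different costs: every expert has to be loaded, so memory follows the total count, while per-token compute follows the active count. A rough sketch of the active fraction for a few MoE entries in this registry, using the rounded figures from their blurbs. `active_fraction` is a hypothetical helper, and the speed comparison itself remains the listing's claim.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in an MoE model (hypothetical helper)."""
    return active_b / total_b

# (total, active) in billions, taken from the registry blurbs.
listed = {
    "Qwen 3 30B-A3B": (30, 3),
    "Qwen 3 235B-A22B": (235, 22),
    "DeepSeek V3": (671, 37),
}

for name, (total, active) in listed.items():
    frac = active_fraction(total, active)
    print(f"{name}: {frac:.1%} of weights active per token")
```

For the 30B-A3B entry this works out to one tenth of the weights per token, which is the basis for the small-model speed claim even though the card still has to hold all 30B.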

Qwen 2.5 Coder 32B Instruct

32B

Coding-specialist Qwen 2.5. Beats GPT-4o on HumanEval and matches Sonnet on many code-edit benchmarks. The default local-coding model on 24GB cards.

COMMERCIAL OK·131K CTX

Qwen 3 32B

32B

Dense Qwen 3 32B. Best dense open-weight model in its size class at release; pairs nicely with a single RTX 5090 or 4090.

COMMERCIAL OK·131K CTX

Qwen 3 8B

Qwen 3 at the 8B scale. Direct head-to-head against Llama 3.1 8B on most benchmarks; usually wins on coding and structured output.

COMMERCIAL OK·131K CTX

Qwen 3 1.7B

1.7B

Qwen3-1.7B is the mid-tier dense model in Qwen3, sharing the same hybrid thinking architecture and 40K context as the 0.6B but with ~3x the parameters for…

COMMERCIAL OK·41K CTX

Qwen 3 14B

14B

14B Qwen 3. Fits on 12GB cards at Q4. Strong default for users with a single mid-range GPU.

COMMERCIAL OK·131K CTX
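
A back-of-the-envelope check of the "fits on 12GB cards at Q4" claim. This is a sketch under two loudly stated assumptions: Q4 is taken as about 4.5 bits per weight (4-bit values plus per-block scales), and 2 GB of headroom is reserved for KV cache and runtime. Both numbers are illustrative, not values published by the registry, and `q4_weight_gb` and `fits` are hypothetical helpers.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes)."""
    # ASSUMPTION: ~4.5 bits per weight for a Q4-style block format.
    return params_billion * bits_per_weight / 8

def fits(params_billion: float, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM."""
    # ASSUMPTION: headroom_gb is a placeholder for KV cache and runtime overhead.
    return q4_weight_gb(params_billion) + headroom_gb <= vram_gb

print(q4_weight_gb(14))  # 7.875
print(fits(14, 12))      # True
print(fits(32, 12))      # False (18.0 GB of weights alone exceed 12 GB)
```

Longer contexts grow the KV cache, so the headroom needed in practice depends on how much of the context window is actually used.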

Qwen2-VL 2B Instruct

Qwen2-VL 2B Instruct is Alibaba's compact vision-language model with native dynamic-resolution image handling and multimodal RoPE (M-RoPE) for video and…

COMMERCIAL OK·33K CTX

Qwen 2.5 7B Instruct

The community-default small Qwen prior to Qwen 3. Still widely used because of mature ecosystem support.

COMMERCIAL OK·131K CTX

Qwen 2.5 14B Instruct

14B

14B Qwen 2.5. Sweet spot for 16GB VRAM. Many production deployments still on this version.

COMMERCIAL OK·131K CTX

Qwen 3.6 35B-A3B (MTP)

35B

Qwen 3.6 35B-A3B with Multi-Token Prediction (MTP). The "A3B" suffix means ~3B activated parameters per token via Mixture-of-Experts — inference cost stays…

COMMERCIAL OK·262K CTX

Qwen 2.5 32B Instruct

32B

Dense 32B Qwen 2.5. Strong daily-driver on 24GB cards prior to Qwen 3 32B.

COMMERCIAL OK·131K CTX

QwQ 32B Preview

32B

Qwen team's reasoning-focused experimental release. Visible chain-of-thought in `<think>` tags. Precursor to Qwen 3's thinking mode.

COMMERCIAL OK·33K CTX

Qwen 3 4B

Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.

COMMERCIAL OK·131K CTX

Qwen 2.5 72B Instruct

72B

The flagship of Qwen 2.5. Workstation-tier; needs 48GB+ VRAM for usable inference.

COMMERCIAL OK·131K CTX

Qwen 3.6 27B (MTP)

27B

Qwen 3.6 27B dense (not MoE) with Multi-Token Prediction. Sits between the 14B and 35B-A3B as a "single dense model with MTP throughput acceleration." Targets…

COMMERCIAL OK·131K CTX

Qwen 3.5 2B Turkish SFT

Qwen 3.5 2B base with supervised fine-tuning on Turkish instruction-following data. Recent community fine-tune (early 2026) that bridges Qwen 3.5's strong…

COMMERCIAL OK·33K CTX

Qwen3.5 9B Thai Law Base

8.95B

Continued pre-training of Qwen3.5-9B-Base on 68M+ tokens of Thai legal text — acts, decrees, and court rulings. This is a raw base model, not an assistant; you…

COMMERCIAL OK·4K CTX

Qwen3 Swallow 32B RL v0.2

32B

A 32B Japanese-English model built on Qwen3, trained with continual pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards.…

COMMERCIAL OK·33K CTX

Qwen3 0.6B Hindi Instruct v1 GGUF

0.6B

A 0.6B Qwen3 model fine-tuned on English-to-Hindi instruction pairs and quantized to GGUF. Fits in 370MB and runs on CPU-only hardware. Trained on 2,000…

COMMERCIAL OK·2K CTX

Qwen 3 Coder 32B

32B

Coding-specialized fine-tune of Qwen 3 32B. Curated coding corpus; outperforms Qwen 2.5 Coder 32B on SWE-Bench by ~6 points. Apache 2.0.

COMMERCIAL OK·131K CTX

Qwen 2.5-VL 72B

72B

Qwen 2.5 vision-language flagship at 72B. Strong on document understanding + multi-image queries. Apache 2.0.

COMMERCIAL OK·MULTIMODAL·33K CTX

Qwen 2.5 Math 7B

Qwen 2.5 fine-tuned for math problem-solving with chain-of-thought and tool-integrated reasoning.

COMMERCIAL OK·4K CTX

Qwen 3 7B

Qwen 3 mid-tier. Same reasoning-mode toggle as Qwen 3 32B/14B/8B. Hits the consumer-laptop sweet spot.

COMMERCIAL OK·131K CTX

Qwen 2.5 0.5B Instruct

0.5B

Smallest Qwen 2.5. Apache 2.0; phone / Pi-class deployment target.

COMMERCIAL OK·33K CTX

CodeQwen 1.5 7B

CodeQwen 1.5 — Qwen Coder predecessor. Superseded by Qwen 2.5 Coder for new deployments.

COMMERCIAL OK·66K CTX

Qwen 2.5 Coder 14B Instruct

14B

Coding-specialized Qwen 2.5 at 14B. The 16GB-VRAM tier coding model — fits comfortably with 8K context.

COMMERCIAL OK·131K CTX

Qwen 2-VL 7B

Qwen 2 vision-language predecessor to Qwen 2.5-VL. Apache 2.0 with strong document Q&A.

COMMERCIAL OK·MULTIMODAL·33K CTX

Qwen 2.5 Coder 3B

Compact Qwen 2.5 Coder. Sweet spot for laptop autocomplete and small refactor agents.

COMMERCIAL OK·33K CTX

Qwen 2.5 Coder 1.5B

1.5B

Smallest Qwen 2.5 Coder. Targets edge / autocomplete on integrated GPUs and Apple Silicon laptops.

COMMERCIAL OK·33K CTX

Qwen 2.5-VL 3B

Smallest Qwen 2.5-VL. Edge-deployable VLM with strong document Q&A.

COMMERCIAL OK·MULTIMODAL·33K CTX

Qwen 2.5-VL 7B

Consumer-tier Qwen 2.5 VL. 7B + vision. Fits 8GB cards; the smallest practical multimodal Qwen.

COMMERCIAL OK·MULTIMODAL·33K CTX

Qwen 2.5 Math 72B

72B

Largest Qwen 2.5 Math. Datacenter-tier math specialist; eclipsed by R1 distills for general reasoning.

COMMERCIAL OK·4K CTX

Qwen 2.5 Coder 7B Instruct

Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.

COMMERCIAL OK·131K CTX

Qwen 2.5 1.5B Instruct

1.5B

Compact Qwen 2.5. The 1.5B Apache-2.0 baseline.

COMMERCIAL OK·33K CTX

Qwen 2.5 3B Instruct

Mid-edge Qwen 2.5. Note: 3B variant uses Qwen License (not Apache 2.0).

COMMERCIAL OK·33K CTX

Qwen 3 Embedding 8B

Qwen 3 family embedding model. Apache 2.0 with strong multilingual coverage.

COMMERCIAL OK·33K CTX

FAM · MISTRAL

Mistral

31 models

Mistral Medium 3.5 (675B MoE)

675B

Mistral's April 2026 frontier MoE. 675B total / 41B active. Strong European-multilingual lineage carries through; the new release competes head-to-head with…

RESTRICTED·262K CTX

Mistral Small 3 24B

24B

Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.

COMMERCIAL OK·33K CTX

Mistral Nemo 12B Instruct

12B

Joint Mistral/NVIDIA release with native 128K context and a new Tekken tokenizer. Strong multilingual performance; popular fine-tune base.

COMMERCIAL OK·131K CTX

Pixtral 12B

12B

Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.

COMMERCIAL OK·MULTIMODAL·131K CTX

Codestral 22B

22B

Mistral's coding-specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.

RESTRICTED·33K CTX

Mistral 7B Instruct v0.3

The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.

COMMERCIAL OK·33K CTX

Mistral Large 2 (123B)

123B

Mistral's flagship dense model. Open weights but restricted commercial license — research and non-commercial only.

RESTRICTED·131K CTX

Kumru 2B

2.4B

Kumru 2B is a compact Turkish text-generation model from VNGRS. The Hugging Face config reports a Mistral-family architecture with an 8K context window, and…

COMMERCIAL OK·8K CTX

Mistral 7B Instruct v0.2

Mistral 7B Instruct v0.2 is a 7-billion-parameter instruction-tuned model from Mistral AI with a 32,768-token context window. It uses `[INST]` prompt tags and…

COMMERCIAL OK·33K CTX
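
The badge on this entry reads 33K although the description gives a 32,768-token window. The CTX badges in this registry appear to round the raw token count to the nearest decimal thousand, which would also explain why models described as "128K context" carry a 131K badge. A minimal sketch of that conversion; `ctx_badge` is a hypothetical helper, not something the registry publishes, and it does not cover the M-scale badges.

```python
def ctx_badge(tokens: int) -> str:
    """Format a raw token count as a decimal-thousands badge (hypothetical helper)."""
    return f"{round(tokens / 1000)}K CTX"

# Context sizes quoted in binary K (multiples of 1024) and the badge they map to.
for binary_k in (4, 8, 32, 40, 64, 128):
    tokens = binary_k * 1024
    print(f"{binary_k}K = {tokens} tokens -> {ctx_badge(tokens)}")
```

Under this assumption 32K, 40K, 64K, and 128K windows come out as 33K, 41K, 66K, and 131K, matching the badges used throughout the listing.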

Mistral 7B Instruct v0.2

Mistral AI's second instruct revision of their 7B model, bumping context from 8k to 32k tokens and updating the tokenizer to `mistral_common`. It's an…

COMMERCIAL OK·33K CTX

Mistral 7B Instruct v0.1

Mistral 7B Instruct v0.1 is the instruction-tuned version of Mistral's first public 7B base model, fine-tuned on publicly available conversation datasets. It…

COMMERCIAL OK·4K CTX

Turkcell LLM 7B v1

7.4B

Turkcell LLM 7B v1 is an Apache-2.0 Turkish text-generation model built on a Mistral architecture. The measured Ollama artifact uses a RefinedNeuro GGUF…

COMMERCIAL OK·33K CTX

Turkish Mistral 7B Instruct v0.2

Mistral 7B v0.2 continued-pretrained on Turkish data + instruction-tuned. The 32K context window makes it the best Turkish open-weight model for long-document…

COMMERCIAL OK·33K CTX

Bielik 11B v2.3 Instruct

11B

Bielik 11B v2.3 Instruct is SpeakLeash's Polish-language instruction-tuned model, built on the Bielik-11B-v2 base and released under Apache 2.0. It targets…

COMMERCIAL OK·4K CTX

Bielik 11B v2.3 Instruct

11B

An 11B Polish-language instruction model from SpeakLeash and ACK Cyfronet AGH, built as a linear merge of three instruct-tuned Bielik-11B-v2 variants. Uses…

COMMERCIAL OK·4K CTX

Mistral Turkish v2 (brooqs)

7.2B

Mistral Turkish v2 is a public Ollama-distributed Turkish Mistral variant. The upstream Hugging Face repository was not publicly accessible during intake, so…

RESTRICTED·8K CTX

Malhajar Mistral 7B Turkish

7.2B

Malhajar Mistral 7B Turkish is an Apache-2.0 Mistral 7B Instruct v0.2 Turkish fine-tune. The benchmarked Ollama tag is a koezgen quantized distribution of the…

COMMERCIAL OK·33K CTX

Sarvam M

24B

Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode…

COMMERCIAL OK·4K CTX

Bielik 7B Instruct v0.1 GGUF

Bielik 7B Instruct v0.1 is a Polish-language instruction-tuned model from SpeakLeash, fine-tuned from Bielik-7B-v0.1 and distributed in GGUF format for…

RESTRICTED·4K CTX

Mistral 7B OpenOrca GGUF

Mistral 7B fine-tuned on the OpenOrca instruction dataset, distributed by TheBloke in GGUF format for local CPU and GPU inference. Uses ChatML prompt…

COMMERCIAL OK·33K CTX

Japanese StableLM Instruct Gamma 7B

A 7B instruction-tuned model from Stability AI built specifically for Japanese, using the Mistral architecture. Quantized to GGUF by TheBloke, so it runs on…

COMMERCIAL OK·33K CTX

Bielik 7B v0.1

Bielik-7B v0.1 is a 7B-parameter base model built by continued pretraining of Mistral-7B on 70B+ tokens of Polish text, with data quality filtered via an…

COMMERCIAL OK·4K CTX

Bielik 11B v2.2 Instruct GGUF

11B

Bielik 11B v2.2 Instruct is a Polish-language instruction-tuned model from SpeakLeash, available in GGUF format for local inference. It supports 32,768-token…

COMMERCIAL OK·33K CTX

Devstral Small 2 24B

24B

Mistral's coding-specialized Mistral Small 2 successor. Apache 2.0 — the rare commercial-OK Mistral coder.

COMMERCIAL OK·131K CTX

Mistral Small 3.2 24B

24B

Iterative refresh of Mistral Small 3 24B. Same architecture; improved instruction following and tool-call reliability. Apache 2.0.

COMMERCIAL OK·131K CTX

Mistral Saba 24B

24B

Mistral's Arabic and South Asian language specialist at 24B. Research license.

RESTRICTED·33K CTX

Ministral 3B Instruct

Mistral edge model at 3B. Designed for on-device inference with extended 128k context. Research license only.

RESTRICTED·131K CTX

Ministral 8B Instruct

Mistral 8B with sliding-window attention and 128k context. Research license — Mistral 7B v0.3 is the commercial alternative.

RESTRICTED·131K CTX

Codestral Mamba 7B

Mistral's Mamba (state-space) architecture coding model. Linear inference cost — the architectural alternative to attention-based coding models. Apache 2.0.

COMMERCIAL OK·256K CTX

Mistral Medium 3 24B (dense)

24B

Dense variant in the Mistral Medium 3.5 family. Research license — non-commercial open. Same training data as the MoE flagship but in a smaller dense package.

RESTRICTED·262K CTX

Magistral 32B

32B

Mistral's reasoning-specialized fine-tune of a Mistral Small base. Reasoning-token emission similar to Qwen 3 / DeepSeek R1 in a smaller footprint. Research…

RESTRICTED·131K CTX

FAM · DEEPSEEK

DeepSeek

20 models

DeepSeek V4 Pro (1.6T MoE)

1600B

DeepSeek's April 2026 frontier flagship. 1.6T total / 49B active MoE with hybrid Compressed Sparse Attention + Heavily Compressed Attention. 1M context window.…

COMMERCIAL OK·1M CTX

DeepSeek R1 (671B reasoning)

671B

Open reasoning model that closed the gap with frontier proprietary reasoners. Visible chain-of-thought, MIT license, and a family of distilled smaller variants.

COMMERCIAL OK·131K CTX

DeepSeek V4 Flash (284B MoE)

284B

The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (about 4.6%) makes it…

COMMERCIAL OK·1M CTX

DeepSeek R1 Distill Llama 70B

70B

Reasoning distillation onto Llama 3.3 70B. Best-in-class open-weight reasoner you can actually fit on a workstation.

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Qwen 32B

32B

32B distill — fits on a single 24GB card with reasoning capability. Best price-per-thinking-token combo for prosumers.

COMMERCIAL OK·131K CTX

DeepSeek V3 (671B MoE)

671B

DeepSeek's flagship MoE — 671B total / 37B active. Server-tier, but the smaller R1 distills make this lineage approachable.

COMMERCIAL OK·66K CTX

DeepSeek R1 Distill Qwen 7B

Smallest practical R1 distill. Reasoning on a 6GB GPU.

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Qwen 14B

14B

14B reasoning distill. Fits on 12GB cards.

COMMERCIAL OK·131K CTX

DeepSeek Coder V2 Lite (16B)

16B

MoE coding specialist — 16B total / 2.4B active. Fast on 12GB cards.

COMMERCIAL OK·131K CTX

DeepSeek V2 Lite Chat

15.7B

DeepSeek-V2-Lite-Chat is a 15.7B-total, 2.4B-active mixture-of-experts chat model using DeepSeek's Multi-head Latent Attention (MLA) architecture for KV-cache…

COMMERCIAL OK·33K CTX

DeepSeek Coder V2 236B

236B

Full DeepSeek Coder V2. 236B total / 21B active MoE coder.

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Llama 8B

R1 reasoning distilled into a Llama 3 8B base. Smaller R1 distill; useful when 32B is too heavy. Reasoning quality is meaningfully below the 32B distill but…

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Qwen 1.5B

1.5B

Smallest R1 distill. Surprisingly capable reasoning for a 1.5B model; the right pick when you need reasoning AND edge deployment.

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Qwen 3 32B

32B

Newer R1 distill on a Qwen 3 base. Combines R1 reasoning with Qwen 3's reasoning-toggle architecture. Apache 2.0.

COMMERCIAL OK·131K CTX

DeepSeek V2.5 236B

236B

DeepSeek V2.5 — merged V2 chat + Coder. Pre-V3 baseline; 21B active MoE.

COMMERCIAL OK·131K CTX

DeepSeek V4

745B

DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source…

COMMERCIAL OK·131K CTX

DeepSeek V3 Lite (16B MoE)

16B

Distillation of DeepSeek V3 to a smaller MoE. 16B total / 2.4B active. Captures most of V3's reasoning at consumer-card-friendly memory.

COMMERCIAL OK·131K CTX

DeepSeek R1 Distill Mistral 24B

24B

Community R1 distill onto a Mistral Small 3 base. Apache 2.0; combines R1 reasoning with Mistral instruction polish.

COMMERCIAL OK·33K CTX

DeepSeek MoE 16B Base

16B

DeepSeek's first MoE — 16B / 2.4B active. Older model retained for ecosystem-context value as the base of the V2/V3 lineage.

COMMERCIAL OK·4K CTX

DeepSeek Coder V3

33B

DeepSeek's coder line successor. Dense 33B; competitive with Qwen 2.5 Coder 32B on SWE-Bench.

COMMERCIAL OK·131K CTX

FAM · GEMMA

Gemma

20 models

Gemma 4 31B Dense

31B

Google's flagship dense Gemma 4. Beats some 400B-class proprietary models on benchmarks. Targets the 24GB single-GPU sweet spot.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 4 26B MoE

26B

MoE variant of Gemma 4. Faster per-token than the 31B dense at similar quality on most tasks.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 3 270M

0.27B

Gemma 3 270M is the smallest member of Google's Gemma 3 family, a 270-million-parameter text-only model designed for on-device deployment and task-specific…

COMMERCIAL OK·33K CTX

Gemma 3 27B

27B

Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 4 E4B (Effective 4B)

Edge-class Gemma 4. The 'Effective 4B' branding signals it punches above its parameter count via training-data quality.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 3 12B

12B

12B Gemma 3. Fits on 12GB consumer cards. Multimodal.

COMMERCIAL OK·MULTIMODAL·131K CTX

Trendyol LLM Asure 12B

11.8B

Trendyol LLM Asure 12B is a Gemma 3 based multimodal instruct model for Turkish and English business workflows. The public Ollama build used in local testing…

COMMERCIAL OK·MULTIMODAL·131K CTX

Turkish Gemma 9B T1

YTU's Turkish-tuned Gemma 2 9B model. The highest community-rated Turkish-language LLM on Hugging Face by likes-to-downloads ratio as of May 2026. Continued…

COMMERCIAL OK·8K CTX

Gemma 2 2B Instruct

Gemma 2 2B Instruct is Google's instruction-tuned 2B model from the Gemma 2 generation, trained with knowledge distillation from larger Gemma models. It…

COMMERCIAL OK·8K CTX

Gemma 3 4B

4B Gemma 3 for edge. Multimodal.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 2 9B Instruct

Mid-size Gemma 2. Strong chat quality with a different training mix from Llama family.

COMMERCIAL OK·8K CTX

Gemma 4 E2B (Effective 2B)

Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.

COMMERCIAL OK·MULTIMODAL·131K CTX

Gemma 4 Turkish 26B (4B active)

26B

Gemma 4 26B MoE (4B active params) pruned and Turkish-tuned. The largest Turkish-tuned open-weight model on HF as of May 2026. MoE architecture means it loads…

COMMERCIAL OK·131K CTX

YTU Turkish Gemma 9B v0.1

9.2B

YTU Turkish Gemma 9B v0.1 is a Gemma 2 based Turkish instruction model from the YTU CE COSMOS ecosystem. The benchmarked Ollama tag is an alibayram GGUF…

COMMERCIAL OK·8K CTX

Gemma 3 1B

Smallest text-only Gemma 3 for phones and IoT.

COMMERCIAL OK·33K CTX

MedGemma 27B

27B

Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.

RESTRICTED·MULTIMODAL·131K CTX

CodeGemma 7B

Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.

COMMERCIAL OK·8K CTX

ColPali v1.3

3B-parameter visual document retriever built on PaliGemma-3B using a ColBERT-style late-interaction objective. Encodes a PDF page as a grid of patch…

COMMERCIAL OK

PaliGemma 2 10B

10B

Mid-tier PaliGemma 2 fine-tuning base. Better baseline for complex vision tasks.

COMMERCIAL OK·MULTIMODAL·8K CTX

PaliGemma 2 3B

PaliGemma 2 — Gemma 2 base + SigLIP vision encoder. Designed for fine-tuning on specific vision tasks.

COMMERCIAL OK·MULTIMODAL·8K CTX

FAM · PHI

Phi

7 models

Phi-4 14B

14B

Microsoft's Phi-4 14B trained on synthetic textbook-quality data. Punches above weight on reasoning and math; MIT licensed.

COMMERCIAL OK·16K CTX

Phi-4 Reasoning 14B

14B

Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.

COMMERCIAL OK·33K CTX

Phi-3.5 Mini Instruct

3.8B

Compact 3.8B Phi for edge deployment. 128K context. Strong reasoning per parameter.

COMMERCIAL OK·131K CTX

Phi-3.5 Vision

4.2B

Multimodal Phi 3.5. Document and chart understanding at edge size. MIT licensed.

COMMERCIAL OK·MULTIMODAL·131K CTX

Phi-4 Multimodal

14B

Multimodal variant of Phi-4 14B. Vision + text. Smaller than Llama 4 Scout but covers most image-Q&A workflows; right-sized for 16GB consumer cards.

COMMERCIAL OK·MULTIMODAL·131K CTX

Phi-4 Mini 4B

3.8B

Microsoft's edge-tier Phi-4 variant. 3.8B params; designed for phone / tablet / Pi deployment. Strong reasoning per parameter — Phi family's traditional…

COMMERCIAL OK·131K CTX

Phi-4 Reasoning Mini 4B

3.8B

Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.

COMMERCIAL OK·131K CTX

FAM · EXAONE

EXAONE

6 models

EXAONE 3.5 2.4B Instruct

2.4B

EXAONE 3.5 2.4B Instruct is LG AI Research's bilingual English/Korean model built for low-resource devices. It handles up to 32K context tokens and shows…

RESTRICTED·33K CTX

K-EXAONE 236B A23B

236B

K-EXAONE is LG AI Research's 236B Mixture-of-Experts model with 23B active parameters per forward pass. It covers Korean, English, Spanish, German, Japanese,…

RESTRICTED·262K CTX

EXAONE 4.0.1 32B

32B

EXAONE 4.0.1 is a 32B model from LG AI Research with a 131K context window and a hybrid sliding-window/full-attention architecture. It runs in either standard…

RESTRICTED·131K CTX

EXAONE 3.5 8B

7.8B

Smaller EXAONE for consumer-tier Korean / CJK workloads.

RESTRICTED·33K CTX

EXAONE 3.5 32B

32B

LG AI Research's flagship Korean-ecosystem model. Strong on Korean/Japanese language tasks; competitive on English. License blocks commercial use without LG…

RESTRICTED·33K CTX

EXAONE 3.5 2.4B

2.4B

LG AI's edge-tier EXAONE. Strong Korean / English. Research-only license.

RESTRICTED·33K CTX

FAM · GRANITE

Granite

6 models

Granite 3.1 2B Instruct

Granite 3.1 2B Instruct is IBM's 2B-parameter dense instruct model with a 128K context window, post-trained for enterprise tasks including RAG, function…

COMMERCIAL OK·131K CTX

Granite 3.3 8B

IBM Granite 3.3. Iterative refresh of 3.2 — same architecture; improved instruction following and tool-call reliability. Apache 2.0.

COMMERCIAL OK·131K CTX

Granite 3.0 2B Instruct

IBM Granite at 2B. Apache 2.0 enterprise-friendly small model with safety tuning.

COMMERCIAL OK·4K CTX

Granite 3.0 8B Instruct

Granite 3.0 8B — IBM's enterprise-tier baseline. Apache 2.0.

COMMERCIAL OK·4K CTX

Granite 3.2 8B

IBM's enterprise-tuned 8B. Apache 2.0. Strong on enterprise-shaped tool-calling and structured output. Watson + RHEL ecosystem alignment.

COMMERCIAL OK·131K CTX

Granite 3 MoE (3B active)

16B

Granite MoE shape. 16B total / 3B active. Workstation-deployable; the IBM enterprise alternative to Qwen / DeepSeek small MoEs.

COMMERCIAL OK·131K CTX

FAM · COMMAND-R

Command R

5 models

Command R+ 104B

104B

Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.

RESTRICTED·131K CTX

Command R 35B

35B

Cohere's mid-tier — RAG and tool use. Non-commercial license.

RESTRICTED·131K CTX

Command R7B (12-2024)

Command R7B (December 2024) is Cohere's smallest model in the Command R family, an 8B-parameter dense transformer with 128K context, trained for…

RESTRICTED·131K CTX

Command R+ (Aug 2024)

104B

Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.

RESTRICTED·131K CTX

Aya Expanse 32B

32B

Cohere's multilingual Aya at 32B. Covers 23 languages; strongest open-weight multilingual model in late 2024 — Apache-2.0 alternative is Qwen 2.5 32B but Aya…

RESTRICTED·8K CTX

FAM · FALCON

Falcon

5 models

Falcon 40B Instruct

40B

Falcon-40B-Instruct is a 40B parameter instruction-tuned model from TII (UAE), fine-tuned on Baize chat data for conversation and instruction-following. It…

COMMERCIAL OK·2K CTX

Falcon 3 3B Instruct

Falcon 3 3B Instruct is TII's 3-billion-parameter instruct model from the Falcon 3 family, supporting English, French, Spanish, and Portuguese with a 32K…

COMMERCIAL OK·33K CTX

Falcon 3 7B Instruct

Falcon 3 mid-size from TII. Permissive Falcon license; multilingual focus.

COMMERCIAL OK·33K CTX

Falcon 3 10B

10B

TII's Falcon 3 at the 10B tier. Strong on Arabic-language tasks; competitive on English.

COMMERCIAL OK·33K CTX

Falcon Mamba 7B

TII's model on the Mamba (state-space) architecture. Inference cost grows linearly with sequence length; an architectural alternative to attention-based models.

COMMERCIAL OK·256K CTX

FAM · HERMES

Hermes

4 models

Hermes 3 Llama 3.1 8B

Nous Research's Hermes fine-tune of Llama 3.1 8B. Stronger system-prompt adherence, JSON output, role-play, and agent steering than the base Llama.

COMMERCIAL OK·131K CTX

Hermes 3 Llama 3.1 70B

70B

Hermes 3 at 70B. Workstation-tier agent-tuned model.

COMMERCIAL OK·131K CTX

Hermes 3 Llama 3.2 3B

Nous Research's Hermes 3 fine-tune of Llama 3.2 3B. Strong general-instruction following at the 3B tier.

COMMERCIAL OK·131K CTX

Hermes 4 Llama 3.3 70B

70B

Nous Research's Hermes 4 fine-tune of Llama 3.3 70B. Strong on instruction following and creative tasks; community-favored alternative to base Llama.

COMMERCIAL OK·131K CTX

FAM · DOLPHIN

Dolphin

3 models

Dolphin 3.0 Mistral 24B

24B

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

COMMERCIAL OK·33K CTX

Dolphin 3.0 Llama 3.2 3B

Eric Hartford's Dolphin fine-tune at 3B. Less censored than the base Llama; popular for unconstrained-generation use cases.

COMMERCIAL OK·131K CTX

Dolphin 3 Llama 3.3 70B

70B

Eric Hartford's Dolphin 3 on a Llama 3.3 70B base. Less-restricted alternative for creative / unconstrained workflows.

COMMERCIAL OK·131K CTX

FAM · MIXTRAL

Mixtral

3 models

Mixtral 8x7B Instruct

47B

The MoE model that introduced the 8-experts pattern to the open-weight world. 47B params total, 13B active. Still a viable workhorse on 36GB+ setups.

COMMERCIAL OK·33K CTX

Mixtral 8x22B Instruct

141B

The bigger Mixtral. 141B total / 39B active. Strong general model, workstation-tier deployment.

COMMERCIAL OK·66K CTX

Mixtral 8x7B Instruct v0.1 GPTQ

46.7B

GPTQ 4-bit quantized build of Mistral AI's Mixtral 8x7B Instruct, a sparse mixture-of-experts model with 46.7B total parameters. Natively handles German,…

COMMERCIAL OK·8K CTX

FAM · GLM

GLM

3 models

GLM-4V 9B

13.9B

GLM-4 with vision encoder. Strong on Chinese document Q&A; restricted commercial license.

RESTRICTED·MULTIMODAL·8K CTX

GLM-4 9B

Zhipu's GLM-4 at 9B. Strong on Chinese-language tasks; tool-calling format slightly different from OpenAI convention.

RESTRICTED·131K CTX

GLM-5 Pro

144B

Zhipu's GLM-5 flagship. 144B total / 16B active MoE. Strong on Chinese-language tasks; competitive on English at the workstation-cluster tier.

RESTRICTED·131K CTX

FAM · MINICPM

MiniCPM

3 models

MiniCPM-V 3 8B

MiniCPM-V successor. Multimodal at 8B with stronger document Q&A than 2.6.

COMMERCIAL OK·MULTIMODAL·33K CTX

MiniCPM 3 4B

OpenBMB's edge-optimized 4B. MIT license; designed for phone deployment. Strong reasoning per parameter.

COMMERCIAL OK·33K CTX

MiniCPM-V 2.6 8B

Multimodal MiniCPM at 8B. Vision + text; strong on document Q&A for the size class.

COMMERCIAL OK·MULTIMODAL·33K CTX

FAM · YI

Yi

2 models

Yi 1.5 34B

34B

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

COMMERCIAL OK·16K CTX

Yi Coder 9B

01.AI's coding specialization at 9B. Apache 2.0; positioned as a lighter alternative to Qwen 2.5 Coder for the 16GB tier.

COMMERCIAL OK·131K CTX

FAM · STEPFUN

StepFun

2 models

GOT-OCR 2.0

0.58B

580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX…

COMMERCIAL OK

Step-3

1000B

StepFun's 1T-parameter MoE. 38B active. One of the largest open-weight models; cluster-only at any quant. Restricted license.

RESTRICTED·66K CTX

FAM · OLMO

OLMo

2 models

OLMo 2 1B Instruct

OLMo 2 1B Instruct is AllenAI's 1-billion-parameter instruct model from the April 2025 OLMo 2 release, post-trained with RLVR on math. It is fully open:…

COMMERCIAL OK·4K CTX

OLMo 2 13B

13B

AI2's fully-open 13B. Apache 2.0; full training data + checkpoints + recipes published. The reproducibility-first model in the 13B class.

COMMERCIAL OK·4K CTX

FAM · INTERNLM

InternLM

2 models

InternLM 2.5 7B Chat

InternLM 2.5 mid-size chat. Apache 2.0; strong on math and Chinese.

COMMERCIAL OK·1.0M CTX

InternLM 3 8B

Shanghai AI Lab's open-research line. InternLM 3 at 8B; strong on Chinese-language tasks.

RESTRICTED·33K CTX

FAM · DBRX

DBRX

2 models

DBRX Instruct

132B

Databricks' MoE. 132B total / 36B active. Designed for MosaicML pipelines; strong tool-calling discipline. Multi-GPU only.

COMMERCIAL OK·33K CTX

DBRX Base

132B

DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.

COMMERCIAL OK·33K CTX

FAM · WIZARD

Wizard

1 model

WizardLM-2 8x22B

141B

Microsoft's RLHF-heavy fine-tune of Mixtral 8x22B. Briefly the top open chat model on LMSYS at release.

COMMERCIAL OK·66K CTX

FAM · BAICHUAN

Baichuan

1 model

Baichuan 4 13B

13B

Baichuan AI's 13B. Chinese-language ecosystem alternative to Qwen / GLM. Restricted commercial license.

RESTRICTED·131K CTX

FAM · JANUS

Janus

1 model

Janus-Pro 7B

DeepSeek's multimodal 7B. Decoupled visual encoding for understanding vs generation — different from typical VLM design.

COMMERCIAL OK·MULTIMODAL·4K CTX

FAM · RWKV

RWKV

1 model

RWKV 7 'Goose' 1.5B

1.5B

RWKV 7 'Goose' at 1.5B. Linear-time inference architecture (constant memory regardless of context). Apache 2.0.

COMMERCIAL OK·1.0M CTX

FAM · OPENCODER

OpenCoder

1 model

OpenCoder 8B

Fully-open coding model — training data + recipes published. Apache 2.0 with verifiable open-data lineage. The right pick for academic /…

COMMERCIAL OK·33K CTX

FAM · HUNYUAN

Hunyuan

1 model

Hunyuan Large 389B MoE

389B

Tencent's frontier MoE. 389B total / 52B active. License permits commercial use with restrictions on companies above MAU thresholds.

COMMERCIAL OK·256K CTX

FAM · MOONSHOT

Moonshot

1 model

Kimi K1.5

200B

Moonshot's reasoning model. Reasoning-token emission with very long thinking-block depth — sometimes 5000+ tokens per query. Strong on math; restricted…

RESTRICTED·200K CTX

FAM · OPENBIOLLM

OpenBioLLM

1 model

OpenBioLLM Llama 3 70B

70B

Medical / biomedical fine-tune of Llama 3 70B. Strong on USMLE and clinical-knowledge benchmarks; right pick when domain-specific medical depth matters more…

COMMERCIAL OK·8K CTX

FAM · OTHER

Other

100 models

all-MiniLM-L6-v2

0.022B

all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM…

COMMERCIAL OK·256 CTX

FLUX.1 [dev]

12B

12B-parameter rectified-flow transformer for text-to-image, guidance-distilled from the FLUX.1 [pro] teacher. Currently the most-liked model on Hugging Face…

RESTRICTED

Nomic Embed Text v1.5

0.137B

Nomic Embed Text v1.5 is a 137M-parameter English embedding model with an 8192-token context window, trained with Matryoshka Representation Learning so the…

COMMERCIAL OK·8K CTX

Kokoro 82M

0.082B

82M-parameter StyleTTS2-derived TTS that went viral in early 2025 for matching billion-parameter TTS quality at ~1% the size. Apache-2.0 weights, dozens of…

COMMERCIAL OK

BGE Large EN v1.5

0.335B

BGE Large EN v1.5 is the 335M-parameter English flagship from BAAI's FlagEmbedding family, producing 1024-dim embeddings with a 512-token context window.…

COMMERCIAL OK·512 CTX

BGE Reranker v2 M3

0.57B

BGE M3 reranker. Cross-encoder for re-ranking RAG candidates; multilingual.

COMMERCIAL OK·8K CTX

all-mpnet-base-v2

0.109B

all-mpnet-base-v2 is a 109M-parameter sentence-transformers embedder based on Microsoft's MPNet, producing 768-dim vectors with a 384-token context. Trained on…

COMMERCIAL OK·384 CTX

XTTS v2

0.46B

Coqui's flagship multilingual voice-cloning TTS — clones a speaker from a 6-second reference clip and synthesizes in 17 languages with cross-lingual transfer.…

RESTRICTED

Whisper Base

0.074B

74M-parameter Whisper variant — roughly 2x the params of tiny for ~25-30% relative WER reduction. The standard pick for CPU realtime transcription with…

COMMERCIAL OK·30 CTX

Whisper Small

0.244B

244M-parameter Whisper. The smallest Whisper checkpoint considered 'production grade' for non-English audio. Sweet spot for laptops with iGPU/Metal or modest…

COMMERCIAL OK·30 CTX

Whisper Tiny

0.039B

Smallest member of the Whisper encoder-decoder ASR family (39M params). Trained on 680k hours of weakly supervised multilingual audio. Targets sub-realtime…

COMMERCIAL OK·30 CTX

paraphrase-multilingual-MiniLM-L12-v2

0.118B

paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing…

COMMERCIAL OK·128 CTX

GLM-5

200B

Zhipu's GLM-5 currently leads the Open LLM Leaderboard 2026. Strong reasoning and bilingual EN/ZH capability.

COMMERCIAL OK·200K CTX

mxbai-embed-large-v1

0.335B

mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation…

COMMERCIAL OK·512 CTX

Jina Embeddings v3

0.572B

Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage,…

RESTRICTED·8K CTX

Multilingual E5 Large Instruct

0.56B

Multilingual E5 Large Instruct is a 560M-parameter XLM-RoBERTa-large encoder fine-tuned by Microsoft's intfloat team with task instructions appended to…

COMMERCIAL OK·514 CTX

FLUX.1 [schnell]

12B

12B rectified-flow transformer, timestep-distilled to 1-4 sampling steps, released under Apache-2.0. Same architecture as FLUX.1 [dev] but trades a bit of…

COMMERCIAL OK

Nemotron 3 Nano (30B-A3B)

30B

NVIDIA's hybrid Mamba-2 + Transformer MoE for on-device agents. 30B total / 3B active. 1M-token context window with reasoning ON/OFF modes and 4× faster…

COMMERCIAL OK·1M CTX

Kimi K2.6

1000B

Moonshot's long-context, agent-oriented MoE. Optimized for stability under tool use and multi-step coding/planning workflows.

COMMERCIAL OK·2M CTX

SigLIP SO400M (patch14-384)

0.428B

428M-parameter Shape-Optimized vision-language encoder trained with the sigmoid (not softmax) contrastive loss on WebLI. Hits ~83% zero-shot ImageNet-1k top-1…

COMMERCIAL OK

Nemotron 3 Super (120B-A12B)

120B

Workstation-tier Nemotron 3. 120B total / 12B active. 5× higher throughput than the prior Super, 1M context, designed for multi-agent applications.

COMMERCIAL OK·1M CTX

Distil-Whisper Large v3

0.756B

756M-param distilled Whisper-large-v3 with the decoder shrunk from 32 to 2 layers. ~6.3x faster than the teacher at near-parity WER on long-form English (1%…

COMMERCIAL OK·30 CTX

Snowflake Arctic Embed L v2.0

0.568B

Arctic Embed L v2.0 is a 568M-parameter multilingual embedder from Snowflake based on XLM-RoBERTa, producing 1024-dim Matryoshka vectors with an 8192-token…

COMMERCIAL OK·8K CTX

SDXL Turbo

2.6B

2.6B SDXL backbone trained with Adversarial Diffusion Distillation (ADD), producing photorealistic 512px images in a single forward pass. Designed for…

RESTRICTED

Jina Reranker v2 Base Multilingual

0.278B

Jina Reranker v2 Base Multilingual is a 278M-parameter cross-encoder from Jina AI with a 1024-token context, trained on 100+ languages plus code and structured…

RESTRICTED·1K CTX

SmolLM2 135M Instruct

0.135B

SmolLM2-135M-Instruct is the smallest instruction-tuned model in Hugging Face's SmolLM2 family, a 135M-parameter Llama-architecture model trained for on-device…

COMMERCIAL OK·8K CTX

OLMo 2 32B

32B

Fully-open OLMo 2. AI2 publishes the full training data, code, and weights — the most reproducible 32B model.

COMMERCIAL OK·33K CTX

Florence-2 Large

0.77B

770M-parameter unified vision foundation model with a DaViT image encoder and BART-style seq2seq decoder. One model, one set of weights — handles captioning,…

COMMERCIAL OK

GTE ModernBERT Base

0.149B

GTE ModernBERT Base is a 149M-parameter English embedder built on AnswerDotAI's ModernBERT backbone, producing 768-dim vectors with native 8192-token context…

COMMERCIAL OK·8K CTX

E5 Mistral 7B Instruct

7.11B

E5-Mistral-7B-Instruct is a 7.11B-parameter decoder-based embedder fine-tuned from Mistral-7B by Microsoft's intfloat team, producing 4096-dim embeddings with…

COMMERCIAL OK·33K CTX

Omni 31B Turkish Reasoning

31B

31B-parameter Turkish-tuned reasoning model with i1-imatrix quantizations by mradermacher. Designed for step-by-step problem solving in Turkish. Highest…

RESTRICTED·33K CTX

Stable Diffusion 3.5 Medium

2.5B

2.5B MMDiT-X with improved Query-Key Normalization and dual attention blocks at lower resolutions. Trained for 0.25-2MP output. Positioned as the mid-tier…

COMMERCIAL OK

EXAONE Deep 7.8B

7.8B

EXAONE Deep 7.8B is LG AI Research's reasoning-focused model, fine-tuned from EXAONE-3.5-7.8B-Instruct for math and coding tasks. It claims benchmark wins over…

RESTRICTED·33K CTX

TinyLlama 1.1B Chat v0.3 AWQ

1.1B

TinyLlama 1.1B Chat v0.3 is a 1.1B-parameter chat model quantized to 4-bit AWQ by TheBloke. It uses the ChatML prompt format and fits comfortably in very low…

COMMERCIAL OK·2K CTX

TinyLlama 1.1B Chat v0.3 GPTQ

1.1B

GPTQ-quantized build of TinyLlama 1.1B Chat v0.3, trained on SlimPajama, StarCoder, and OpenAssistant data. Runs in roughly 0.8 GB VRAM thanks to 4-bit…

COMMERCIAL OK·2K CTX

Piper

0.025B

VITS-based neural TTS optimized for Raspberry Pi-class hardware. Ships as ONNX checkpoints with ~100 voices across 30+ languages. Powers Home Assistant's local…

COMMERCIAL OK

Ring-2.6-1T

1000B

InclusionAI's Ring-2.6-1T is a 1 trillion parameter Mixture-of-Experts model with ~32B activated parameters per token, released by Ant Group's AI research arm.…

COMMERCIAL OK·128K CTX

mxbai-rerank-large-v2

1.54B

mxbai-rerank-large-v2 is a 1.54B-parameter listwise reranker from Mixedbread AI built on Qwen2.5-1.5B, supporting 100+ languages and a 32K-token context with…

COMMERCIAL OK·33K CTX

GPT-NeoX 20B

20B

GPT-NeoX-20B is a 20B-parameter English autoregressive model from EleutherAI, trained on the 825 GiB Pile dataset. It uses a GPT-3-style transformer…

COMMERCIAL OK·2K CTX

NVIDIA Nemotron Nano 9B v2 Japanese

A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both…

COMMERCIAL OK·131K CTX

EXAONE 3.5 7.8B Instruct

7.8B

EXAONE 3.5 7.8B is LG AI Research's instruction-tuned bilingual model for English and Korean, with a 32K token context window. It succeeds EXAONE 3.0 with…

RESTRICTED·33K CTX

Mihenk LLM v2 35B (Turkish Financial)

35B

35B MoE (3B active) tuned specifically for Turkish financial-services text — bank statements, investment research, accounting terminology. Niche-cluster model…

RESTRICTED·33K CTX

Parakeet TDT 0.6B v2

0.6B

600M-parameter FastConformer-TDT transducer ASR from NVIDIA NeMo. Topped the Hugging Face Open ASR Leaderboard in 2025 for English, with WER ~6.05% averaged…

COMMERCIAL OK

Kanarya 2B

Turkish-from-scratch language model trained by Ali Safaya (Koç University researcher). Named after the kanarya (Turkish for 'canary'). Trained on 250+ GB of…

COMMERCIAL OK·2K CTX

SmolLM2 360M Instruct

0.36B

SmolLM2-360M-Instruct is the middle tier of the SmolLM2 instruct family, a 360M-parameter Llama-architecture model with an 8K context. It is shipped with ONNX…

COMMERCIAL OK·8K CTX

F5-TTS

0.336B

Flow-matching non-autoregressive TTS built on a Diffusion Transformer (DiT) backbone with ConvNeXt text refinement. Trained on the 100K-hour Emilia dataset;…

RESTRICTED

VBART Large (Turkish Summarization)

0.4B

Turkish BART-style sequence-to-sequence model fine-tuned specifically for summarization. Not a chat model — purpose-built for input-document → Turkish-summary…

COMMERCIAL OK·1K CTX

Kanarya 750M

0.75B

Smaller Kanarya variant — 750M parameters. Runs on CPU or 4GB GPU comfortably. Useful for low-resource Turkish text classification, embeddings, or completion…

COMMERCIAL OK·2K CTX

Turkish GPT-2 Large

0.7B

GPT-2 Large architecture trained from scratch on Turkish. Reference baseline for measuring how much modern instruction-tuned models actually improve on the…

COMMERCIAL OK·1K CTX

SmolVLM Instruct

2.25B

SmolVLM-Instruct is Hugging Face's compact vision-language model built on the Idefics3 architecture, pairing SmolLM2-1.7B-Instruct with a SigLIP-SO400M vision…

COMMERCIAL OK·8K CTX

Sarvam 30B

30B

Sarvam-30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters but only 2.4B active at inference time, making it cheaper to run than its size…

COMMERCIAL OK·4K CTX

Orpheus 3B 0.1 FT

Llama-architecture 3B model fine-tuned as a TTS that emits SNAC audio tokens. Designed for highly expressive, emotion-controllable speech with laughter, sighs,…

COMMERCIAL OK

EXAONE 3.5 32B Instruct

32B

EXAONE 3.5 32B Instruct is LG AI Research's 32B bilingual model, trained for instruction-following in English and Korean. It supports a 32,768-token context…

RESTRICTED·33K CTX

Merlyn Education Safety 12B AWQ

12B

A 12B GPT-NeoX model from Merlyn Mind, fine-tuned specifically to refuse or soften unsafe content in K-12 and higher-education contexts. Delivered in AWQ 4-bit…

COMMERCIAL OK·2K CTX

EXAONE 3.5 32B Instruct AWQ

32B

EXAONE 3.5 32B Instruct is LG AI Research's bilingual English/Korean instruction model, quantized to 4-bit AWQ for lower VRAM overhead. It supports a 32K…

RESTRICTED·33K CTX

Sarvam 105B

105B

Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks…

COMMERCIAL OK·128K CTX

GPT-OSS Swallow 20B RL v0.1

20B

A 20B bilingual model from TokyoTech built on GPT-OSS via continual pre-training, SFT, and reinforcement learning with verifiable rewards (RLVR). Targets…

COMMERCIAL OK·33K CTX

llm-jp 4 32B A3B Thinking

32B

A 32B MoE model from Japan's National Institute of Informatics, with only 3B parameters active per forward pass. Trained on 11.7T tokens across four stages:…

COMMERCIAL OK·66K CTX

gpt2-base-french

0.124B

A 124M-parameter GPT-2 base model trained on French Wikipedia (wiki40b/fr) and a CC-100/fr subset, with a 50,000-token BPE vocabulary. It generates French text…

RESTRICTED·1K CTX

GPT-2 Spanish

0.124B

GPT-2 Spanish is a 124M-parameter model trained from scratch on 11.5GB of Spanish text (Wikipedia, books) with a custom Spanish BPE tokenizer. It generates…

COMMERCIAL OK·1K CTX

mGPT 13B

13B

mGPT-13B is a 13B-parameter GPT-3-style model pretrained on 600 GB of deduplicated text spanning 61 languages across 25 language families, sourced from mC4 and…

COMMERCIAL OK·2K CTX

PhoGPT 4B Chat

3.7B

PhoGPT-4B-Chat is VinAI's 3.7B-parameter Vietnamese chat model, fine-tuned from a base trained on 102B Vietnamese tokens. It handles up to 8192-token contexts…

COMMERCIAL OK·8K CTX

Pollux Judge 32B

32B

A 32B judge model built to score other LLMs' Russian-language outputs. Give it an instruction, a response, and a rubric — it returns a numeric score plus a…

COMMERCIAL OK·4K CTX

GPT-2 Spanish Medium

0.355B

A 355M-parameter GPT-2 Medium trained from scratch on 11.5 GB of Spanish text (Wikipedia and books), with a BPE tokenizer built specifically for Spanish.…

COMMERCIAL OK·1K CTX

OpenELM 3B Instruct

OpenELM-3B-Instruct is Apple's 3-billion-parameter instruct model using a layer-wise scaled transformer with varying FFN multipliers and KV-head counts across…

RESTRICTED·2K CTX

mGPT 1.3B Uzbek

1.3B

A 1.3B-parameter GPT-2-style model fine-tuned on Uzbek text for 50,000 steps on a single A100. Covers Uzbek, Russian, and English generation. It is a base…

COMMERCIAL OK·2K CTX

PhoGPT 4B

3.7B

PhoGPT-4B is a 3.7B-parameter model pre-trained from scratch on 102B Vietnamese tokens, making it one of the few Vietnamese-first generative models available.…

COMMERCIAL OK·8K CTX

Dostoevsky Doesn't Write It GPT2

0.175B

A 175M-parameter GPT-2 model fine-tuned on Dostoevsky's digitized works, built on top of ruGPT3-small. Trained for five epochs, it generates Russian prose in a…

COMMERCIAL OK·1K CTX

mGPT 1.3B Mongol

1.3B

A 1.3B-parameter GPT model fine-tuned from ai-forever's mGPT base for Mongolian, with English and Russian also supported. Fine-tuning ran for 50,000 steps on…

COMMERCIAL OK·2K CTX

Vikhr Qwen 2.5 0.5B Instruct

0.5B

A 0.5B Russian-language instruct model fine-tuned from Qwen2.5-0.5B on the GrandMaster-PRO-MAX dataset (~150k instructions). Vikhrmodels claims 4x efficiency…

COMMERCIAL OK·4K CTX

OpenThaiGPT 1.5 7B Instruct

OpenThaiGPT 1.5 7B is a Thai-language chat model fine-tuned from Qwen2.5 on over 2 million Thai instruction pairs. It targets Thai academic benchmarks and…

RESTRICTED·131K CTX

Typhoon S ThaiLLM 8B Instruct Research Preview

An instruction-tuned 8B Thai language model from typhoon-ai, built on ThaiLLM using supervised fine-tuning and on-policy distillation. Training ran on a single…

COMMERCIAL OK·33K CTX

Sarvam 105B FP8

105B

Sarvam-105B is a Mixture-of-Experts model with 10.3B active parameters built for Indian-language tasks, reasoning, and coding. It covers 22 Indian languages…

COMMERCIAL OK·131K CTX

SmolLM 2 1.7B Instruct

1.7B

SmolLM 2 flagship. Open data + open weights at the edge tier.

COMMERCIAL OK·8K CTX

Nemotron Mini 4B Instruct

NVIDIA's edge-tier Nemotron. Distilled from Minitron lineage with role-play tuning.

COMMERCIAL OK·4K CTX

StarCoder 2 7B

Mid-size StarCoder 2. The 8GB-VRAM autocomplete pick.

COMMERCIAL OK·16K CTX

Whisper Large v3

1.55B

OpenAI's flagship open speech-to-text model. 99 languages, MIT license. The de-facto open ASR baseline.

COMMERCIAL OK·MULTIMODAL

Molmo 72B

72B

Molmo flagship. Apache 2.0 VLM rivaling proprietary models on UI pointing and visual reasoning.

COMMERCIAL OK·MULTIMODAL·4K CTX

LLaVA 1.6 Mistral 7B

LLaVA 1.6 on Mistral 7B base. Apache 2.0 vision-language with strong OCR.

COMMERCIAL OK·MULTIMODAL·33K CTX

StarCoder 2 15B

15B

StarCoder 2 flagship. The largest BigCode coder; 16K context with strong fill-in-the-middle support.

COMMERCIAL OK·16K CTX

StarCoder 2 3B

BigCode's StarCoder 2 at 3B. Trained on The Stack v2 with 600+ programming languages.

COMMERCIAL OK·16K CTX

Aya 23 8B

Cohere's multilingual research model covering 23 languages. CC-BY-NC — research only.

RESTRICTED·8K CTX

Jamba 1.5 Mini

52B

AI21's hybrid Mamba-Transformer MoE. 256K context with the SSM throughput advantage.

COMMERCIAL OK·262K CTX

Tulu 3 70B

70B

Tulu 3 at 70B. AI2's fully-open instruct fine-tune — research transparency at scale.

COMMERCIAL OK·131K CTX

SmolLM 2 360M Instruct

0.36B

Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.

COMMERCIAL OK·8K CTX

BGE M3

0.57B

BAAI's multilingual embedding flagship. Dense + sparse + ColBERT-style multi-vector. The de-facto open multilingual embedding pick.

COMMERCIAL OK·8K CTX

NV-Embed v2

7.85B

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

RESTRICTED·33K CTX

LLaVA-OneVision 7B

LLaVA-OneVision unified single-image / multi-image / video VLM on Qwen 2 base.

COMMERCIAL OK·MULTIMODAL·33K CTX

Whisper Large v3 Turbo

0.81B

Pruned, fine-tuned variant of Whisper Large v3 with the decoder cut from 32 to 4 layers. ~8x faster decode at near-equivalent accuracy on most languages.

COMMERCIAL OK·MULTIMODAL

SmolLM 3 3B

Hugging Face's small-model line at 3B. Apache 2.0. Designed for edge / educational deployments.

COMMERCIAL OK·33K CTX

Moondream 2

1.9B

Tiny vision-language model. ~1.9B; designed for edge / embedded multimodal use cases. Apache 2.0.

COMMERCIAL OK·MULTIMODAL·2K CTX

Jamba 1.5 Large

398B

Jamba flagship at 398B total / 94B active. Frontier hybrid-architecture model with 256K context.

COMMERCIAL OK·262K CTX

Nemotron 3 Nano 9B

NVIDIA's Nemotron 3 at 9B. Tuned for NVIDIA-stack deployment patterns; strong tool-calling reliability.

COMMERCIAL OK·131K CTX

Molmo 7B-D

AI2's fully-open VLM. Trained on PixMo dataset; pointing capability for UI grounding.

COMMERCIAL OK·MULTIMODAL·4K CTX

Tulu 3 8B

AI2's fully-open post-training recipe applied to Llama 3.1 8B. Open data, open code, open weights.

COMMERCIAL OK·131K CTX

Stable LM 2 12B

12B

Stability AI's 12B. Stable LM line; commercial use requires paid membership. Solid baseline at 12B class.

RESTRICTED·4K CTX

InternVL 2.5 78B

78B

InternVL 2.5 flagship. Approaches frontier proprietary VLMs on document and OCR tasks.

COMMERCIAL OK·MULTIMODAL·33K CTX

Aya 23 35B

35B

Aya 23 at 35B. Built on Cohere's Command-R lineage. Non-commercial.

RESTRICTED·8K CTX

Nemotron 3 Super 49B

49B

Nemotron 3 mid-tier. 49B dense; fits 32GB cards with AWQ. NVIDIA stack alignment carries through.

COMMERCIAL OK·131K CTX

InternVL 2.5 26B

26B

InternVL 2.5 mid-tier — Shanghai AI Lab vision-language model with strong document and chart understanding.

COMMERCIAL OK·MULTIMODAL·33K CTX