Small Language Models

Small language models are how local AI actually lands on phones, dev laptops without discrete GPUs, and battery-constrained devices. A 1B or 3B model that runs at 30 tok/s on a Pixel 9 beats a 70B model that takes 8 minutes to load on the same device. That is operator math, not vendor math.

The frontier has shifted: Qwen 3 0.6B has cleared 20M Hugging Face downloads, Gemma 3 270M went viral on r/LocalLLaMA, and Hugging Face's own SmolLM2 family runs in production speech assistants. This hub catalogs the ≤3.5B-parameter tier with the same editorial depth we give the flagship 70B models: license traps, context ceilings, and missing GGUF builds are all called out.

Each row links to /models/[slug] for the full operator notes: tested hardware, recommended quantization, and the prompting kit that worked in our test runs. Where we've benchmarked, the score sits next to the model. Where we haven't, the row says so — no fake numbers.

Other / from-scratch

Qwen-based

Gemma-based

Llama-based

Granite-based

Mistral-based

EXAONE-based

StepFun-based

OLMo-based

Falcon-based

RWKV

DeepSeek-based

Hermes

Dolphin
