SmolLM2 135M Instruct
SmolLM2-135M-Instruct is the smallest instruction-tuned model in Hugging Face's SmolLM2 family, a 135M-parameter Llama-architecture model trained for on-device deployment. It uses an 8K context window and is shipped with ONNX, GGUF, and Transformers.js artifacts for in-browser inference.
The right answer when the constraint is 'must run in a service worker.' SmolLM2-135M is the cleanest open small model we know of for browser and microcontroller-class deployments.
Overview
SmolLM2-135M-Instruct is the smallest instruction-tuned model in Hugging Face's SmolLM2 family, a 135M-parameter Llama-architecture model trained for on-device deployment. It uses an 8K context window and is shipped with ONNX, GGUF, and Transformers.js artifacts for in-browser inference.
Strengths
- Apache-2.0 and fully open: training data, code, and recipe published
- ONNX/Transformers.js artifacts run directly in a browser tab
- Sub-100MB quantized footprint is unmatched for in-RAM inference
- Llama architecture means zero porting work in any runtime
Weaknesses
- Reasoning is genuinely weak — do not use for math or multi-step tasks
- Hallucinates frequently on factual questions
- 8K context is theoretical; quality degrades well before that
- No tool-calling template out of the box
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.1 GB | 1 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of SmolLM2 135M Instruct.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run SmolLM2 135M Instruct?
Can I use SmolLM2 135M Instruct commercially?
What's the context length of SmolLM2 135M Instruct?
Source: huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify SmolLM2 135M Instruct runs on your specific hardware before committing money.