SmolLM 2 360M Instruct
Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.
Positioning
SmolLM 2 360M Instruct is a compact, dense language model released by Hugging Face under the permissive Apache 2.0 license. With only 360 million parameters and an 8,192-token context window, it is explicitly designed for edge deployment—targeting devices like phones and Raspberry Pi-class hardware. Its small footprint makes it one of the most accessible open-weight models for on-device inference, prioritizing speed and low resource usage over raw capability.
Strengths
- Tiny footprint, huge portability: At FP16 the model is approximately 1 GB on disk, and quantized versions (e.g., Q4_K_M at ~0.2 GB) can fit on even the most resource-constrained devices, including mobile phones and single-board computers.
- Permissive Apache 2.0 license: No restrictions on commercial use, modification, or redistribution—ideal for integrating into proprietary applications or research projects without legal friction.
- Designed for edge inference: The model's architecture and size are purpose-built for low-latency, offline operation on consumer edge hardware, making it a strong candidate for privacy-sensitive or connectivity-limited scenarios.
- Low memory overhead: With a context length of 8,192 tokens and small quant sizes, the total memory requirement (model + KV cache + framework overhead) typically stays under 1 GB, enabling deployment on devices with as little as 1–2 GB of RAM.
Limitations
- Limited capacity for complex tasks: With only 360M parameters, the model lacks the depth and breadth of larger models for nuanced reasoning, multi-step instruction following, or domain-specific knowledge. It is best suited for simple, single-turn interactions.
- Short context window: The 8,192-token context may be insufficient for tasks requiring long document analysis, extended conversations, or large-context retrieval-augmented generation.
- No community benchmark data available: We do not have verified, independent measurements of this model's performance on standard NLP benchmarks. Published vendor metrics should be treated as best-case estimates.
- Not suitable for high-stakes or specialized domains: The model's small size and general training data mean it may produce less reliable outputs for technical, medical, legal, or other expert fields without fine-tuning.
What it takes to run this locally
SmolLM 2 360M Instruct is an edge-class model. At FP16 it occupies ~1 GB of disk space; quantized versions range from ~0.3 GB (Q6_K, Q5_K_M) down to ~0.1 GB (Q2_K). Adding ~30–50% for KV cache and framework overhead at typical context lengths keeps total memory well under 2 GB. This means the model can run on a wide variety of consumer hardware, including mobile phones, Raspberry Pi 4/5, low-power laptops, and even some microcontrollers with sufficient RAM. No dedicated GPU is required—CPU inference is practical.
Should you run this locally?
Yes if you need a lightweight, permissively licensed model for on-device chat or simple text generation on phones, single-board computers, or low-resource environments where privacy and offline capability are priorities.
No if your use case demands complex reasoning, long-context understanding, or high accuracy on specialized tasks—larger models or cloud APIs will be more appropriate.
Catalog cross-links
- SmolLM 2 1.7B Instruct
- Llama 3.2 1B Instruct
- Qwen2.5 0.5B Instruct
Overview
Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Apache 2.0
- Phone-deployable
Weaknesses
- Trivial reasoning only
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.3 GB | 1 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of SmolLM 2 360M Instruct.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run SmolLM 2 360M Instruct?
Can I use SmolLM 2 360M Instruct commercially?
What's the context length of SmolLM 2 360M Instruct?
Source: huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify SmolLM 2 360M Instruct runs on your specific hardware before committing money.