other
0.36B parameters
Commercial OK
Reviewed June 2026

SmolLM 2 360M Instruct

Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.

License: Apache 2.0·Released Nov 1, 2024·Context: 8,192 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

SmolLM 2 360M Instruct is a compact, dense language model released by Hugging Face under the permissive Apache 2.0 license. With only 360 million parameters and an 8,192-token context window, it is explicitly designed for edge deployment—targeting devices like phones and Raspberry Pi-class hardware. Its small footprint makes it one of the most accessible open-weight models for on-device inference, prioritizing speed and low resource usage over raw capability.

Strengths

  • Tiny footprint, huge portability: At FP16 the model is approximately 1 GB on disk, and quantized versions (e.g., Q4_K_M at ~0.2 GB) can fit on even the most resource-constrained devices, including mobile phones and single-board computers.
  • Permissive Apache 2.0 license: No restrictions on commercial use, modification, or redistribution—ideal for integrating into proprietary applications or research projects without legal friction.
  • Designed for edge inference: The model's architecture and size are purpose-built for low-latency, offline operation on consumer edge hardware, making it a strong candidate for privacy-sensitive or connectivity-limited scenarios.
  • Low memory overhead: With a context length of 8,192 tokens and small quant sizes, the total memory requirement (model + KV cache + framework overhead) typically stays under 1 GB, enabling deployment on devices with as little as 1–2 GB of RAM.

Limitations

  • Limited capacity for complex tasks: With only 360M parameters, the model lacks the depth and breadth of larger models for nuanced reasoning, multi-step instruction following, or domain-specific knowledge. It is best suited for simple, single-turn interactions.
  • Short context window: The 8,192-token context may be insufficient for tasks requiring long document analysis, extended conversations, or large-context retrieval-augmented generation.
  • No community benchmark data available: We do not have verified, independent measurements of this model's performance on standard NLP benchmarks. Published vendor metrics should be treated as best-case estimates.
  • Not suitable for high-stakes or specialized domains: The model's small size and general training data mean it may produce less reliable outputs for technical, medical, legal, or other expert fields without fine-tuning.

What it takes to run this locally

SmolLM 2 360M Instruct is an edge-class model. At FP16 it occupies ~1 GB of disk space; quantized versions range from ~0.3 GB (Q6_K, Q5_K_M) down to ~0.1 GB (Q2_K). Adding ~30–50% for KV cache and framework overhead at typical context lengths keeps total memory well under 2 GB. This means the model can run on a wide variety of consumer hardware, including mobile phones, Raspberry Pi 4/5, low-power laptops, and even some microcontrollers with sufficient RAM. No dedicated GPU is required—CPU inference is practical.

Should you run this locally?

Yes if you need a lightweight, permissively licensed model for on-device chat or simple text generation on phones, single-board computers, or low-resource environments where privacy and offline capability are priorities.

No if your use case demands complex reasoning, long-context understanding, or high accuracy on specialized tasks—larger models or cloud APIs will be more appropriate.

Catalog cross-links

Overview

Hugging Face's SmolLM 2 at 360M. Apache 2.0; targets phone / Pi-class deployments.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (smollm-2)
SmolLM 2 360M Instruct0.36B
You are here
SmolLM 2 1.7B Instruct1.7B
Edge
Distilled / fine-tuned from this

Strengths

  • Apache 2.0
  • Phone-deployable

Weaknesses

  • Trivial reasoning only

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M0.3 GB1 GB

Get the model

HuggingFace

Original weights

huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of SmolLM 2 360M Instruct.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step down
Smaller — faster, runs on weaker hardware
No verdicted models in the next tier down yet.

Frequently asked

What's the minimum VRAM to run SmolLM 2 360M Instruct?

1GB of VRAM is enough to run SmolLM 2 360M Instruct at the Q4_K_M quantization (file size 0.3 GB). Higher-quality quantizations need more.

Can I use SmolLM 2 360M Instruct commercially?

Yes — SmolLM 2 360M Instruct ships under the Apache 2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of SmolLM 2 360M Instruct?

SmolLM 2 360M Instruct supports a context window of 8,192 tokens (about 8K).

Source: huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify SmolLM 2 360M Instruct runs on your specific hardware before committing money.