Gemma 3 1B
Smallest text-only Gemma 3 for phones and IoT.
Positioning
Gemma 3 1B is Google's smallest text-only entry in the Gemma 3 family, designed explicitly for edge deployment: phones, IoT devices, and other resource-constrained environments. Released under the Gemma Terms of Use, this dense 1B-parameter model offers a 32,768-token context window, which makes it one of the most compact open-weight models that can handle long prompts. Its primary distinction is size: at 1B parameters, it targets scenarios where even a 2B-class model would be too large or power-hungry.
Strengths
- Extremely small footprint: With FP16 weights at ~2 GB and quantized versions as low as ~0.3 GB (Q2_K), this model fits in the memory of a phone or a small single-board computer. Microcontrollers, which typically have only kilobytes to a few megabytes of RAM, remain out of reach.
- Long context for its size: A 32K context window is unusually generous for a 1B-parameter model, enabling tasks like document summarization or multi-turn chat on low-power devices.
- Permissive commercial terms: The Gemma Terms of Use allow broad commercial use, including fine-tuning and deployment in proprietary products, without royalties. The terms do carry use restrictions, so read them before shipping.
- Dense architecture simplicity: Unlike MoE models, this dense 1B has no routing overhead, making inference predictable and easy to optimize for edge runtimes.
Limitations
- Very limited capacity: 1B parameters inherently restrict the model's knowledge depth, reasoning ability, and instruction-following quality compared to larger models.
- No multimodal support: This is a text-only model; it cannot process images or other modalities.
- Edge-only deployment class: Not suitable for server-grade workloads; performance on complex tasks will be noticeably weaker than even mid-size models.
- No community benchmarks yet: As a new release, independent measurements of real-world performance are not available — vendor claims should be treated as best-case.
What it takes to run this locally
At FP16, the model requires ~2 GB of disk space. Quantized versions reduce this dramatically: Q8_0 ~1 GB, Q4_K_M ~0.6 GB, Q2_K ~0.3 GB. Runtime memory adds ~30–50% for KV cache and framework overhead at typical context lengths. This fits comfortably within a phone's RAM (4–8 GB) or a low-power edge device (e.g., Raspberry Pi with 4+ GB). No GPU is required; CPU inference is practical.
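The arithmetic above can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the file sizes are copied from the paragraph, the 30–50% overhead range is the one quoted there, and the function names and the `reserved_gb` parameter are hypothetical helpers introduced only for illustration.

```python
# Approximate on-disk sizes in GB, copied from the paragraph above.
QUANT_SIZES_GB = {"FP16": 2.0, "Q8_0": 1.0, "Q4_K_M": 0.6, "Q2_K": 0.3}


def fp16_size_gb(n_params: float) -> float:
    """FP16 stores 2 bytes per parameter; 1 GB is taken as 1e9 bytes."""
    return n_params * 2 / 1e9


def runtime_memory_gb(file_gb: float, low: float = 0.30, high: float = 0.50):
    """Return (best, worst) runtime memory, adding the quoted 30-50% overhead."""
    return file_gb * (1 + low), file_gb * (1 + high)


def fits(file_gb: float, device_ram_gb: float, reserved_gb: float = 0.0) -> bool:
    """True if the worst-case estimate fits in RAM left after `reserved_gb`.

    `reserved_gb` is an assumed allowance for the OS and other apps.
    """
    _, worst = runtime_memory_gb(file_gb)
    return worst <= device_ram_gb - reserved_gb


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        best, worst = runtime_memory_gb(size)
        print(f"{name}: {best:.2f} to {worst:.2f} GB at runtime")
```

On a 4 GB device, how much memory the operating system already holds decides the outcome, so pass a realistic `reserved_gb` rather than relying on the default of zero.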
Should you run this locally?
Yes if you need a very small, permissively licensed model for on-device text tasks — chatbots, summarization, classification — where latency and privacy matter more than peak accuracy. No if your use case demands strong reasoning, factual recall, or any multimodal input; you should look at larger Gemma 3 variants or other families.
Catalog cross-links
- Gemma 3 4B
- Gemma 3 12B
- Google Gemma family
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Phone-class
- Text-only fast inference
Weaknesses
- No vision
- Limited reasoning
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.7 GB | 2 GB |
Get the model
Ollama
One-line install
ollama run gemma3:1b
HuggingFace
Original weights
Source repository — direct quantization required.
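Once the model has been pulled with the Ollama command above, it can also be called from code. The sketch below uses only the Python standard library and assumes an Ollama server is running at its default local address (`http://localhost:11434`) and exposes the `/api/generate` endpoint. The helper names `build_request` and `generate` are hypothetical, chosen for this example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma3:1b",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3:1b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text.

    Raises urllib.error.URLError if no server is listening.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    print(generate("Summarize in one sentence: small models trade accuracy for size."))
```

Setting `stream` to `False` keeps the example short by returning one JSON object; an interactive chat on a phone would more likely stream tokens as they arrive.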
Benchmarks
Real measurements on real hardware. Numbers ship with the runner version, quant, and date.
| Hardware | Provenance | Quant | Ctx | Tokens / sec | TTFT | Date |
|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 3080 16GB (Mobile) | EditorialM | Q4_K_M | 4K | 160.4 tok/s | 790 ms | Jun 2, 26 |
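To turn the two figures in that row into something a user would feel, total response time can be approximated as time to first token plus generation time. The sketch below is a simplification that assumes a constant decode rate and ignores how prompt length affects TTFT. The function name is hypothetical.

```python
def response_time_s(n_tokens: int, tokens_per_s: float, ttft_ms: float) -> float:
    """Approximate wall-clock seconds to produce `n_tokens` of output.

    Assumes decoding proceeds at a steady `tokens_per_s` after the
    first-token delay `ttft_ms`.
    """
    return ttft_ms / 1000 + n_tokens / tokens_per_s


if __name__ == "__main__":
    # Figures from the RTX 3080 Mobile row above: 160.4 tok/s, 790 ms TTFT.
    for n in (64, 256, 1024):
        print(f"{n} tokens: about {response_time_s(n, 160.4, 790):.2f} s")
```

For short replies the fixed first-token delay dominates. For long replies the decode rate matters more.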
Frequently asked
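What's the minimum VRAM to run Gemma 3 1B?
About 2 GB for the Q4_K_M quantization, per the quantization table above. A GPU is not required; CPU inference is practical.
Can I use Gemma 3 1B commercially?
Yes. The model is released under the Gemma Terms of Use, which allow commercial use, including fine-tuning and deployment in proprietary products.
What's the context length of Gemma 3 1B?
32,768 tokens.
How do I install Gemma 3 1B with Ollama?
With Ollama installed, run: ollama run gemma3:1b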
Source: huggingface.co/google/gemma-3-1b-it
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.