gemma
1B parameters
Commercial OK
Reviewed June 2026

Gemma 3 1B

Smallest text-only Gemma 3 for phones and IoT.

License: Gemma Terms of Use · Released Mar 12, 2025 · Context: 32,768 tokens

Our verdict

OP · Fredoline Eruo | Verified Jun 12, 2026
unrated

Positioning

Gemma 3 1B is Google's smallest text-only entry in the Gemma 3 family, designed explicitly for edge deployment — phones, IoT devices, and other resource-constrained environments. Released under the Gemma Terms of Use, this dense 1B-parameter model offers a 32,768-token context window, making it one of the most compact open-weight models capable of handling long prompts. Its primary distinction is size: at 1B parameters, it targets scenarios where even a 2B model would be too large or power-hungry.

Strengths

  • Extremely small footprint: With FP16 weighing ~2 GB and quantized versions as low as ~0.3 GB (Q2_K), this model fits in the memory of a phone or a small single-board computer.
  • Long context for its size: A 32K context window is unusually generous for a 1B-parameter model, enabling tasks like document summarization or multi-turn chat on low-power devices.
  • Permissive commercial terms: The Gemma Terms of Use allow broad commercial use, including fine-tuning and deployment in proprietary products, without royalties, subject to the use restrictions set out in the terms.
  • Dense architecture simplicity: Unlike MoE models, this dense 1B has no routing overhead, making inference predictable and easy to optimize for edge runtimes.

Limitations

  • Very limited capacity: 1B parameters inherently restrict the model's knowledge depth, reasoning ability, and instruction-following quality compared to larger models.
  • No multimodal support: This is a text-only model; it cannot process images or other modalities.
  • Edge-only deployment class: Not suitable for server-grade workloads; performance on complex tasks will be noticeably weaker than even mid-size models.
  • Few independent benchmarks: This catalog has only one benchmark run on record, so independent measurements of real-world performance remain thin; vendor claims should be treated as best-case.

What it takes to run this locally

At FP16, the model requires ~2 GB of disk space. Quantized versions reduce this dramatically: Q8_0 ~1 GB, Q4_K_M ~0.6 GB, Q2_K ~0.3 GB. Runtime memory adds ~30–50% for KV cache and framework overhead at typical context lengths. This fits comfortably within a phone's RAM (4–8 GB) or a low-power edge device (e.g., Raspberry Pi with 4+ GB). No GPU is required; CPU inference is practical.
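As a rough worked example of the arithmetic above, the sketch below applies the 30–50% runtime overhead to the approximate file sizes quoted in this section and checks the pessimistic figure against a memory budget. The function names and the 4 GB budget are illustrative assumptions, not measured values.

```python
# Approximate on-disk sizes quoted in this section, in GB.
FILE_SIZE_GB = {"FP16": 2.0, "Q8_0": 1.0, "Q4_K_M": 0.6, "Q2_K": 0.3}

# Assumed runtime overhead range for KV cache and framework (30-50%).
OVERHEAD_LOW, OVERHEAD_HIGH = 0.30, 0.50


def runtime_memory_gb(file_size_gb: float) -> tuple[float, float]:
    """Return a (low, high) estimate of runtime memory in GB."""
    low = file_size_gb * (1 + OVERHEAD_LOW)
    high = file_size_gb * (1 + OVERHEAD_HIGH)
    return round(low, 2), round(high, 2)


def fits(quant: str, ram_budget_gb: float) -> bool:
    """True if the pessimistic (high) estimate fits in the budget."""
    _, high = runtime_memory_gb(FILE_SIZE_GB[quant])
    return high <= ram_budget_gb


for quant, size in FILE_SIZE_GB.items():
    low, high = runtime_memory_gb(size)
    print(f"{quant}: {size} GB on disk, ~{low}-{high} GB at runtime, "
          f"fits in 4 GB: {fits(quant, 4.0)}")
```

The budget passed to `fits` should be the memory actually free on the device, not its total RAM, since the operating system and other apps take their share first.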

Should you run this locally?

Yes if you need a very small, permissively licensed model for on-device text tasks — chatbots, summarization, classification — where latency and privacy matter more than peak accuracy. No if your use case demands strong reasoning, factual recall, or any multimodal input; you should look at larger Gemma 3 variants or other families.

Catalog cross-links

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
Gemma 3 4B (4B)
Edge

Strengths

  • Phone-class
  • Text-only fast inference

Weaknesses

  • No vision
  • Limited reasoning

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         0.7 GB      2 GB

Get the model

Ollama

One-line install

ollama run gemma3:1b

HuggingFace

Original weights

huggingface.co/google/gemma-3-1b-it

Source repository: original weights only, so you must quantize them yourself.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record
Hardware                               Provenance     Quant    Ctx   Tokens / sec   TTFT     Date
NVIDIA GeForce RTX 3080 16GB (Mobile)  Editorial (M)  Q4_K_M   4K    160.4 tok/s    790 ms   Jun 2, 2026
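
To make the run on record easier to interpret, the sketch below turns its two figures (160.4 tok/s and 790 ms TTFT) into an estimated wall-clock time for a reply of a given length. It assumes throughput stays constant for the whole generation, which is a simplification; the function name is illustrative.

```python
# Figures from the single run on record (Q4_K_M, 4K context).
TOKENS_PER_SEC = 160.4
TTFT_SECONDS = 0.790  # time to first token, 790 ms


def response_time_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time: TTFT plus steady-state generation."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SEC


for n in (64, 256, 1024):
    print(f"{n} tokens: ~{response_time_seconds(n):.2f} s")
```

On this hardware a 256-token reply would take roughly 2.4 seconds under these assumptions. Phone or CPU-only figures will differ, and this page has no measurements for them.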

Frequently asked

What's the minimum VRAM to run Gemma 3 1B?

2 GB of VRAM is enough to run Gemma 3 1B at the Q4_K_M quantization (file size 0.7 GB). Higher-quality quantizations need more.

Can I use Gemma 3 1B commercially?

Yes — Gemma 3 1B ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of Gemma 3 1B?

Gemma 3 1B supports a context window of 32,768 tokens (32K).

How do I install Gemma 3 1B with Ollama?

Run `ollama pull gemma3:1b` to download, then `ollama run gemma3:1b` to start a chat session. The default quantization is Q4_K_M.
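
Once the model is pulled, it can also be called from code through Ollama's local HTTP API. The sketch below is a minimal example that assumes the Ollama server is running on its default address (`http://localhost:11434`) and uses the non-streaming `/api/generate` endpoint; the helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma3:1b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this note in one sentence: ...")` returns the model's reply as a string, provided the server is up and `gemma3:1b` has already been downloaded.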

Source: huggingface.co/google/gemma-3-1b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
