Gemma · 2B parameters · Commercial OK · Multimodal · Reviewed June 2026

Gemma 4 E2B (Effective 2B)

Smallest Gemma 4. Designed for phones and Raspberry-Pi-class hardware.

License: Gemma Terms of Use · Released Apr 2, 2026 · Context: 131,072 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

Gemma 4 E2B (Effective 2B) is the smallest entry in Google's Gemma 4 family. It has an effective size of about 2 billion parameters and is released under the Gemma Terms of Use. It offers a 131,072-token context window and is explicitly designed for edge deployment: phones, Raspberry Pi, and similar low-power hardware. Its compact size and commercially usable license make it a candidate for on-device applications where privacy and offline capability are priorities.

Strengths

  • Extremely compact footprint: At an effective 2B parameters, the model fits comfortably on consumer hardware. Weight files range from ~4 GB at FP16 (unquantized) down to ~0.7 GB at Q2_K, enabling deployment on devices with limited RAM.
  • Long context for an edge model: A 131K-token context window is unusually large for a model this size, allowing it to process substantial documents or conversation histories on-device, provided the device has memory to spare for the KV cache.
  • Commercial use permitted: The Gemma Terms of Use allow commercial deployment, subject to the use restrictions in the terms, so the model can be integrated into products.
  • Designed for low-power hardware: Google explicitly targets phones and Raspberry-Pi-class devices, so the model is intended to run on ARM CPUs, mobile GPUs, and other constrained environments.

Limitations

  • Small parameter count limits capability: As a small model, it will not match the reasoning depth or knowledge breadth of larger models. Operators should expect higher perplexity and narrower competence on complex tasks.
  • Few independent measurements: At review time we have only a single editorial throughput run (see Benchmarks) and no independent quality evaluations. Treat vendor-published metrics as best-case; real-world performance may vary significantly.
  • KV cache overhead at full context: At 131K tokens, the KV cache can dominate memory. At FP16, the cache alone may exceed 2 GB, pushing total memory requirements well beyond the size of the weights. Quantizing the weights does not shrink the cache; a shorter context or a quantized KV cache (where the runner supports it) does, so budget memory carefully.
  • Limited ecosystem maturity: As a new model, tooling such as full llama.cpp support (including image input), quantization scripts, and community fine-tunes may lag behind more established edge models like Gemma 2 or Phi-3.
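To see why the cache can dominate, here is a back-of-the-envelope estimate using the standard KV-cache formula: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. The layer count, KV-head count, head dimension, and local/global split below are hypothetical placeholders, not Gemma 4 E2B's published configuration; substitute the values from the model's config file. The mixed case assumes, purely for illustration, that most layers use sliding-window (local) attention, as Gemma 2 and Gemma 3 do, which caps their cache at the window size.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for ctx_len tokens.

    The leading factor of 2 covers storing both K and V.
    bytes_per_elem=2 corresponds to an FP16/BF16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


def kv_cache_bytes_mixed(n_global: int, n_local: int, window: int,
                         n_kv_heads: int, head_dim: int, ctx_len: int,
                         bytes_per_elem: int = 2) -> int:
    """Cache size when some layers only attend within a sliding window.

    Global layers cache the full context; local layers cache at most
    `window` tokens.
    """
    global_part = kv_cache_bytes(n_global, n_kv_heads, head_dim,
                                 ctx_len, bytes_per_elem)
    local_part = kv_cache_bytes(n_local, n_kv_heads, head_dim,
                                min(ctx_len, window), bytes_per_elem)
    return global_part + local_part


GIB = 1024 ** 3

# Hypothetical architecture values, for illustration only.
full = kv_cache_bytes(n_layers=24, n_kv_heads=2, head_dim=128, ctx_len=131_072)
short = kv_cache_bytes(n_layers=24, n_kv_heads=2, head_dim=128, ctx_len=4_096)
mixed = kv_cache_bytes_mixed(n_global=4, n_local=20, window=512,
                             n_kv_heads=2, head_dim=128, ctx_len=131_072)

print(f"all-global, 131K ctx: {full / GIB:.2f} GiB")
print(f"all-global, 4K ctx:   {short / GIB:.3f} GiB")
print(f"4 global + 20 local:  {mixed / GIB:.2f} GiB")
```

Under these placeholder values the all-global cache at full context comes to 3 GiB, more than the FP16 weights of many small models, while capping the context or using sliding-window layers shrinks it substantially.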

What it takes to run this locally

Approximate model file sizes by quantization (actual downloads can differ slightly, as the Quantization variants table shows):

  • FP16: ~4 GB
  • Q8_0: ~2 GB
  • Q6_K: ~1.6 GB
  • Q5_K_M: ~1.4 GB
  • Q4_K_M: ~1.1 GB
  • Q3_K_M: ~1.0 GB
  • Q2_K: ~0.7 GB
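These figures follow from parameter count × bits per weight ÷ 8. The sketch below reproduces them approximately, assuming about 2 billion quantized weights and approximate bits-per-weight values for llama.cpp's base formats (for example, Q8_0 stores 32 int8 values plus one FP16 scale per block, or 8.5 bits per weight). Mixed "_M" variants keep some tensors at higher precision, so their real files run somewhat larger; the parameter count of an "effective 2B" model is also an assumption here.

```python
# Approximate bits per weight for each format (assumed nominal values;
# "_M" mixes such as Q4_K_M keep some tensors at higher precision).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one FP16 scale per block
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K": 4.5,
    "Q3_K": 3.4375,
    "Q2_K": 2.625,
}


def file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimated weight-file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 2e9  # assumption: ~2B quantized weights

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:5s} ~{file_size_gb(N_PARAMS, bpw):.1f} GB")
```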

Add ~30-50% for KV cache and framework overhead at typical context lengths. At the full 131K context the KV cache alone can be significant, so plan for additional memory.

Deployment class: edge. A single 4-8 GB GPU or a modern phone SoC (e.g., Apple A-series, Snapdragon 8 Gen series) can run quantized versions. A Raspberry Pi 4 or 5 with 4-8 GB of RAM can run Q4_K_M or smaller quantizations.
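The rule of thumb above can be turned into a quick fit check: multiply the weight-file size by 1 plus an overhead fraction and compare the result with the memory actually free for the model. The 1.5 GB reserved for the operating system and other processes in the example is a hypothetical figure; measure it on your own device.

```python
def estimated_footprint_gb(file_size_gb: float, overhead_fraction: float) -> float:
    """Weights plus KV cache and runtime overhead.

    overhead_fraction of 0.3-0.5 matches the rule of thumb for typical
    context lengths; it does not cover a full 131K-token context.
    """
    return file_size_gb * (1 + overhead_fraction)


def fits(file_size_gb: float, device_ram_gb: float, reserved_gb: float,
         overhead_fraction: float = 0.5) -> bool:
    """True if the estimate (pessimistic 50% overhead by default) fits in free memory."""
    available = device_ram_gb - reserved_gb
    return estimated_footprint_gb(file_size_gb, overhead_fraction) <= available


# Example: Q4_K_M (~1.1 GB) on a 4 GB board, assuming ~1.5 GB is
# already taken by the OS and other processes (hypothetical figure).
low = estimated_footprint_gb(1.1, 0.3)
high = estimated_footprint_gb(1.1, 0.5)
print(f"Q4_K_M estimate: {low:.2f}-{high:.2f} GB")
print("Q4_K_M fits:", fits(1.1, device_ram_gb=4.0, reserved_gb=1.5))
print("FP16 fits:  ", fits(4.0, device_ram_gb=4.0, reserved_gb=1.5))
```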

Should you run this locally?

Yes if you need a small, commercially usable model for on-device inference where privacy, offline capability, and low power consumption are critical. It suits mobile apps, IoT, and embedded systems that need long-context understanding without cloud connectivity.

No if your task demands strong reasoning, factual accuracy, or broad knowledge—larger models (e.g., Gemma 4 27B or other 7B+ models) will likely serve better. Also avoid if you need mature community tooling or verified benchmarks; this model is early in its lifecycle.

Catalog cross-links

  • Gemma 4 27B
  • Gemma 2 2B
  • Raspberry Pi 5


Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent/child links record direct distillation or fine-tuning relationships.

Family siblings (gemma-4)

Strengths

  • Phone-class footprint
  • Multimodal

Weaknesses

  • Limited reasoning

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most common starting point.

Quantization   File size   VRAM required
Q4_K_M         1.3 GB      3 GB
Q8_0           2.2 GB      4 GB

Get the model

Ollama

One-line install:

ollama run gemma4:e2b

Read our Ollama review →
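Beyond the interactive CLI, a running Ollama server also exposes a local HTTP API, by default on port 11434. The sketch below builds and sends a single non-streaming request to `/api/generate` using only the Python standard library; the `num_ctx` option caps the context window, which keeps the KV cache small on memory-constrained devices. The 4096-token default is an illustrative choice, not a recommendation from the source.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "gemma4:e2b", num_ctx: int = 4096) -> dict:
    """Request body for a single, non-streaming completion."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # cap context length to bound KV-cache memory
    }


def generate(prompt: str, **kwargs) -> str:
    """POST the request to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this paragraph: ...")` requires the Ollama server to be running with the model already pulled; building the payload separately keeps the request easy to inspect and test.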

HuggingFace

Original weights

huggingface.co/google/gemma-4-e2b-it

Source repository for the unquantized weights; you must quantize them yourself for local runners.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record

Hardware                                Provenance   Quant    Ctx   Tokens/sec   TTFT     Date
NVIDIA GeForce RTX 3080 16GB (Mobile)   Editorial    Q4_K_M   4K    99.1 tok/s   792 ms   Jun 2, 2026
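The two metrics combine into a rough end-to-end estimate: total time ≈ TTFT + output tokens ÷ decode speed. The sketch below applies that to the run above. It assumes a constant decode rate, which is a simplification: real throughput varies with context length, prompt size, and hardware state.

```python
def response_time_s(ttft_ms: float, tokens_per_s: float, n_output_tokens: int) -> float:
    """Approximate wall-clock time: time to first token plus steady decoding.

    Assumes a constant decode rate for the whole response.
    """
    return ttft_ms / 1000 + n_output_tokens / tokens_per_s


# Figures from the editorial RTX 3080 Mobile run (Q4_K_M, 4K context).
for n in (100, 500, 2000):
    print(f"{n:5d} tokens: ~{response_time_s(792, 99.1, n):.1f} s")
```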




Frequently asked

What's the minimum VRAM to run Gemma 4 E2B (Effective 2B)?

3 GB of VRAM is enough to run Gemma 4 E2B (Effective 2B) at the Q4_K_M quantization (file size 1.3 GB). Higher-quality quantizations need more.

Can I use Gemma 4 E2B (Effective 2B) commercially?

Yes — Gemma 4 E2B (Effective 2B) ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of Gemma 4 E2B (Effective 2B)?

Gemma 4 E2B (Effective 2B) supports a context window of 131,072 tokens (about 131K).

How do I install Gemma 4 E2B (Effective 2B) with Ollama?

Run `ollama pull gemma4:e2b` to download, then `ollama run gemma4:e2b` to start a chat session. The default quantization is Q4_K_M.

Does Gemma 4 E2B (Effective 2B) support images?

Yes — Gemma 4 E2B (Effective 2B) is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.
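With Ollama, images are passed to multimodal models through the `images` field of an `/api/generate` request, as base64-encoded strings. The sketch below only builds the request body; whether the `gemma4:e2b` tag accepts image input depends on your Ollama version, and the file name in the usage note is hypothetical.

```python
import base64
from pathlib import Path


def image_payload(prompt: str, image_bytes: bytes, model: str = "gemma4:e2b") -> dict:
    """Request body for /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects base64 text here, not raw bytes.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def image_payload_from_file(prompt: str, path: str, model: str = "gemma4:e2b") -> dict:
    """Convenience wrapper that reads the image from disk."""
    return image_payload(prompt, Path(path).read_bytes(), model)
```

For example, `image_payload_from_file("Describe this image.", "photo.jpg")` produces a body that can be sent as JSON to `http://localhost:11434/api/generate`.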

Source: huggingface.co/google/gemma-4-e2b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
