gemma
3B parameters
Commercial OK
Multimodal
Reviewed June 2026

PaliGemma 2 3B

PaliGemma 2 — Gemma 2 base + SigLIP vision encoder. Designed for fine-tuning on specific vision tasks.

License: Gemma License · Released Dec 5, 2024 · Context: 8,192 tokens

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

PaliGemma 2 3B is a dense vision-language model (VLM) from Google that pairs the Gemma 2 language backbone with a SigLIP vision encoder. Released under the Gemma License, it is designed as a base for fine-tuning on specific vision tasks. With 3 billion parameters and an 8,192-token context window, it has a compact footprint for a VLM, which makes it accessible for experimentation and specialized deployment.

Strengths

  • Compact and efficient architecture: As a dense 3B-parameter model, PaliGemma 2 3B requires modest compute and memory, enabling fine-tuning and inference on consumer-grade hardware.
  • Designed for fine-tuning: The combination of Gemma 2 and SigLIP is purpose-built for transfer learning on specific vision tasks, offering a strong starting point for custom applications.
  • Permissive licensing for many use cases: The Gemma License allows for broad commercial and research use, though users should verify terms for their specific deployment.
  • Small quantized sizes enable local deployment: At Q4_K_M, the model is only ~1.7 GB on disk, and even the full FP16 version is ~6 GB, making it feasible to run on a single consumer GPU with adequate VRAM.

Limitations

  • Limited context window: With only 8,192 tokens, the model is not suited for tasks requiring long-form reasoning or processing large documents or images.
  • Small parameter count: At 3B parameters, the model may lack the capacity for complex multi-step reasoning or high-accuracy performance on challenging benchmarks compared to larger VLMs.
  • Task-specific design: PaliGemma 2 is intended as a fine-tuning base, not a general-purpose out-of-the-box VLM. Operators expecting strong zero-shot performance may be disappointed.
  • No community benchmarks available: We do not have verified independent benchmark results for this model. Published vendor metrics should be treated as best-case, and operators should evaluate on their own data.

What it takes to run this locally

PaliGemma 2 3B fits comfortably in the consumer deployment class. File sizes range from ~6 GB at FP16 (unquantized) down to ~1.0 GB at Q2_K. For typical use with a moderate context, add ~30-50% for KV cache and framework overhead. At Q4_K_M (~1.7 GB) that comes to roughly 2.2-2.6 GB, so a single GPU with 8-12 GB VRAM (e.g., RTX 3060 or higher) runs Q4_K_M or Q5_K_M with room to spare. At FP16 the same rule gives roughly 7.8-9 GB, so an 8 GB card is the bare minimum and a 12 GB GPU is recommended. No specific tokens-per-second measurements are available.
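
The sizing rule above can be sketched in a few lines of Python. This is a rough estimate only: it takes the file sizes quoted in this review and applies the 30-50% overhead range as a flat multiplier. The helper names `estimate_vram_gb` and `fits` are hypothetical, and real usage will vary with context length, image resolution, and the runner.

```python
# File sizes quoted in this review, in GB.
FILE_SIZES_GB = {"FP16": 6.0, "Q4_K_M": 1.7, "Q2_K": 1.0}


def estimate_vram_gb(file_size_gb, overhead=(0.30, 0.50)):
    """Return a (low, high) VRAM estimate: file size plus 30-50% overhead.

    The overhead range is this review's rule of thumb, not a measurement.
    """
    low = file_size_gb * (1 + overhead[0])
    high = file_size_gb * (1 + overhead[1])
    return round(low, 2), round(high, 2)


def fits(file_size_gb, vram_gb):
    """True if even the high end of the estimate fits in the given VRAM."""
    _, high = estimate_vram_gb(file_size_gb)
    return high <= vram_gb


for name, size in FILE_SIZES_GB.items():
    low, high = estimate_vram_gb(size)
    print(f"{name}: {low}-{high} GB estimated, fits in 8 GB: {fits(size, 8)}")
```

Using the high end of the range for the fit check is a deliberately conservative choice; a card that only clears the low end may still work with short prompts.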

Should you run this locally?

Yes if you need a compact, fine-tunable VLM for a specific vision task and have access to a consumer GPU with at least 8 GB VRAM. The small size and permissive license make it ideal for prototyping and custom deployment.

No if you require a large context window, out-of-the-box general vision-language capabilities, or state-of-the-art performance without fine-tuning. Consider larger models or those with broader zero-shot abilities.

Catalog cross-links

  • Gemma 2 9B
  • SigLIP
  • Consumer GPU Guide

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (paligemma-2)
PaliGemma 2 3B · 3B · You are here
PaliGemma 2 10B · 10B · Consumer
Distilled / fine-tuned from this

Strengths

  • Designed for fine-tuning
  • Multiple resolutions

Weaknesses

  • Base — needs task-specific fine-tune to be useful

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
BF16            6.0 GB       8 GB

Get the model

HuggingFace

Original weights

huggingface.co/google/paligemma2-3b-pt-224

Source repository — direct quantization required.
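
For readers who want to try the original weights before quantizing, here is a minimal sketch using the Hugging Face `transformers` library. It assumes `torch`, `transformers`, and `Pillow` are installed and that you have accepted the model's license on Hugging Face and are logged in. The helper names `build_prompt` and `run_prompt`, the `caption en` task prefix, and the file `example.jpg` are illustrative assumptions, not part of this review. Because this is a pretrained base checkpoint, output quality without fine-tuning may be poor.

```python
MODEL_ID = "google/paligemma2-3b-pt-224"  # repository named in this review


def build_prompt(task: str, arg: str) -> str:
    """Prefix the text with the <image> placeholder the processor expects."""
    return f"<image>{task} {arg}"


def run_prompt(image_path: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load the model in BF16 and generate text for one image (sketch only)."""
    # Heavy imports are kept local so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    ).to(device).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    inputs = inputs.to(torch.bfloat16).to(device)  # casts float tensors only
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens so only the newly generated text is returned.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)


# Hypothetical usage (downloads ~6 GB of weights on first run):
# print(run_prompt("example.jpg", build_prompt("caption", "en")))
```

The model is reloaded on every call to keep the sketch short; a real application would load it once and reuse it.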

Frequently asked

What's the minimum VRAM to run PaliGemma 2 3B?

8 GB of VRAM is enough to run PaliGemma 2 3B at BF16, the unquantized 16-bit format (file size 6.0 GB), though with little headroom. Lower-bit quantizations such as Q4_K_M (~1.7 GB) need less.

Can I use PaliGemma 2 3B commercially?

Yes — PaliGemma 2 3B ships under the Gemma License, which permits commercial use. Always read the license text before deployment.

What's the context length of PaliGemma 2 3B?

PaliGemma 2 3B supports a context window of 8,192 tokens (about 8K).
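
To see how image input might eat into that window, here is a back-of-the-envelope sketch. It rests on assumptions that this review does not state: that the SigLIP encoder uses 14-pixel patches, that each patch becomes one token, that image tokens count against the same 8,192-token window, and that the "multiple resolutions" are 224, 448, and 896 pixels (only 224 appears in the repository name above). The function names are hypothetical.

```python
CONTEXT_TOKENS = 8192  # context window stated in this review
PATCH_SIZE = 14        # assumption: SigLIP patch size in pixels


def image_tokens(resolution_px: int, patch_size: int = PATCH_SIZE) -> int:
    """Tokens for one square image, assuming one token per patch."""
    side = resolution_px // patch_size
    return side * side


def text_budget(resolution_px: int, context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for text after one image, under the assumptions above."""
    return context - image_tokens(resolution_px)


for res in (224, 448, 896):  # 448 and 896 are assumed resolutions
    print(f"{res}px: {image_tokens(res)} image tokens, "
          f"{text_budget(res)} left for text")
```

Under these assumptions a single 224-pixel image leaves almost the whole window for text, while a single 896-pixel image would use half of it.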

Does PaliGemma 2 3B support images?

Yes — PaliGemma 2 3B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/google/paligemma2-3b-pt-224

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify PaliGemma 2 3B runs on your specific hardware before committing money.