PaliGemma 2 3B
PaliGemma 2 — Gemma 2 base + SigLIP vision encoder. Designed for fine-tuning on specific vision tasks.
Positioning
PaliGemma 2 3B is a dense vision-language model (VLM) from Google that pairs the Gemma 2 language backbone with a SigLIP vision encoder. Released under the Gemma License, it is designed as a base for fine-tuning on specific vision tasks. With 3 billion parameters and an 8,192-token context window, it is small for a VLM, which makes it accessible for experimentation and specialized deployment.
Strengths
- Compact and efficient architecture: As a dense 3B-parameter model, PaliGemma 2 3B requires modest compute and memory, enabling fine-tuning and inference on consumer-grade hardware.
- Designed for fine-tuning: The combination of Gemma 2 and SigLIP is purpose-built for transfer learning on specific vision tasks, offering a strong starting point for custom applications.
- Permissive licensing for many use cases: The Gemma License allows for broad commercial and research use, though users should verify terms for their specific deployment.
- Small quantized sizes enable local deployment: At Q4_K_M, the model is only ~1.7 GB on disk, and even the full FP16 version is ~6 GB, making it feasible to run on a single consumer GPU with adequate VRAM.
Limitations
- Limited context window: With only 8,192 tokens, the model is not suited for tasks requiring long-form reasoning or processing large documents or images.
- Small parameter count: At 3B parameters, the model may lack the capacity for complex multi-step reasoning or high-accuracy performance on challenging benchmarks compared to larger VLMs.
- Task-specific design: PaliGemma 2 is intended as a fine-tuning base, not a general-purpose out-of-the-box VLM. Operators expecting strong zero-shot performance may be disappointed.
- No community benchmarks available: We do not have verified independent benchmark results for this model. Published vendor metrics should be treated as best-case, and operators should evaluate on their own data.
What it takes to run this locally
PaliGemma 2 3B fits comfortably in the consumer deployment class. File sizes range from ~6 GB at FP16 (unquantized) down to ~1.0 GB at Q2_K. For typical use with a moderate context, add ~30-50% on top of the file size for KV cache and framework overhead. A single GPU with 8-12 GB VRAM (e.g., RTX 3060 or higher) can run the model at Q4_K_M or Q5_K_M. For FP16, a 12 GB GPU is recommended, since ~6 GB of weights plus that overhead comes to roughly 8-9 GB. No tokens-per-second measurements are available.
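The sizing rule above can be sketched as a small calculation. This is a rough illustration under stated assumptions, not a measurement: the function names are hypothetical, the file sizes are the approximate figures quoted on this page, and the 30-50% allowance is the page's own rule of thumb.

```python
def estimate_vram_gb(file_size_gb: float, overhead: float) -> float:
    """Weights plus a fractional allowance for KV cache and framework overhead."""
    if file_size_gb <= 0 or overhead < 0:
        raise ValueError("file_size_gb must be positive and overhead non-negative")
    return file_size_gb * (1.0 + overhead)


def vram_range_gb(file_size_gb: float, low: float = 0.30, high: float = 0.50):
    """Return the (low, high) VRAM estimate for the 30-50% overhead band."""
    return (estimate_vram_gb(file_size_gb, low), estimate_vram_gb(file_size_gb, high))


# Approximate file sizes quoted on this page, in GB.
SIZES_GB = {"FP16": 6.0, "Q4_K_M": 1.7, "Q2_K": 1.0}

for name, size in SIZES_GB.items():
    lo, hi = vram_range_gb(size)
    print(f"{name}: roughly {lo:.1f} to {hi:.1f} GB of VRAM")
```

For FP16 the band works out to about 7.8 to 9.0 GB, which is why an 8 GB card is marginal and a 12 GB card is the safer choice. Actual usage depends on context length, image resolution, and the runtime, so treat these numbers as a starting point.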
Should you run this locally?
Yes if you need a compact, fine-tunable VLM for a specific vision task and have access to a consumer GPU with at least 8 GB VRAM. The small size and permissive license make it ideal for prototyping and custom deployment.
No if you require a large context window, out-of-the-box general vision-language capabilities, or state-of-the-art performance without fine-tuning. Consider larger models or those with broader zero-shot abilities.
Catalog cross-links
- Gemma 2 9B
- SigLIP
- Consumer GPU Guide
Strengths
- Designed for fine-tuning
- Multiple resolutions
Weaknesses
- Base model: needs a task-specific fine-tune to be useful
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| BF16 | 6.0 GB | 8 GB |
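The 6.0 GB figure for 16-bit weights follows from parameter count times bytes per parameter. The sketch below shows that arithmetic and also back-computes the effective bits per weight implied by the ~1.7 GB (Q4_K_M) and ~1.0 GB (Q2_K) sizes quoted on this page. It assumes a nominal 3 billion parameters and 1 GB = 10^9 bytes, and it ignores file metadata; the function names are hypothetical.

```python
N_PARAMS = 3e9  # nominal parameter count (assumption; the exact count may differ)


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GB (10^9 bytes), at a given precision."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter implied by a file size."""
    return file_size_gb * 1e9 * 8 / n_params


# 16-bit formats (BF16 or FP16) store 2 bytes per parameter.
print(f"16-bit weights: {weight_size_gb(N_PARAMS, 16):.1f} GB")

# Sizes quoted on this page for quantized files.
for name, size_gb in {"Q4_K_M": 1.7, "Q2_K": 1.0}.items():
    bpw = effective_bits_per_weight(size_gb, N_PARAMS)
    print(f"{name}: about {bpw:.2f} bits per weight")
```

BF16 and FP16 both use 16 bits per weight, so the file size is the same for either format.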
Get the model
HuggingFace
Original weights
Source repository; you must quantize the weights yourself.
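A minimal sketch of loading the original weights with the Hugging Face `transformers` library, assuming a version recent enough to include PaliGemma 2 support and that `torch` is installed. The model ID is the one listed as this page's source; the helper name `load_paligemma` is hypothetical, and the repository may require accepting the Gemma License terms on Hugging Face before the download succeeds.

```python
MODEL_ID = "google/paligemma2-3b-pt-224"  # repository listed as this page's source


def load_paligemma(model_id: str = MODEL_ID):
    """Download and load the processor and the 16-bit model weights.

    Imports are deferred so this module can be imported without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # keep weights at roughly 6 GB
    )
    return processor, model


if __name__ == "__main__":
    processor, model = load_paligemma()
    print(type(model).__name__)
```

This loads the pretrained base checkpoint only. As noted above, it is intended as a starting point for fine-tuning rather than for direct use.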
Frequently asked
What's the minimum VRAM to run PaliGemma 2 3B?
Can I use PaliGemma 2 3B commercially?
What's the context length of PaliGemma 2 3B?
Does PaliGemma 2 3B support images?
Source: huggingface.co/google/paligemma2-3b-pt-224
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.