gemma
10B parameters
Commercial OK
Multimodal
Reviewed June 2026

PaliGemma 2 10B

Mid-tier PaliGemma 2 fine-tuning base. Better baseline for complex vision tasks.

License: Gemma License · Released Dec 5, 2024 · Context: 8,192 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026 · Unrated

Positioning

PaliGemma 2 10B is a dense 10-billion-parameter vision-language model (VLM) released by Google under the Gemma License. With an 8,192-token context window, it is designed as a mid-tier fine-tuning base for complex vision tasks. Its dense architecture means that all 10B parameters are active during inference, making it straightforward to deploy but requiring commensurate compute. The model is positioned as a step up from smaller VLMs for tasks that demand higher visual understanding, while still being accessible on consumer-grade hardware.

Strengths

  • Dense architecture for predictable performance: Unlike mixture-of-experts models, PaliGemma 2 10B uses all parameters for every forward pass, which can simplify deployment and fine-tuning workflows.
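  • Consumer-grade VRAM compatibility: Weight files range from ~3.3 GB (Q2_K) up to ~20 GB at FP16. Quantized variants run comfortably on consumer cards, and even the FP16 weights fit on a single 24 GB GPU, though with only a few GB left for cache and overhead. That makes local inference viable; local fine-tuning is possible too, but at higher precision it may need a larger GPU or gradient checkpointing.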
  • Commercial-friendly Gemma License: The license allows commercial and research use, including fine-tuned derivatives, subject to the use restrictions in its terms.
  • Google-backed ecosystem: As part of the Gemma family, the model benefits from Google's tooling and community support, including integration with popular frameworks like Hugging Face Transformers.

Limitations

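  • Limited context window: At 8,192 tokens, the context is shorter than that of many modern LLMs and VLMs. Because each input image is encoded as a block of tokens that counts against this window, the limit can constrain tasks requiring long-form reasoning or high-resolution image analysis.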
  • No community benchmarks available: We do not yet have independent, community-reported benchmark results for this model. Operators should treat published vendor metrics as best-case and verify performance on their own tasks.
  • Dense architecture is compute-hungry: All 10B parameters are active, meaning inference and fine-tuning require more FLOPs per token compared to an MoE model with similar total parameters but lower active count.
  • Mid-tier positioning: While capable, this model is not designed to compete with frontier VLMs; it is a practical baseline for fine-tuning rather than a state-of-the-art out-of-the-box solution.
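To make the context limitation concrete, here is a rough token budget. It is a sketch under stated assumptions: the vision encoder cuts a square image into 14×14-pixel patches and emits one token per patch, and the `224` suffix in the repository name denotes 224-pixel inputs. The 448 and 896 rows are included only for comparison, and the helper names are illustrative.

```python
# Sketch: how much of the 8,192-token context one image consumes.
# Assumption: the vision encoder cuts a square image into 14x14-pixel
# patches and emits one token per patch (illustrative helper names).
CONTEXT_TOKENS = 8_192
PATCH_SIZE = 14


def image_tokens(resolution: int, patch: int = PATCH_SIZE) -> int:
    """Number of image tokens for a square image `resolution` pixels wide."""
    if resolution % patch != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    per_side = resolution // patch
    return per_side * per_side


def text_budget(resolution: int, context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for the prompt and the generated answer after one image."""
    return context - image_tokens(resolution)


for res in (224, 448, 896):
    print(f"{res}px: {image_tokens(res)} image tokens, "
          f"{text_budget(res)} tokens left for text")
```

Under these assumptions, a 224-pixel image costs 256 tokens and leaves most of the window for text, while an 896-pixel image would take half of the window on its own.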

What it takes to run this locally

At FP16, the weights alone take 20 GB, and runtime memory adds KV cache and framework overhead on top of that (typically 30–50% more). Q4_K_M (~5.6 GB) and Q5_K_M (~7.1 GB) leave ample room for caching on a single 24 GB GPU (e.g., RTX 4090), and by the same rule of thumb they also fit on 12–16 GB cards. Lower quantizations (Q3_K_M at ~4.9 GB, Q2_K at ~3.3 GB) shrink the footprint further, though quality trade-offs apply. Fine-tuning at higher precision may require a workstation-class GPU (e.g., 48 GB) or gradient checkpointing.
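The arithmetic in the paragraph above can be sketched as a quick check. It simply applies the 30–50% overhead rule of thumb to the weight sizes quoted on this page. The assumption that overhead scales with weight size is a simplification, because real usage depends on prompt length, batch size, and the runner, so treat the output as a rough estimate.

```python
# Rule-of-thumb VRAM check: weights plus 30-50% for KV cache and framework
# overhead, as quoted above. Assumption: overhead scales with weight size;
# real usage depends on prompt length, batch size, and the runner.
WEIGHTS_GB = {
    "FP16": 20.0,
    "Q5_K_M": 7.1,
    "Q4_K_M": 5.6,
    "Q3_K_M": 4.9,
    "Q2_K": 3.3,
}


def vram_range_gb(weights_gb: float, low: float = 0.30,
                  high: float = 0.50) -> tuple[float, float]:
    """Estimated (low, high) runtime VRAM in GB for a given weight size."""
    return weights_gb * (1 + low), weights_gb * (1 + high)


def fits(weights_gb: float, card_gb: float, overhead: float = 0.50) -> bool:
    """True if weights plus the chosen overhead fit on a card of `card_gb` GB."""
    return weights_gb * (1 + overhead) <= card_gb


for name, size in WEIGHTS_GB.items():
    low_gb, high_gb = vram_range_gb(size)
    print(f"{name:<7} {size:>5.1f} GB weights -> ~{low_gb:.1f}-{high_gb:.1f} GB, "
          f"fits 12 GB: {fits(size, 12)}, fits 24 GB: {fits(size, 24)}")
```

With the upper 50% estimate, every quantized variant listed here fits a 12 GB card, while FP16 comes out at roughly 26–30 GB. Running unquantized weights on a 24 GB card therefore requires overhead to stay under 4 GB (20% of the weights), well below this rule of thumb.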

Should you run this locally?

Yes if you need a capable VLM baseline for fine-tuning on complex vision tasks, and you have access to a consumer GPU with at least 12 GB VRAM. The Gemma License makes it suitable for commercial projects, and the dense architecture simplifies deployment.

No if you require a long-context model (beyond 8K tokens) or need state-of-the-art zero-shot performance without fine-tuning. For very low-resource hardware (e.g., 8 GB GPUs), consider smaller VLMs or heavier quantization.

Catalog cross-links

  • Gemma 2 9B
  • PaliGemma 3B
  • Gemma License overview

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
Gemma 2 9B (language backbone, paired with a SigLIP vision encoder)
Family siblings (paligemma-2)
PaliGemma 2 3B (3B) · Consumer
PaliGemma 2 10B (10B) · You are here

Strengths

  • Strong fine-tuning base

Weaknesses

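  • Base checkpoint only, as with the 3B: this is a pretrained (pt) release, so expect to fine-tune it for your task rather than use it out of the box.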

Quantization variants

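Each quantization trades model quality for file size and VRAM. Only the BF16 source weights are listed here; Q4_K_M, the most popular starting point, and other lower-bit variants have to be produced from them yourself.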

Quantization | File size | VRAM required
BF16 | 20.0 GB | 24 GB
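The BF16 file size follows directly from the parameter count, since 16 bits is 2 bytes per weight. The sketch below treats the page's nominal 10B figure as exact (an assumption) and also inverts the formula to show the average bits per weight implied by a given file size. The function names are hypothetical.

```python
# Sketch: weight-file size from parameter count and bits per weight.
# Assumption: the nominal "10B parameters" figure is treated as exact.
NOMINAL_PARAMS = 10e9


def file_size_gb(bits_per_weight: float, params: float = NOMINAL_PARAMS) -> float:
    """Approximate weight-file size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(size_gb: float, params: float = NOMINAL_PARAMS) -> float:
    """Average bits per weight implied by a file of `size_gb` GB."""
    return size_gb * 1e9 * 8 / params


print(f"BF16: {file_size_gb(16):.1f} GB")  # BF16: 20.0 GB
print(f"Q4_K_M at 5.6 GB: {implied_bits_per_weight(5.6):.2f} bits/weight")
```

The ~5.6 GB Q4_K_M size quoted elsewhere on this page works out to roughly 4.5 bits per weight on average, a little above 4, which is plausible since K-quants also store per-block scale data.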

Get the model

Hugging Face

Original weights

huggingface.co/google/paligemma2-10b-pt-224

Source repository only: lower-bit variants require quantizing these weights yourself.

Frequently asked

What's the minimum VRAM to run PaliGemma 2 10B?

The only variant listed in this catalog is BF16, the unquantized release (file size 20.0 GB), which needs 24 GB of VRAM. Lower-bit quantizations you produce yourself, such as Q4_K_M, need considerably less.

Can I use PaliGemma 2 10B commercially?

Yes — PaliGemma 2 10B ships under the Gemma License, which permits commercial use. Always read the license text before deployment.

What's the context length of PaliGemma 2 10B?

PaliGemma 2 10B supports a context window of 8,192 tokens (about 8K).

Does PaliGemma 2 10B support images?

Yes — PaliGemma 2 10B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/google/paligemma2-10b-pt-224

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
