Gemma 3 12B

Positioning

The 12B sweet-spot member of the Gemma 3 family. Native multimodal in a footprint that fits 12 GB VRAM at Q4 — the only 12B-class model with serious multimodal in 2025.

Strengths

Native vision + text at 12B — uniquely small for a multimodal model.
128K context with decent recall.
Distilled writing quality from Gemini-class data.

Limitations

Gemma license restrictiveness applies.
Multimodal quality lags Pixtral 12B on dense visual reasoning.
No thinking mode.

Real-world performance on RTX 4090

Q4_K_M (7.6 GB): 80–95 tok/s decode, TTFT ~95 ms
Q5_K_M (8.9 GB): 70–84 tok/s
Q8_0 (13.4 GB): 50–62 tok/s

Should you run this locally?

Yes, for 12 GB VRAM owners who want multimodal in a single model. No, for text-only workloads where Qwen 2.5 14B is stronger, or vision-priority work where Pixtral 12B is the better pick.

How it compares

vs Pixtral 12B → Pixtral wins on dense visual reasoning; Gemma 3 12B has stronger general text quality.
vs Qwen 2.5 14B → Qwen wins on text capability; Gemma 3 12B has multimodal as a bonus.
vs Mistral Nemo 12B → close text-quality call. Gemma adds multimodal; Nemo has cleaner license.
vs Gemma 3 27B → 27B is meaningfully stronger; 12B is the constrained-VRAM pick.

Run this yourself

ollama pull gemma3:12b-it-q4_K_M
ollama run gemma3:12b-it-q4_K_M

Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 3060 12 GB / 4060 Ti / 4090

Quantization	File size	VRAM required
Q4_K_M	7.3 GB	10 GB
Q8_0	13.0 GB	16 GB

Quantization

File size

VRAM required

Q4_K_M

7.3 GB

10 GB

Q8_0

13.0 GB

16 GB

Frequently asked

What's the minimum VRAM to run Gemma 3 12B?

10GB of VRAM is enough to run Gemma 3 12B at the Q4_K_M quantization (file size 7.3 GB). Higher-quality quantizations need more.

Can I use Gemma 3 12B commercially?

Yes — Gemma 3 12B ships under the Gemma Terms of Use, which permits commercial use. Always read the license text before deployment.

What's the context length of Gemma 3 12B?

Gemma 3 12B supports a context window of 131,072 tokens (about 131K).

How do I install Gemma 3 12B with Ollama?

Run `ollama pull gemma3:12b` to download, then `ollama run gemma3:12b` to start a chat session. The default quantization is Q4_K_M.

Does Gemma 3 12B support images?

Yes — Gemma 3 12B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Overview

Strengths

Weaknesses

Quantization variants

Get the model

Ollama

HuggingFace

Hardware that runs this

Models worth comparing