Gemma 3 12B
12B Gemma 3. Fits on 12 GB consumer cards. Multimodal.
The 12B sweet-spot member of the Gemma 3 family. Native multimodal in a footprint that fits 12 GB of VRAM at Q4, which makes it one of the few 12B-class models with serious multimodal support in 2025.
Strengths
- Native vision + text at 12B, unusually small for a multimodal model.
- 128K context with decent recall.
- Distilled writing quality from Gemini-class data.
Weaknesses
- Gemma license restrictions apply.
- Multimodal quality lags Pixtral 12B on dense visual reasoning.
- No thinking mode.
- Q4_K_M (7.6 GB): 80–95 tok/s decode, TTFT ~95 ms
- Q5_K_M (8.9 GB): 70–84 tok/s
- Q8_0 (13.4 GB): 50–62 tok/s
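As a rough illustration of what the throughput figures above mean in practice, the sketch below converts them into wall-clock time for a response. The function name `generation_time_s` and the 500-token response length are assumptions made for the example. The TTFT of about 95 ms is only reported for Q4_K_M, so the other rows use 0 as a placeholder.

```python
# Decode-speed ranges in tokens/second, copied from the list above.
DECODE_TOK_PER_S = {
    "Q4_K_M": (80, 95),
    "Q5_K_M": (70, 84),
    "Q8_0": (50, 62),
}


def generation_time_s(quant, n_tokens, ttft_s=0.0):
    """Return (best_case, worst_case) seconds to produce n_tokens.

    Hypothetical helper: time = time-to-first-token + tokens / decode speed.
    """
    slow, fast = DECODE_TOK_PER_S[quant]
    return (ttft_s + n_tokens / fast, ttft_s + n_tokens / slow)


if __name__ == "__main__":
    for quant in DECODE_TOK_PER_S:
        # TTFT is only given for Q4_K_M (~95 ms); 0.0 is a placeholder elsewhere.
        ttft = 0.095 if quant == "Q4_K_M" else 0.0
        best, worst = generation_time_s(quant, 500, ttft)
        print(f"{quant}: {best:.1f}-{worst:.1f} s for 500 tokens")
```

The estimate ignores prompt-processing time for long inputs, so treat it as a lower bound for large contexts.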
Worth running for 12 GB VRAM owners who want multimodal in a single model. Not the best pick for text-only workloads, where Qwen 2.5 14B is stronger, or for vision-priority work, where Pixtral 12B is the better choice.
How it compares
- vs Pixtral 12B → Pixtral wins on dense visual reasoning; Gemma 3 12B has stronger general text quality.
- vs Qwen 2.5 14B → Qwen wins on text capability; Gemma 3 12B has multimodal as a bonus.
- vs Mistral Nemo 12B → a close call on text quality. Gemma adds multimodal; Nemo has a cleaner license.
- vs Gemma 3 27B → 27B is meaningfully stronger; 12B is the constrained-VRAM pick.
ollama pull gemma3:12b-it-q4_K_M
ollama run gemma3:12b-it-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 3060 12 GB / 4060 Ti / 4090
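Once the model is pulled, the same settings can be applied programmatically. The sketch below is a minimal example, assuming Ollama is serving on its default local port (11434) and using its `/api/generate` endpoint with the `num_ctx` option set to the 16384 context listed above. The helper names `build_payload` and `generate` are hypothetical. Whether all layers end up on the GPU is decided by Ollama at load time and is not controlled here.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_path=None, num_ctx=16384,
                  model="gemma3:12b-it-q4_K_M"):
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }
    if image_path is not None:
        # Images are sent as base64-encoded strings.
        with open(image_path, "rb") as f:
            payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
    return payload


def generate(payload, url=OLLAMA_URL):
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate(build_payload("Describe this image.", "photo.jpg"))` would send a prompt with an image, where `photo.jpg` stands in for any local file.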
Why this rating
7.9/10 — solid 12B-class entry with native multimodal. Sits between Qwen 2.5 14B and Mistral Nemo 12B in capability; the multimodal feature is the differentiator. Loses points on license restrictiveness.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 7.3 GB | 10 GB |
| Q8_0 | 13.0 GB | 16 GB |
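To show how the table might be used when choosing a download, here is a small sketch that picks the largest listed quantization whose VRAM requirement fits a given card. The function name `pick_quant` is hypothetical, and the figures are copied from the table above rather than measured.

```python
# (quantization, file size in GB, VRAM required in GB), from the table above.
QUANTS = [
    ("Q4_K_M", 7.3, 10),
    ("Q8_0", 13.0, 16),
]


def pick_quant(vram_gb):
    """Return the highest-VRAM quantization that fits, or None if none do."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    if not fitting:
        return None
    # Larger VRAM requirement corresponds to the higher-precision variant.
    return max(fitting, key=lambda q: q[2])[0]


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        print(f"{vram} GB VRAM: {pick_quant(vram)}")
```

With these numbers a 12 GB card gets Q4_K_M, a 16 GB or larger card gets Q8_0, and an 8 GB card gets no match.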
Get the model
Ollama
One-line install
ollama run gemma3:12b
HuggingFace
Original weights
Source repository. These are unquantized weights, so you need to quantize them yourself before running a Q4 or Q8 variant.
Frequently asked
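What's the minimum VRAM to run Gemma 3 12B?
About 10 GB at Q4_K_M, per the quantization table above. Q8_0 needs 16 GB.
Can I use Gemma 3 12B commercially?
Use is governed by the Gemma license, and its restrictions apply.
What's the context length of Gemma 3 12B?
128K tokens.
How do I install Gemma 3 12B with Ollama?
Run ollama run gemma3:12b for the default tag, or pull a specific quantization with ollama pull gemma3:12b-it-q4_K_M.
Does Gemma 3 12B support images?
Yes. It handles vision and text natively.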
Source: huggingface.co/google/gemma-3-12b-it
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.