gemma

27B parameters

Commercial OK

Multimodal

Reviewed June 2026

Gemma 3 27B

Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.

License: Gemma Terms of Use·Released Mar 12, 2025·Context: 131,072 tokens

BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026

8.2/10

Positioning

Gemma 3 27B is Google's flagship open-weight in 2025 — natively multimodal, 128K context, distilled from Gemini-class data. Right pick when you want Google's training distribution + multimodal in a single model that fits on 24 GB VRAM.

Strengths

Native vision-language — single model, no separate adapter.
128K context with reasonable recall — better than Llama 3.1 8B's nominal 128K.
Distillation from Gemini shows in writing quality and instruction polish.

Limitations

Gemma license is restrictive — terms more limiting than Apache or Llama; review for commercial use.
Slightly weaker on hard reasoning than Qwen 3 32B at similar VRAM.
No thinking-mode equivalent — single dense mode.

Real-world performance on RTX 4090

Q4_K_M (16.5 GB): 60–75 tok/s decode, TTFT ~130 ms — full GPU
Q5_K_M (19.4 GB): 50–62 tok/s
Q8_0 (29 GB): partial offload, 18–26 tok/s

Should you run this locally?

Yes, for users who want native multimodal + Google's training distribution + 24 GB single-card runtime. No, for users sensitive to license terms (Apache options exist) or who prioritize raw reasoning ceiling (Qwen 3 32B).

How it compares

vs Qwen 3 32B → Qwen wins on reasoning + license; Gemma wins on multimodality + writing polish. Pick by job.
vs Mistral Small 3 24B → Mistral wins on license simplicity; Gemma wins on multimodality.
vs Gemma 3 12B → 27B is materially smarter; pick 27B if VRAM allows.
vs Llama 3.3 70B → Llama 3.3 70B is smarter but ~3× slower on a 4090; Gemma 3 27B is the productivity pick at this VRAM.

Run this yourself

ollama pull gemma3:27b-it-q4_K_M
ollama run gemma3:27b-it-q4_K_M

Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090

›Why this rating

8.2/10 — Google's 27B is a credible alternative in the dense mid-tier with native multimodal and a 128K context. Loses points to Qwen 3 32B (slightly smaller, slightly stronger) and Mistral Small 3 24B (cleaner license).

Overview

Pre-Gemma-4 flagship. Multimodal (4B+ variants), 128K context, 140 languages. Strong daily driver on 24GB cards.

Execution notes

L1.25 enriched

Operator notes

Gemma 3 27B is Google's open-weight workstation flagship through 2025-mid-2026 (succeeded by Gemma 4 31B in March 2026 but still widely deployed in stable production environments). Strong multilingual; the 'it' instruct variant supports vision input — making this one of the better consumer-tier multimodal models without dedicated vision-language model overhead.

The 27B size class is awkward — bigger than 24B competitors (Mistral Small, Devstral) where Gemma's training depth shows, smaller than the 32B class where Qwen 2.5 / 3 dominate. The right operating point: when you specifically want Google's training methodology + Gemma's multilingual breadth + optional vision support.

Deployment notes

Fits 22GB VRAM at Q4_K_M with 32K context — RTX 4090 / 5090 / M3 Max comfortable. Workstation-tier deployment via Ollama or vLLM depending on multi-user concurrency needs.

For the smaller / cheaper Gemma path, Gemma 3 12B is the consumer-tier sibling. Gemma 3 4B is the Apple-Silicon-edge tier.

For the Gemma successor: Gemma 4 31B — drop-in upgrade with improved reasoning at the same hardware envelope. New deployments default to Gemma 4 in May 2026; Gemma 3 27B remains in production for teams not yet migrated.

Runtime compatibility

Ollama ✓ excellent. Q4_K_M GGUF + native support; canonical first-pull path.
vLLM ✓ excellent. AWQ-INT4 supported; production multi-user serving.
MLX-LM ✓ good. Apple Silicon path; the M3 Max 64GB tier handles this comfortably.
llama.cpp ✓ excellent. Native GGUF support.
TensorRT-LLM ✓ partial. Compiles but multimodal vision support is younger than vLLM's.

Multimodal capability

The Gemma 3 27B 'it' (instruct) variant accepts image inputs — Google added vision support in the 2025 refresh. Image-text reasoning is competitive with Pixtral 12B for most workflows; document Q&A is solid. NOT competitive with Llama 4 Scout or Qwen 2.5-VL 72B on the hardest visual reasoning tasks.

Best use cases

Multilingual chat with European + Indic + Southeast Asian language coverage — Gemma's training corpus emphasizes language breadth.
Lightweight multimodal — image-text tasks where dedicated VLMs are overkill.
Workstation-tier general serving — when you specifically want Google's training methodology in self-hosted form.
Educational / academic deployments — Gemma Terms license is permissive enough for most academic uses.

When to use a different model

Latest Gemma: Gemma 4 31B — drop-in upgrade.
Apache 2.0 license required: Qwen 2.5 32B or Qwen 3 32B — clean Apache 2.0 vs Gemma Terms.
Coding-first: Qwen 2.5 Coder 32B — Gemma 3 isn't coding-specialized.
Reasoning-first: Qwen 3 32B (toggle) or DeepSeek R1 Distill Qwen 32B (always-on).
Frontier-tier multimodal: Llama 4 Scout or Qwen 2.5-VL 72B.
Consumer / 16GB tier: Gemma 3 12B.

Failure modes specific to this model

License friction. Gemma Terms is permissive but not Apache 2.0; some legal-review processes flag it. Verify before commercial deployment.
Vision support requires the 'it' variant. The base Gemma 3 27B (non-it) is text-only. Pull the 'it' tag explicitly for multimodal.
Older context window (32K). Newer Qwen 3 and Mistral Small 3.2 ship 128K+ context — Gemma 3 27B trails on long-context workloads.

Going deeper

Gemma 4 31B — the successor
Gemma 3 12B, Gemma 3 4B, Gemma 3 1B — family siblings
Pixtral 12B — the dedicated multimodal alternative at consumer scale
/stacks/rtx-4090-workstation — workstation deployment context

Reviewed May 6, 2026 by Fredoline Eruo

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (gemma-3)

Trendyol LLM Asure 12B11.8B

Consumer

Gemma 3 12B12B

Consumer

Gemma 3 27B27B

You are here

Distilled / fine-tuned from this

Strengths

Multimodal
Multilingual
128K context

Weaknesses

Superseded by Gemma 4

Prompting kit

From model card

source

Tested patterns for getting the most out of Gemma 3 27B locally. Local models are pickier about prompt structure than cloud models — what works on Claude or GPT-5 often fails here.

Recommended system prompt

You are a careful and helpful assistant. Answer the user's question directly. When the user provides an image, describe what you see before reasoning about it.

Quirks to know

•Gemma's chat template has no native system role. To inject system instructions, prepend them to the first user turn — the model card and tokenizer_config.json both document this convention.
•Multimodal: Gemma 3 27B accepts images. Per the model card, the vision encoder handles up to 896×896 px inputs natively; higher-resolution images are auto-resized.
•128K context window per the model card. Per Google's technical report, retention quality stays high through 64K and degrades noticeably past 100K.
•Multilingual: officially supports 140+ languages with reasonable quality on ~35 of them. The model card lists the priority tier.
•Gemma models tend to under-refuse on creative writing and over-refuse on technical security topics. Anchor the system prompt to specify the work mode (helpful technical, helpful creative, etc.).

Chat template

Gemma 3

Uses <start_of_turn>{role}\n{content}<end_of_turn>. Apply via the runtime's tokenizer_config.json — Gemma's template injects an initial <bos> token that hand-rolled templates often miss.

Tool calling

✓ Supported(prompted-convention)

Function calling is supported via prompting convention rather than a dedicated chat-template role. Per the model card, declare tools in the system-injected first user turn as JSON schemas; the model emits ```tool_code blocks for the call.