gemma

27B parameters

Restricted

Multimodal

Reviewed June 2026

MedGemma 27B

Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.

License: Gemma Terms of Use (Health AI Developer Foundations) · Released May 20, 2025 · Context: 131,072 tokens

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026

unrated

Positioning

MedGemma 27B is a medical-domain fine-tune of Google's Gemma 3 27B dense model, released under the Gemma Terms of Use with the Health AI Developer Foundations (HAI-DEF) addendum. Trained on de-identified medical literature and imaging, it targets research applications in healthcare AI. As a dense 27B-parameter model, it offers a straightforward deployment profile without the complexity of mixture-of-experts architectures. Its large context window (131K tokens) and medical specialization set it apart from other open-weight models.

Strengths

Dense architecture with full 27B active parameters: Unlike MoE models, every forward pass uses all parameters, providing predictable compute requirements and no routing overhead.
131K-token context window: Supports long medical documents, patient histories, or multi-page reports without truncation.
Medical-domain fine-tuning: Trained on de-identified medical literature and imaging, making it purpose-built for healthcare research tasks.
Research-oriented license: The HAI-DEF terms allow use in health AI research, though commercial deployment requires careful review of restrictions.

Limitations

Large memory footprint: Even at Q4_K_M (~15.2 GB), the model requires a workstation-class GPU (e.g., 24 GB VRAM) to accommodate KV cache overhead (add ~30–50% for typical context lengths).
Research-only license restrictions: The HAI-DEF terms limit use to health AI research and development; commercial or clinical deployment may not be permitted without additional agreements.
No community benchmarks available: We do not have independent measurements of task performance; vendor-reported metrics should be treated as best-case until verified by third parties.
Single-domain specialization: The model's medical focus may limit its utility for general-purpose tasks, and it may underperform on non-medical benchmarks compared to generalist models of similar size.

What it takes to run this locally

At FP16, MedGemma 27B requires 54 GB of disk space and roughly 54 GB of VRAM for the weights alone, placing it firmly in datacenter territory. Quantization reduces the footprint significantly: Q8_0 (29 GB) still demands a dual-GPU workstation (e.g., 2×24 GB), while Q4_K_M (~15.2 GB) fits on a single 24 GB GPU with careful context management. At typical context lengths, expect KV cache overhead of 30–50% on top of model weights; the cache grows with context length, so the full 131K window needs considerably more. Deployment class is workstation (single 24 GB GPU at Q4_K_M) to datacenter (multi-GPU for FP16 or long contexts).
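The weight sizes quoted above follow from simple arithmetic: parameter count times bits per weight. The sketch below reproduces them. The bits-per-weight values for the quantized formats are assumptions chosen to match this page's figures (real GGUF files vary by a few percent), and the 30–50% overhead is the page's own rule of thumb for typical contexts, not a measurement.

```python
# Back-of-envelope weight sizes for a dense model.
# ASSUMPTION: the bits-per-weight values below are illustrative averages
# picked to match the figures quoted on this page.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.5, "Q2_K": 2.6}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def vram_range_gb(weights: float, low: float = 0.30, high: float = 0.50):
    """Weights plus the 30-50% KV cache overhead quoted for typical contexts."""
    return weights * (1 + low), weights * (1 + high)


for name, bpw in BITS_PER_WEIGHT.items():
    w = weights_gb(27, bpw)
    lo, hi = vram_range_gb(w)
    print(f"{name:7s} weights {w:5.1f} GB, with KV cache {lo:5.1f}-{hi:5.1f} GB")
```

Under these assumptions Q4_K_M stays below 24 GB even at the top of the overhead range, which is consistent with the single-GPU workstation class described above.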

Should you run this locally?

Yes if you are conducting medical NLP research and need a model pre-fine-tuned on clinical data, and you have access to a workstation with at least 24 GB VRAM (for Q4_K_M) or a multi-GPU setup (for higher precision). The HAI-DEF terms are written for health AI research and development, which suits academic or non-commercial projects.

No if you require commercial or clinical deployment without license restrictions, or if your hardware is limited to consumer GPUs (12–16 GB VRAM), as even the smallest quant (Q2_K ~8.8 GB) may not leave room for the KV cache at longer contexts. Also, if your task is general-purpose, a non-specialized model may be more appropriate.

Catalog cross-links

Gemma 3 27B
Google Gemma family
Workstation deployment guide

How to run it

MedGemma 27B is Google's medical-domain fine-tune of Gemma 3 27B, specialized for clinical text understanding, medical Q&A, and biomedical reasoning. It was trained on medical literature, clinical notes, and biomedical datasets. The Gemma architecture is well supported in llama.cpp, Ollama, and Gemma-specific stacks.

Run it at Q4_K_M via Ollama (ollama pull medgemma:27b) or llama.cpp with -ngl 999 -fa -c 4096. The Q4_K_M file is ~15 GB on disk.

Minimum VRAM: 12 GB. An RTX 4070 (12GB) runs Q4_K_M at 4K context only with partial CPU offload, because the ~15 GB of weights alone exceed 12 GB. Recommended: RTX 4090 24GB at Q4_K_M, which handles 16K+ context comfortably. Throughput: ~40-60 tok/s on RTX 4090 at Q4_K_M.

Strong on: medical terminology, diagnostic reasoning, drug information, clinical summarization. Weak on: non-medical topics (catastrophic forgetting from domain fine-tuning), current medical guidelines (knowledge cutoff).

Not FDA-approved; research model only. License: Gemma Terms of Use with the HAI-DEF terms (verify commercial terms). Use for: medical research, clinical decision support (with human review), biomedical literature analysis. Never for: autonomous clinical decisions.

Context: the model supports 131,072 tokens, but 4-8K is the practical range on 12-16 GB cards. Medical contexts are typically shorter, so KV pressure is lower.
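For the Ollama route, a minimal sketch of calling a locally running server from Python is shown here. It assumes Ollama's default local endpoint and uses the model tag quoted above; whether that tag resolves depends on what you have pulled. The helper names `build_request` and `generate` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "medgemma:27b", num_ctx: int = 4096) -> dict:
    """Build a non-streaming generate payload with an explicit context size."""
    return {
        "model": model,            # tag as quoted on this page
        "prompt": prompt,
        "stream": False,           # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running server and a pulled model):
# print(generate("Summarize this discharge note: ..."))
```

Keeping `num_ctx` at 4096 mirrors the -c 4096 setting in the llama.cpp command; raise it only if your VRAM leaves room for the larger KV cache.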

Hardware guidance

Minimum: RTX 3060 12GB at Q3_K_M with KV offload.
Recommended: RTX 4090 24GB at Q4_K_M (16K+ context).

VRAM math: 27B dense, Q4_K_M ≈ 15 GB. KV cache at 8K: ~6 GB. Total: ~21 GB at 8K.

RTX 4090 24GB: Q4 + 8-16K context, comfortable on-GPU.
RTX 3080 10GB: Q3_K_M with KV offload.
RTX 4080 16GB: Q4 + 4K context on-GPU.
MacBook Pro M4 Pro 24GB+: Q4 at 12-25 tok/s.
Cloud: A10 24GB at Q4_K_M.

MedGemma is one of the largest medical-specific models that fits comfortably on consumer GPUs. AWQ-INT4 drops to ~13 GB. For clinical settings, the model's knowledge is frozen, so pair it with RAG on current medical literature (PubMed, UpToDate).
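The VRAM math above can be turned into a rough estimator. This sketch takes the two figures quoted here (≈15 GB of Q4_K_M weights, ~6 GB of KV cache at 8K) and assumes the cache scales linearly with context length. That assumption is deliberately pessimistic: runtimes that exploit Gemma 3's sliding-window attention layers, or that quantize the KV cache, should use less. The `headroom_gb` margin is a hypothetical allowance for activations and runtime buffers.

```python
# Figures quoted in the hardware guidance above.
WEIGHTS_GB = 15.0       # Q4_K_M weights
KV_GB_AT_8K = 6.0       # KV cache at 8K context
REFERENCE_CTX = 8192


def estimate_vram_gb(context_tokens: int) -> float:
    """Weights plus KV cache, ASSUMING the cache grows linearly with context."""
    kv_gb = KV_GB_AT_8K * context_tokens / REFERENCE_CTX
    return WEIGHTS_GB + kv_gb


def fits(context_tokens: int, vram_gb: float, headroom_gb: float = 1.0) -> bool:
    """True if the estimate plus a (hypothetical) safety margin fits in VRAM."""
    return estimate_vram_gb(context_tokens) + headroom_gb <= vram_gb


for ctx in (4096, 8192):
    print(f"{ctx:6d} tokens: ~{estimate_vram_gb(ctx):.0f} GB")
```

At 8K the estimate lands on the ~21 GB total quoted above. Where this page lists a longer context or a smaller card as workable, treat the linear estimate as a ceiling and measure actual usage on your runtime.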

What breaks first

1. Medical hallucination. MedGemma generates plausible-sounding but incorrect medical information: drug dosages, contraindications, treatment protocols. Never use it for autonomous clinical decisions. Always verify outputs against current medical guidelines.
2. Catastrophic forgetting. Domain fine-tuning on medical data can degrade general knowledge. Expect the model to hallucinate on non-medical topics more than base Gemma 3 27B.
3. Knowledge cutoff. Medical knowledge is frozen at training time. New drugs, treatment guidelines, and research published after the cutoff are unknown to the model. RAG on current literature is essential for clinical use.
4. Q3 terminology precision. Q3 quantization introduces dangerous errors in medical terminology, such as confusing similar drug names, dosages, or anatomical references. Use Q4_K_M minimum for medical tasks, and Q8 or FP16 for clinical settings.

Runtime recommendation

Ollama for quick-start. llama.cpp for production. vLLM for serving. Gemma architecture — well-supported. For clinical use: pair with a biomedical RAG pipeline (PubMedBERT embeddings + current literature vectors). MedGemma should augment, not replace, clinical expertise.
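To make the RAG pairing concrete, here is a toy sketch of the retrieve-then-prompt step. It swaps in plain token overlap (Jaccard similarity) for the embedding model, so it shows only the shape of the pipeline, not the PubMedBERT setup named above. All function names and the sample passages are hypothetical placeholders.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, passage: str) -> float:
    """Jaccard overlap; a toy stand-in for embedding similarity."""
    q, p = tokenize(query), tokenize(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages that overlap most with the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Number the retrieved passages and ask the model to cite them."""
    sources = "\n".join(
        f"[{i}] {p}" for i, p in enumerate(retrieve(query, passages, k), 1)
    )
    return (
        "Answer using only the numbered sources below and cite them.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\n"
    )
```

In a real pipeline the `score` function would be replaced by vector similarity over an index of current literature, and the resulting prompt would be sent to whichever runtime serves the model.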

Common beginner mistakes

Mistake: Using MedGemma as a diagnostic tool without physician review.
Fix: This is a research model, not a medical device. Always have a qualified physician review outputs. The model hallucinates drug names and dosages.

Mistake: Trusting MedGemma for current treatment guidelines.
Fix: The knowledge cutoff limits the model to what was in its training data. Use RAG on UpToDate, PubMed, or current clinical guidelines for up-to-date information.

Mistake: Using Q3 quantization for medical tasks.
Fix: Q3 confuses medical terminology. Use Q4_K_M minimum, and Q8 or FP16 for clinical settings where precision matters.

Mistake: Expecting MedGemma to perform well on general knowledge tasks.
Fix: Medical fine-tuning causes catastrophic forgetting of non-medical knowledge. Use base Gemma 3 27B for non-medical tasks.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model

Gemma 3 27B

Workstation

Strengths

Medical-domain accuracy

Weaknesses

Not for clinical decisions
License restrictions

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	16.0 GB	20 GB

Get the model

HuggingFace

Original weights

huggingface.co/google/medgemma-27b-it

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of MedGemma 27B.

NVIDIA B300 (Blackwell Ultra)

Frequently asked

What's the minimum VRAM to run MedGemma 27B?

20 GB of VRAM is enough to run MedGemma 27B fully on-GPU at the Q4_K_M quantization (file size 16.0 GB). Higher-quality quantizations need more.

Can I use MedGemma 27B commercially?

MedGemma 27B is released under the Gemma Terms of Use (Health AI Developer Foundations), which has restrictions for commercial use. Review the license terms before using it in a product.

What's the context length of MedGemma 27B?

MedGemma 27B supports a context window of 131,072 tokens (about 131K).

Does MedGemma 27B support images?

Yes — MedGemma 27B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/google/medgemma-27b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
