MedGemma 27B
Medical-specialist Gemma fine-tune. Trained on de-identified medical literature and imaging. Research use under HAI-DEF terms.
Positioning
MedGemma 27B is a medical-domain fine-tune of Google's Gemma 3 27B dense model, released under the Gemma Terms of Use with the Health AI Developer Foundations (HAI-DEF) addendum. Trained on de-identified medical literature and imaging, it targets research applications in healthcare AI. As a dense 27B-parameter model, it has a straightforward deployment profile with none of the complexity of mixture-of-experts architectures. Its 131K-token context window and medical specialization set it apart from other open-weight models.
Strengths
- Dense architecture with full 27B active parameters: Unlike MoE models, every forward pass uses all parameters, providing predictable compute requirements and no routing overhead.
- 131K-token context window: Supports long medical documents, patient histories, or multi-page reports without truncation.
- Medical-domain fine-tuning: Trained on de-identified medical literature and imaging, making it purpose-built for healthcare research tasks.
- Permissive research license: The HAI-DEF terms allow use in health AI research, though commercial deployment requires careful review of restrictions.
Limitations
- Large memory footprint: Even at Q4_K_M (~15.2 GB), the model requires a workstation-class GPU (e.g., 24 GB VRAM) to accommodate KV cache overhead (add ~30–50% for typical context lengths).
- Research-only license restrictions: The HAI-DEF terms limit use to health AI research and development; commercial or clinical deployment may not be permitted without additional agreements.
- No community benchmarks available: We do not have independent measurements of task performance; vendor-reported metrics should be treated as best-case until verified by third parties.
- Single-domain specialization: The model's medical focus may limit its utility for general-purpose tasks, and it may underperform on non-medical benchmarks compared to generalist models of similar size.
What it takes to run this locally
At FP16, MedGemma 27B requires 54 GB of disk space and roughly 54 GB of VRAM for the weights alone, placing it firmly in datacenter territory. Quantization reduces the footprint significantly: Q8_0 (29 GB) still demands a dual-GPU workstation (e.g., 2×24 GB), while Q4_K_M (~15.2 GB) fits on a single 24 GB GPU with careful context management. At typical context lengths, expect KV cache overhead of 30–50% on top of model weights; the KV cache grows with context length, so using the full 131K window needs considerably more. Deployment class ranges from workstation (single 24 GB GPU at Q4_K_M) to datacenter (multi-GPU for FP16 or long contexts).
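The footprint figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below reproduces the FP16 number and works backwards from the quoted file sizes to the average bits per weight they imply. The function names are illustrative, and 1 GB is treated as 10^9 bytes, which is an assumption about how the sizes were reported.

```python
PARAMS_BILLION = 27  # dense parameter count stated in this article


def weight_footprint_gb(bits_per_weight: float,
                        params_billion: float = PARAMS_BILLION) -> float:
    """Approximate weight size in GB, treating 1 GB as 1e9 bytes."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(file_size_gb: float,
                            params_billion: float = PARAMS_BILLION) -> float:
    """Average bits per weight implied by a quantized file size."""
    return file_size_gb * 8 / params_billion


# File sizes are the ones quoted in this article.
print(f"FP16: {weight_footprint_gb(16):.1f} GB")
for name, size_gb in [("Q8_0", 29.0), ("Q4_K_M", 15.2), ("Q2_K", 8.8)]:
    print(f"{name}: {implied_bits_per_weight(size_gb):.2f} bits/weight")
```

Quantized formats store scaling metadata alongside the weights, which is why the implied averages come out above the nominal 8, 4, and 2 bits. These numbers cover weights only; KV cache and runtime buffers come on top.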
Should you run this locally?
Yes if you are conducting medical NLP research and need a model pre-fine-tuned on clinical data, and you have access to a workstation with at least 24 GB VRAM (for Q4_K_M) or a multi-GPU setup (for higher precision). The permissive research license simplifies compliance for academic or non-commercial projects.
No if you require commercial or clinical deployment without license restrictions, or if your hardware is limited to consumer GPUs (12–16 GB VRAM), as even the smallest quant (Q2_K ~8.8 GB) may not leave room for the KV cache at longer contexts. Also, if your task is general-purpose, a non-specialized model may be more appropriate.
Catalog cross-links
- Gemma 3 27B
- Google Gemma family
- Workstation deployment guide
How to run it
MedGemma 27B is Google's medical-domain fine-tune of Gemma 3 27B, specialized for clinical text understanding, medical Q&A, and biomedical reasoning. It was trained on medical literature, clinical notes, and biomedical datasets. The Gemma architecture is well supported in llama.cpp, Ollama, and Gemma-specific stacks.
- Quickstart: Run at Q4_K_M via Ollama (ollama pull medgemma:27b) or llama.cpp with -ngl 999 -fa -c 4096. The Q4_K_M file is ~15 GB on disk.
- Minimum VRAM: 12 GB. An RTX 4070 (12GB) runs Q4_K_M with KV offload at 4K context.
- Recommended: RTX 4090 24GB at Q4_K_M, which handles 16K+ context comfortably.
- Throughput: ~40-60 tok/s on RTX 4090 at Q4_K_M.
- Strong on: medical terminology, diagnostic reasoning, drug information, clinical summarization.
- Weak on: non-medical topics (catastrophic forgetting from domain fine-tuning) and current medical guidelines (knowledge cutoff).
- License: Gemma license with HAI-DEF terms (verify commercial terms). Not FDA-approved; research model only.
- Use for: medical research, clinical decision support (with human review), biomedical literature analysis. Never for autonomous clinical decisions.
- Context: the model supports 131K tokens, but 4-8K is the practical range on 12-16 GB cards. Medical contexts are typically shorter, which means less KV pressure.
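Once the model has been pulled, the Ollama route can be scripted against Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is serving on its default port (11434) and that the medgemma:27b tag quoted in this article exists in your installation; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "medgemma:27b"  # tag as quoted in this article; confirm it exists locally


def build_generate_request(prompt: str, num_ctx: int = 4096) -> urllib.request.Request:
    """Build a non-streaming generate request with a fixed context size."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # mirrors llama.cpp's -c 4096
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, num_ctx: int = 4096, timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt, num_ctx)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling generate("Summarize the following discharge note: ...") returns the model's text once the server is running. Treat the output as research material that needs qualified human review, in line with the usage limits above.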
Hardware guidance
Minimum: RTX 3060 12GB at Q3_K_M with KV offload. Recommended: RTX 4090 24GB at Q4_K_M (16K+ context).
VRAM math: 27B dense, Q4_K_M ≈ 15 GB. KV cache at 8K: ~6 GB. Total: ~21 GB at 8K.
- RTX 4090 24GB: Q4 + 8-16K context, comfortable on-GPU.
- RTX 4080 16GB: Q4 + 4K context on-GPU.
- RTX 3080 10GB: Q3_K_M with KV offload.
- MacBook Pro M4 Pro 24GB+: Q4 at 12-25 tok/s.
- Cloud: A10 24GB at Q4_K_M.
MedGemma is one of the largest medical-specific models that fits comfortably on consumer GPUs. AWQ-INT4 drops the weights to ~13 GB. For clinical settings, the model's knowledge is frozen, so pair it with RAG on current medical literature (PubMed, UpToDate).
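The VRAM math in this section can be turned into a small budget check. The sketch below takes the Q4_K_M weight size and the 8K KV-cache figure from the text and scales the KV cache linearly with context length. Linear scaling, the reading of "8K" as 8192 tokens, and the function names are all assumptions made for illustration.

```python
WEIGHTS_Q4_K_M_GB = 15.0  # Q4_K_M weights, from the VRAM math in this section
KV_CACHE_GB_AT_8K = 6.0   # KV cache at 8K context, from the same VRAM math
REFERENCE_CONTEXT = 8192  # assumption: "8K" means 8192 tokens


def kv_cache_gb(context_tokens: int) -> float:
    """KV cache estimate, assuming it grows linearly with context length."""
    return KV_CACHE_GB_AT_8K * context_tokens / REFERENCE_CONTEXT


def total_vram_gb(context_tokens: int,
                  weights_gb: float = WEIGHTS_Q4_K_M_GB) -> float:
    """Weights plus KV cache; runtime buffers are not modeled."""
    return weights_gb + kv_cache_gb(context_tokens)


def fits_on_gpu(vram_gb: float, context_tokens: int,
                weights_gb: float = WEIGHTS_Q4_K_M_GB) -> bool:
    """True if the estimated total stays within the card's VRAM."""
    return total_vram_gb(context_tokens, weights_gb) <= vram_gb
```

Here total_vram_gb(8192) reproduces the ~21 GB total quoted in this section, and fits_on_gpu(24, 8192) confirms that budget fits a 24 GB card. Linear scaling is a conservative simplification: KV-cache quantization or offload, which this sketch does not model, would lower the real figure at longer contexts.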
What breaks first
1. Medical hallucination. MedGemma generates plausible-sounding but incorrect medical information, such as drug dosages, contraindications, and treatment protocols. Never use it for autonomous clinical decisions. Always verify outputs against current medical guidelines.
2. Catastrophic forgetting. Domain fine-tuning on medical data severely degrades general knowledge. The model will hallucinate on non-medical topics worse than base Gemma 27B.
3. Knowledge cutoff. Medical knowledge is frozen at training time. New drugs, treatment guidelines, and research published after the cutoff aren't known. RAG on current literature is essential for clinical use.
4. Q3 terminology precision. Medical terminology at Q3 introduces dangerous errors, such as confusing similar drug names, dosages, or anatomical references. Use Q4_K_M minimum for medical tasks, and Q8 or FP16 for clinical settings.
Common beginner mistakes
- Mistake: Using MedGemma as a diagnostic tool without physician review. Fix: This is a research model, not a medical device. Always have a qualified physician review outputs. The model hallucinates drug names and dosages.
- Mistake: Trusting MedGemma for current treatment guidelines. Fix: Knowledge cutoff limits the model to pre-training data. Use RAG on UpToDate, PubMed, or current clinical guidelines for up-to-date information.
- Mistake: Using Q3 quantization for medical tasks. Fix: Q3 confuses medical terminology. Use Q4_K_M minimum, and Q8 or FP16 for clinical settings where precision matters.
- Mistake: Expecting MedGemma to perform well on general knowledge tasks. Fix: Medical fine-tuning causes catastrophic forgetting of non-medical knowledge. Use base Gemma 27B for non-medical tasks.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Medical-domain accuracy
Weaknesses
- Not for clinical decisions
- License restrictions
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 16.0 GB | 20 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of MedGemma 27B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run MedGemma 27B?
Can I use MedGemma 27B commercially?
What's the context length of MedGemma 27B?
Does MedGemma 27B support images?
Source: huggingface.co/google/medgemma-27b-it
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.