CodeGemma 7B
Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.
Positioning
A small, fast coder for the 8 GB VRAM tier. CodeGemma 7B was good when it shipped; today it's the right choice only when license terms or the Gemma toolchain matter, since Qwen 2.5 Coder 7B and DeepSeek Coder V2 Lite both outperform it.
Strengths
- 5 GB at Q4_K_M — runs on 6 GB cards.
- Fast fill-in-the-middle — good for editor autocomplete latency.
- Stable runner support.
Limitations
- Beaten by Qwen 2.5 Coder 7B on capability.
- Gemma license restrictiveness.
- Repo-context handling weaker than newer coders.
Real-world performance on RTX 4090
- Q4_K_M (5.0 GB): 95–115 tok/s decode, TTFT under 70 ms
- Q5_K_M (5.9 GB): 84–100 tok/s
- Q8_0 (8.7 GB): 65–80 tok/s
Should you run this locally?
Yes, for 6–8 GB VRAM coders where Qwen license is a problem. No, for new deployments — pick Qwen 2.5 Coder 7B or DeepSeek Coder V2 Lite.
How it compares
- vs Qwen 2.5 Coder 7B → Qwen wins on capability; CodeGemma has better Apache-flavored license terms (still Gemma-restricted but cleaner than Qwen).
- vs DeepSeek Coder V2 Lite (16B) → DeepSeek is much more capable; CodeGemma wins on VRAM (5 GB vs ~10 GB).
- vs Codestral 22B → Codestral is dramatically more capable; CodeGemma is the constrained-VRAM pick.
Run this yourself
ollama pull codegemma:7b-instruct-q4_K_M
ollama run codegemma:7b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, llama.cpp/CUDA, RTX 4090
›Why this rating
6.8/10 — Google's coding model in a 7B body. Decent for autocomplete, but Qwen 2.5 Coder 7B exists and is stronger. Loses points for being eclipsed by every modern coder model in similar VRAM.
Overview
Coding-specialist Gemma. Decent FIM completion. Now mostly historical with Qwen 2.5 Coder dominating.
Strengths
- Fast small coder
Weaknesses
- Outpaced by Qwen 2.5 Coder
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.2 GB | 6 GB |
Get the model
Ollama
One-line install
ollama run codegemma:7bRead our Ollama review →HuggingFace
Original weights
Source repository — direct quantization required.
Benchmarks
Real measurements on real hardware. Numbers ship with the runner version, quant, and date.
| Hardware | Provenance | Quant | Ctx | Tokens / sec | TTFT | Date |
|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 3080 16GB (Mobile) | EditorialM | Q4_K_M | 4K | 80.6tok/s | 383 ms | Jun 2, 26 |
What to do next
Got this model running on real hardware? Share what you measured — the form arrives with the model pre-selected.
Hardware that runs this
Cards with enough VRAM for at least one quantization of CodeGemma 7B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run CodeGemma 7B?
Can I use CodeGemma 7B commercially?
What's the context length of CodeGemma 7B?
How do I install CodeGemma 7B with Ollama?
Source: huggingface.co/google/codegemma-7b-it
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify CodeGemma 7B runs on your specific hardware before committing money.