CodeGemma 7B

Overview

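CodeGemma 7B is the code-specialized variant of Google's Gemma. It handles fill-in-the-middle (FIM) completion reasonably well, but it is now mostly of historical interest because Qwen 2.5 Coder has overtaken it.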
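As a rough illustration of what FIM completion looks like in practice, the sketch below assembles a FIM prompt from the code before and after the cursor. The control tokens are the ones listed in the CodeGemma model card; treat them as an assumption and check them against the tokenizer of the checkpoint you actually run. The model card presents FIM as a feature of the pretrained code checkpoints, so the instruction-tuned variant may not respond to this format as well. The function name `build_fim_prompt` is hypothetical.

```python
# Sketch: build a fill-in-the-middle (FIM) prompt for CodeGemma.
# The control tokens follow the CodeGemma model card; verify them against
# the tokenizer of the exact checkpoint before relying on them.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Return a prompt asking the model to generate the code between prefix and suffix."""
    # Layout: prefix token, code before the cursor, suffix token,
    # code after the cursor, then the middle token where generation starts.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    before = "def mean(xs):\n    total = "
    after = "\n    return total / len(xs)\n"
    print(build_fim_prompt(before, after))
```

The model is expected to continue after the final token with the missing middle section, which an editor plugin would then splice in at the cursor.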

Strengths

Fast small coder

Weaknesses

Outpaced by Qwen 2.5 Coder

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	4.2 GB	6 GB
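The relationship between quantization, file size, and VRAM can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not how this table was produced: it assumes a nominal 7 billion parameters and roughly 4.8 bits per weight for Q4_K_M. Both figures are assumptions for illustration, and real GGUF files can differ because of embedding tables and mixed-precision layers.

```python
# Sketch: rough size estimate for a quantized model.
# NOMINAL_PARAMS and Q4_K_M_BITS are illustrative assumptions, not measured values.
NOMINAL_PARAMS = 7e9   # "7B" taken at face value
Q4_K_M_BITS = 4.8      # approximate average bits per weight for Q4_K_M


def estimate_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (10^9 bytes): parameters x bits, divided by 8."""
    return n_params * bits_per_weight / 8 / 1e9


def vram_headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM left for the KV cache and runtime buffers after loading the weights."""
    return vram_gb - file_size_gb


if __name__ == "__main__":
    size = estimate_file_size_gb(NOMINAL_PARAMS, Q4_K_M_BITS)
    print(f"estimated file size: {size:.1f} GB")
    print(f"headroom on a 6 GB card: {vram_headroom_gb(6.0, size):.1f} GB")
```

The headroom matters because the context cache grows with the number of tokens in flight, so a card that only just fits the weights may still run out of memory at longer contexts.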

Get the model

Ollama

One-line install

ollama run codegemma:7b

HuggingFace

Original weights

huggingface.co/google/codegemma-7b-it

Source repository with the original weights. You need to quantize them yourself.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record

Hardware	Provenance	Quant	Ctx	Tokens / sec	TTFT	Date
NVIDIA GeForce RTX 3080 16GB (Mobile)	EditorialM	Q4_K_M	4K	80.6 tok/s	383 ms	Jun 2, 26
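To turn these two numbers into something tangible, the sketch below estimates how long a completion of a given length would take, using the throughput and TTFT from the single run above. It is a simplification: it treats throughput as constant and counts the first token under both TTFT and the decode phase, so the result is a slight overestimate. The function name is hypothetical.

```python
# Sketch: estimate end-to-end generation time from TTFT and decode throughput.
# Assumes constant throughput; real runs slow down as the context fills.


def generation_time_s(n_tokens: int, tokens_per_sec: float, ttft_ms: float) -> float:
    """Time to first token plus steady-state decode time, in seconds."""
    return ttft_ms / 1000 + n_tokens / tokens_per_sec


if __name__ == "__main__":
    # Figures from the benchmark row: 80.6 tok/s, 383 ms TTFT.
    for n in (64, 256, 1024):
        print(f"{n:>5} tokens: {generation_time_s(n, 80.6, 383):.2f} s")
```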


Hardware that runs this

Cards with enough VRAM for at least one quantization of CodeGemma 7B.

NVIDIA B300 (Blackwell Ultra)

Frequently asked

What's the minimum VRAM to run CodeGemma 7B?

6 GB of VRAM is enough to run CodeGemma 7B at the Q4_K_M quantization (file size 4.2 GB). Higher-quality quantizations need more.

Can I use CodeGemma 7B commercially?

Yes. CodeGemma 7B ships under the Gemma Terms of Use, which permit commercial use. Always read the license text before deployment.

What's the context length of CodeGemma 7B?

CodeGemma 7B supports a context window of 8,192 tokens (about 8K).

How do I install CodeGemma 7B with Ollama?

Run `ollama pull codegemma:7b` to download, then `ollama run codegemma:7b` to start a chat session. The default quantization is Q4_K_M.
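Once the model is pulled, it can also be called from a script instead of the chat session. The sketch below uses Ollama's local HTTP API with only the Python standard library. It assumes the server is running on its default address, `http://localhost:11434`, and the helper names `build_request` and `generate` are hypothetical.

```python
# Sketch: call a locally running Ollama server from Python (standard library only).
# Assumes Ollama is listening on its default address and codegemma:7b is pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "codegemma:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "codegemma:7b") -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example (requires a running Ollama server):
# print(generate("Write a Python function that reverses a string."))
```

Setting `stream` to false returns one JSON object with the full completion, which keeps the example short. Interactive tools usually stream tokens instead.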

Source: huggingface.co/google/codegemma-7b-it

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
