Sarvam 30B

Sarvam 30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters, of which only 2.4B are active per token at inference time, so it needs less compute to run than its size suggests. It targets all 22 scheduled Indian languages, with a focus on Hindi-region use cases, and the vendor reports strong benchmark numbers in math and code. The listed context window is a modest 4,096 tokens.

License: Apache-2.0 · Context: 4,096 tokens

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is cleanly verified as Apache-2.0 directly from the model card. The editorial voice is sharp, honest about trust_remote_code and vendor-only benchmarks, and the MoE active-param framing is accurate. However, the contextLength of 4096 is questionable — the model card explicitly mentions 'extremely high rope_theta (8e6) for long-context stability without RoPE scaling,' which suggests the model is designed for substantially longer context than 4096. DeepSeek itself flagged contextLength as 'low' confidence, and the description doubles down on the 4096 claim as a weakness without justification from the card. This is exactly the kind of unverified metadata claim that the 9.0 gate exists to catch.

Flags:
- contextLength=4096 is not supported by the model card excerpt; rope_theta=8e6 implies longer context, so this needs verification from config.json before publishing
- Weakness bullet and verdict both lean hard on the 4096 limitation, which may be factually wrong
- Vendor benchmarks (97% Math500, 92.1% HumanEval) are cited in strengths; acceptable since they are flagged as vendor numbers, but borderline
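The config.json check suggested in the first flag could look roughly like the sketch below. It assumes the model's config uses the common Hugging Face key names `max_position_embeddings`, `rope_theta`, and `rope_scaling`; because this model relies on trust_remote_code, its actual keys may differ. The helper names and the sample values are hypothetical placeholders, not values read from the real model.

```python
import json

def summarize_context(config: dict) -> dict:
    """Pull the fields that usually bear on context length from a parsed config.json.

    Key names follow the common Hugging Face convention (an assumption here).
    Missing keys come back as None rather than raising.
    """
    return {
        "max_position_embeddings": config.get("max_position_embeddings"),
        "rope_theta": config.get("rope_theta"),
        "rope_scaling": config.get("rope_scaling"),
    }

def context_claim_supported(config: dict, claimed: int = 4096):
    """Return True/False if the config declares a context length, else None."""
    declared = config.get("max_position_embeddings")
    if declared is None:
        return None  # the config does not settle the question
    return declared == claimed

if __name__ == "__main__":
    # Placeholder config for illustration only. rope_theta echoes the value
    # quoted in the rating above; max_position_embeddings is made up.
    sample = json.loads('{"max_position_embeddings": 32768, "rope_theta": 8000000.0}')
    print(summarize_context(sample))
    print("4096 claim supported:", context_claim_supported(sample, 4096))
```

To run it against the real model, load the downloaded config.json with `json.load` and pass the resulting dict to the same two functions.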

Overview

Quantization	File size	VRAM required
Q4_K_M	16.5 GB	21 GB

Frequently asked

What's the minimum VRAM to run Sarvam 30B?

21 GB of VRAM is enough to run Sarvam 30B at the Q4_K_M quantization (file size 16.5 GB). Higher-precision quantizations produce larger files and need more VRAM.
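As a rough sanity check on these figures, the sketch below derives the implied bits per weight and the gap between file size and VRAM from the numbers on this page. It assumes decimal gigabytes and a parameter count of exactly 30B, and it treats the 4.5 GB gap as a fixed overhead, which is a simplification. The function names and the 8-bit scenario are hypothetical illustrations, not measured values for any published file.

```python
# Figures taken from this page.
TOTAL_PARAMS = 30e9   # total parameters across all experts
Q4_FILE_GB = 16.5     # Q4_K_M file size
Q4_VRAM_GB = 21.0     # listed VRAM requirement for Q4_K_M

def bits_per_weight(file_gb: float, params: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return file_gb * 1e9 * 8 / params

def estimate_vram_gb(bits: float, params: float, overhead_gb: float) -> float:
    """File size at the given bits per weight, plus a fixed overhead (assumption)."""
    file_gb = bits * params / 8 / 1e9
    return file_gb + overhead_gb

if __name__ == "__main__":
    bpw = bits_per_weight(Q4_FILE_GB, TOTAL_PARAMS)
    overhead = Q4_VRAM_GB - Q4_FILE_GB  # headroom beyond the weights themselves
    print(f"implied bits per weight: {bpw:.2f}")
    print(f"implied overhead: {overhead:.1f} GB")
    # Hypothetical 8-bit scenario, reusing the same overhead.
    print(f"rough 8-bit estimate: {estimate_vram_gb(8.0, TOTAL_PARAMS, overhead):.1f} GB")
```

Note that the estimate uses the total parameter count, not the 2.4B active parameters: in a typical Mixture-of-Experts deployment every expert stays resident in memory, so the active count reduces compute per token rather than the memory footprint.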

Can I use Sarvam 30B commercially?

Yes. Sarvam 30B ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam 30B?

Sarvam 30B is listed with a context window of 4,096 tokens (about 4K).
