Sarvam 30B
Sarvam-30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters but only 2.4B active per token, which makes it cheaper to run in compute terms than its size suggests. It targets all 22 scheduled Indian languages with a focus on Hindi-region use cases, and the vendor reports strong benchmark numbers in math and code. The context window is listed as a modest 4096 tokens.
If you need serious Indian-language coverage beyond Hindi, and you want it in a model that won't crater your compute budget, Sarvam-30B is currently the most credible open option. The MoE architecture is the real selling point: per-token inference cost is closer to that of a 2-3B model while drawing on 30B weights, although all 30B weights still have to fit in memory. The listed 4096 context cap, if it holds up, is a genuine pain point for anything document-heavy, and the trust_remote_code requirement is a flag worth investigating before deploying in production. A cautious recommendation for Hindi and other Indian-language workloads; skip it if your use case is long-context or you need independently verified benchmarks before committing.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is cleanly verified as Apache-2.0 directly from the model card. The editorial voice is sharp, honest about trust_remote_code and vendor-only benchmarks, and the MoE active-param framing is accurate. However, the contextLength of 4096 is questionable — the model card explicitly mentions 'extremely high rope_theta (8e6) for long-context stability without RoPE scaling,' which suggests the model is designed for substantially longer context than 4096. DeepSeek itself flagged contextLength as 'low' confidence, and the description doubles down on the 4096 claim as a weakness without justification from the card. This is exactly the kind of unverified metadata claim that the 9.0 gate exists to catch.
Flags:
- contextLength=4096 is not supported by the model card excerpt; rope_theta=8e6 implies longer context and needs verification from config.json before publishing
- The weakness bullet and the verdict both lean hard on the 4096 limitation, which may be factually wrong
- Vendor benchmarks (97% Math500, 92.1% HumanEval) are cited in strengths; acceptable since flagged as vendor numbers, but borderline
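The config.json check called for in the flags could look roughly like the sketch below. It assumes the repository's config.json uses the common Hugging Face keys `max_position_embeddings` and `rope_theta`, which has not been confirmed for this model. `SAMPLE_CONFIG` is a placeholder that simply mirrors the listing's 4096 figure and the rope_theta quoted from the model card; it is not the real file.

```python
import json

def context_hints(config_text: str) -> dict:
    """Pull the context-related fields out of a config.json document."""
    config = json.loads(config_text)
    return {
        # Common Hugging Face key for the position limit (assumed for this model).
        "max_position_embeddings": config.get("max_position_embeddings"),
        # RoPE base frequency; the model card excerpt quotes 8e6.
        "rope_theta": config.get("rope_theta"),
    }

def matches_listing(config_text: str, listed_context: int = 4096) -> bool:
    """True only if the config's position limit equals the listed context length."""
    return context_hints(config_text)["max_position_embeddings"] == listed_context

# Placeholder config, NOT the real file: it mirrors the claims under review.
SAMPLE_CONFIG = '{"max_position_embeddings": 4096, "rope_theta": 8000000.0}'

if __name__ == "__main__":
    print(context_hints(SAMPLE_CONFIG))
    print("matches listing:", matches_listing(SAMPLE_CONFIG))
```

Running this against the actual config.json from the source repository would settle whether the 4096 figure belongs in the listing.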
Strengths
- MoE design: only 2.4B parameters are active per forward pass, so per-token compute is well below what a dense 30B would need (all 30B weights still have to be loaded)
- Covers all 22 scheduled Indian languages, not just Hindi
- 97% on Math500 and 92.1% on HumanEval per vendor benchmarks
- Apache-2.0 license — commercial use is clean
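To make the total-versus-active distinction concrete, here is a back-of-the-envelope sketch. The parameter counts come from the listing. The "2 FLOPs per active parameter" rule of thumb and the 4.4 bits-per-weight figure are assumptions; the latter is back-derived from the 16.5 GB Q4_K_M file size listed on this page rather than taken from any specification.

```python
TOTAL_PARAMS = 30e9    # total parameters, from the listing
ACTIVE_PARAMS = 2.4e9  # parameters active per token, from the listing

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2 * active_params

if __name__ == "__main__":
    # Memory follows TOTAL parameters: every expert has to be resident.
    print("16-bit weights (GB):", weight_gb(TOTAL_PARAMS, 16))
    print("~4.4-bit weights (GB):", round(weight_gb(TOTAL_PARAMS, 4.4), 1))
    # Compute follows ACTIVE parameters: only the routed experts run.
    ratio = flops_per_token(ACTIVE_PARAMS) / flops_per_token(TOTAL_PARAMS)
    print("compute vs dense 30B:", round(ratio, 2))
```

Under these assumptions the model computes like a small model but stores like a 30B one, which is why the disk and memory footprint stays large despite the low active-parameter count.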
Weaknesses
- Listed 4096-token context is short; long documents or multi-turn conversations will hit the limit fast
- Requires trust_remote_code=True when loading from Hugging Face: custom model code runs on your machine, so review it first
- Vendor benchmark numbers only so far; independent third-party evals are sparse at time of listing
- Total 30B weight still means a non-trivial download and disk footprint despite the low active-param count
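One way to act on the trust_remote_code warning is to load from a pinned commit that you have already reviewed, instead of a moving branch. The sketch below uses the `revision` and `trust_remote_code` arguments of `from_pretrained` in the transformers library. The repo ID comes from this listing's source URL. That `AutoModelForCausalLM` is the right entry point for this model is an assumption, and the helper names are hypothetical. Pinning the revision should also fix the version of the custom code when it lives in the same repository.

```python
REPO_ID = "sarvamai/sarvam-30b"  # from the listing's source URL

def load_kwargs(revision: str) -> dict:
    """Build from_pretrained kwargs that pin remote code to a reviewed commit."""
    if not revision or revision == "main":
        raise ValueError("pin a specific commit hash, not a moving branch")
    return {"trust_remote_code": True, "revision": revision}

def load_model(revision: str):
    """Load tokenizer and model at the pinned commit (downloads the weights)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = load_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
    # Assumption: the model is exposed through the causal-LM auto class.
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, **kwargs)
    return tokenizer, model
```

Call `load_model` with the commit hash you reviewed on the repository page; no hash is given here because none appears in the listing.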
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 16.5 GB | 21 GB |
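The table can be turned into a quick fit check, sketched below. The single row is copied from the table above. Treating the largest fitting file as the best-quality choice is an assumption, and the helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Variant(NamedTuple):
    name: str
    file_gb: float
    vram_gb: float

# Copied from the quantization table on this page.
VARIANTS: List[Variant] = [Variant("Q4_K_M", 16.5, 21.0)]

def best_fit(card_vram_gb: float,
             variants: List[Variant] = VARIANTS) -> Optional[Variant]:
    """Return the largest variant whose VRAM requirement fits, else None."""
    fitting = [v for v in variants if v.vram_gb <= card_vram_gb]
    # Assumption: a larger file means less aggressive quantization.
    return max(fitting, key=lambda v: v.file_gb, default=None)

def overhead_gb(variant: Variant) -> float:
    """Listed VRAM beyond the raw file size (KV cache, activations, buffers)."""
    return variant.vram_gb - variant.file_gb

if __name__ == "__main__":
    for vram in (16, 24):
        print(vram, "GB card ->", best_fit(vram))
    print("overhead (GB):", overhead_gb(VARIANTS[0]))
```

With only one listed variant, the check reduces to whether the card has at least 21 GB; the structure matters once more rows are added.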
Get the model
Hugging Face
Original weights
Source repository; you need to quantize the weights yourself.
Frequently asked
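What's the minimum VRAM to run Sarvam 30B?
About 21 GB for the Q4_K_M quantization, the only variant listed here.
Can I use Sarvam 30B commercially?
Yes. The model is released under the Apache-2.0 license, which permits commercial use.
What's the context length of Sarvam 30B?
The listing gives 4096 tokens, but the rating notes flag that figure as unverified against the model's config.json.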
Source: huggingface.co/sarvamai/sarvam-30b
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.