30B parameters
Commercial OK
Reviewed May 2026

Sarvam 30B

Sarvam-30B is a Mixture-of-Experts model from Sarvam AI with 30B total parameters but only 2.4B active at inference time, making it cheaper to run than its size suggests. It targets all 22 scheduled Indian languages with a focus on Hindi-region use cases, and posts strong vendor-reported benchmark numbers in math and code. The listed context window is a modest 4,096 tokens.

License: apache-2.0 · Context: 4,096 tokens

Our verdict

Fredoline Eruo | Verified May 28, 2026
9.0/10

If you need serious Indian-language coverage beyond Hindi, and you want it in a model whose per-token compute won't crater your budget, Sarvam-30B is currently the most credible open option. The MoE architecture is the real selling point: active inference cost closer to a 2-3B model while drawing on 30B weights, though all 30B weights still have to fit in memory. The listed 4096-token context cap is a genuine pain point for anything document-heavy, and the trust_remote_code requirement is a flag worth investigating before deploying in production. Cautious recommend for Hindi/Indian-language workloads; skip if your use case is long-context or you need independently verified benchmarks before committing.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is cleanly verified as Apache-2.0 directly from the model card. The editorial voice is sharp, honest about trust_remote_code and vendor-only benchmarks, and the MoE active-param framing is accurate. However, the contextLength of 4096 is questionable — the model card explicitly mentions 'extremely high rope_theta (8e6) for long-context stability without RoPE scaling,' which suggests the model is designed for substantially longer context than 4096. DeepSeek itself flagged contextLength as 'low' confidence, and the description doubles down on the 4096 claim as a weakness without justification from the card. This is exactly the kind of unverified metadata claim that the 9.0 gate exists to catch.

Flags:

  • contextLength=4096 is not supported by the model card excerpt; rope_theta=8e6 implies longer context and needs verification from config.json before publishing
  • The weakness bullet and the verdict both lean hard on the 4096 limitation, which may be factually wrong
  • Vendor benchmarks (97% Math500, 92.1% HumanEval) are cited in strengths; acceptable since they are flagged as vendor numbers, but borderline
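The config.json check called for in the flags can be sketched as below. This is a minimal sketch, not the site's actual verification step. It assumes the model's config uses the common Hugging Face key names `max_position_embeddings`, `rope_theta`, and `rope_scaling`, which may not hold for a model that ships custom code. The example values are placeholders for illustration, not the real Sarvam-30B settings, apart from the rope_theta of 8e6 quoted from the model card.

```python
import json
from typing import Optional


def summarize_context_fields(config: dict) -> dict:
    """Collect the config fields that usually determine usable context length."""
    return {
        "max_position_embeddings": config.get("max_position_embeddings"),
        "rope_theta": config.get("rope_theta"),
        "rope_scaling": config.get("rope_scaling"),
    }


def context_claim_matches(config: dict, claimed_tokens: int) -> Optional[bool]:
    """Return True/False when the config states a limit, None when it does not."""
    limit = config.get("max_position_embeddings")
    if limit is None:
        return None  # the field is absent, so the claim cannot be checked this way
    return limit == claimed_tokens


# Placeholder config for illustration only; 32768 is NOT a verified value.
example_config = json.loads(
    '{"max_position_embeddings": 32768, "rope_theta": 8000000.0}'
)

print(summarize_context_fields(example_config))
print(context_claim_matches(example_config, 4096))
```

In practice the dictionary would come from the config.json file in the model repository, and a mismatch would mean the 4,096 figure on this page needs correcting.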

Strengths

  • MoE design: only 2.4B parameters active per forward pass, so per-token compute is well below what a dense 30B would need (all 30B weights still have to be loaded)
  • Covers all 22 scheduled Indian languages, not just Hindi
  • 97% on Math500 and 92.1% on HumanEval per vendor benchmarks
  • Apache-2.0 license — commercial use is clean
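The MoE trade-off in the first bullet is easiest to see as arithmetic. The sketch below uses only the two parameter counts from this listing and assumes 16-bit weights and decimal gigabytes. In a typical MoE deployment every expert has to be resident, so the total count drives memory while the active count drives per-token compute.

```python
TOTAL_PARAMS = 30e9    # total parameters, from the listing
ACTIVE_PARAMS = 2.4e9  # parameters active per forward pass, from the listing


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (assumes 1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"active fraction of weights: {active_fraction:.1%}")
print(f"all weights at 16-bit:      {weight_gb(TOTAL_PARAMS, 16):.1f} GB")
print(f"active weights at 16-bit:   {weight_gb(ACTIVE_PARAMS, 16):.1f} GB")
```

Roughly 8% of the weights do the work on any given token, but the full set is what has to be stored.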

Weaknesses

  • 4096-token context is short; long documents or multi-turn conversations will hit the limit fast
  • Requires trust_remote_code=True on Hugging Face — custom model code runs on your machine, review it first
  • Vendor benchmark numbers only so far; independent third-party evals are sparse at time of listing
  • Total 30B weight still means a non-trivial download and disk footprint despite the low active-param count
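One way to limit the trust_remote_code risk is to pin the exact commit you reviewed, so the custom code cannot change underneath you. The sketch below assumes the model loads through the standard `transformers` Auto classes, which this page does not confirm. `pinned_load_kwargs` and `load_pinned` are hypothetical helper names, and the commit hash is something you supply after reading the repository's code.

```python
import re

REPO_ID = "sarvamai/sarvam-30b"  # repository path from this listing


def pinned_load_kwargs(revision: str) -> dict:
    """Build from_pretrained kwargs, refusing anything but a full commit hash."""
    if not re.fullmatch(r"[0-9a-f]{40}", revision):
        raise ValueError("pin a full 40-character commit hash, not a branch or tag")
    return {"revision": revision, "trust_remote_code": True}


def load_pinned(revision: str):
    """Load tokenizer and model at one reviewed commit (downloads the weights)."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = pinned_load_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, **kwargs)
    return tokenizer, model
```

Calling `load_pinned` with a branch name such as "main" fails fast, which is the point: a branch can move after your review, while a commit hash cannot.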

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M | 16.5 GB | 21 GB
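The Q4_K_M figures can be sanity-checked with a little arithmetic. The sketch below assumes a nominal 30B parameter count and decimal gigabytes, and reads the gap between file size and VRAM as runtime overhead (cache, activations, buffers), which is an interpretation rather than something the listing states. `fits_in_vram` is a hypothetical helper.

```python
TOTAL_PARAMS = 30e9   # nominal total parameter count, from the listing
FILE_SIZE_GB = 16.5   # Q4_K_M file size, from the table
VRAM_GB = 21.0        # Q4_K_M VRAM requirement, from the table

# Effective storage cost per weight implied by the file size.
bits_per_weight = FILE_SIZE_GB * 1e9 * 8 / TOTAL_PARAMS

# Headroom the listing budgets on top of the weights themselves.
runtime_overhead_gb = VRAM_GB - FILE_SIZE_GB


def fits_in_vram(card_vram_gb: float) -> bool:
    """True if a card meets the listed VRAM requirement for Q4_K_M."""
    return card_vram_gb >= VRAM_GB


print(f"effective bits per weight: {bits_per_weight:.2f}")
print(f"runtime overhead:          {runtime_overhead_gb:.1f} GB")
print(f"24 GB card fits: {fits_in_vram(24)}, 16 GB card fits: {fits_in_vram(16)}")
```

About 4.4 bits per weight is in the range you would expect from a 4-bit quantization with some higher-precision tensors mixed in, so the file size is consistent with all 30B weights being stored.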

Get the model

Hugging Face

Original weights

huggingface.co/sarvamai/sarvam-30b

Source repository with the original weights. No pre-quantized files are listed here, so you will need to quantize the model yourself.

Frequently asked

What's the minimum VRAM to run Sarvam 30B?

21 GB of VRAM is enough to run Sarvam 30B at the Q4_K_M quantization (file size 16.5 GB). Higher-quality quantizations need more.

Can I use Sarvam 30B commercially?

Yes. Sarvam 30B ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam 30B?

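Sarvam 30B is listed with a context window of 4,096 tokens (about 4K). Our rating notes flag this figure as unverified against the model's config.json, so confirm it before relying on it.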
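If the listed 4,096-token window does apply, multi-turn chats need their history trimmed to fit. The sketch below keeps the most recent messages that fit a token budget. `trim_history` and `rough_count` are hypothetical helpers, and the whitespace word count is only a stand-in for the model's real tokenizer, which will usually produce more tokens, especially for Indic scripts.

```python
from typing import Callable, List


def trim_history(
    messages: List[str],
    max_tokens: int,
    reserve_for_reply: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Keep the newest messages whose combined cost fits the prompt budget."""
    budget = max_tokens - reserve_for_reply
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that will not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text: str) -> int:
    """Stand-in token counter: whitespace-separated words, not real tokens."""
    return len(text.split())


history = ["a b c", "d e", "f g h i"]
print(trim_history(history, max_tokens=8, reserve_for_reply=2, count_tokens=rough_count))
```

With a real deployment you would pass `max_tokens=4096` (or whatever the verified limit turns out to be) and a counter built on the model's own tokenizer.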

Source: huggingface.co/sarvamai/sarvam-30b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
