other
105B parameters
Commercial OK
Reviewed May 2026

Sarvam 105B

Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks with coverage across 22 Indian languages. Apache 2.0 licensed and commercially usable.

License: apache-2.0 · Context: 128,000 tokens
Our verdict

OP · Fredoline Eruo | Verified May 28, 2026
9.3/10

If your use case is genuinely Hindi or another Indian language, this is the most capable open model in that space right now, and the Apache 2.0 license makes commercial deployment straightforward. The MoE architecture cuts per-token compute, but you still need the hardware to load all 105B weights and the willingness to set up a non-standard inference stack. For general English workloads there are better-supported alternatives at this active-parameter count. Bottom line: worth a trial if Indian-language quality is your bottleneck, but verify the inference setup before committing.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. License is explicitly Apache 2.0 on the HF card, matching the row. Metadata (105B total, 10.3B active, 128K context via YaRN, MoE architecture) is accurate per the model card. Description and verdict are honest, concrete, and operator-voiced — they correctly flag the VRAM trap (full 105B must load despite low active params) and the non-standard inference stack requirement. Best use case is sharp (Indian-language reasoning/agentic). Weaknesses are appropriately honest about thin community traction. Minor nit: family could arguably be 'sarvam' or noted as MoE, but 'other' is defensible given no established family. Solid pass.

Strengths

  • MoE design keeps active params at 10.3B, reducing inference cost relative to total parameter count
  • 128K context window
  • State-of-the-art across 22 Indian languages at this model size per vendor benchmarks
  • Apache 2.0 — no commercial restrictions
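The first strength can be put in numbers. The sketch below uses only the two parameter counts quoted on this page; the "2 FLOPs per active parameter per token" figure is a common rule of thumb and an assumption here, not something the model card states.

```python
# Figures quoted in the model description on this page.
TOTAL_PARAMS = 105e9    # all experts; every one must be resident in memory
ACTIVE_PARAMS = 10.3e9  # parameters actually used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in one forward pass."""
    return active / total


def approx_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter."""
    return 2.0 * active


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share of weights: {frac:.1%}")
print(f"Approx. FLOPs per token: {approx_flops_per_token(ACTIVE_PARAMS):.2e}")
```

Roughly a tenth of the weights do work on any given token, which is why compute cost resembles a much smaller dense model while the memory footprint does not.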

Weaknesses

  • Full 105B weights still need to be loaded; VRAM requirements are substantial despite low active params
  • Efficient inference requires a custom vLLM fork or SGLang with specific configs — stock setups may not work
  • Indian-language focus means limited evidence of quality outside that language family
  • Low community traction so far (15K downloads, 269 likes) — real-world reports are thin

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          57.8 GB      74 GB
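As a sanity check on the table, the file size and parameter count imply an average bit width, and the gap between file size and required VRAM shows how much headroom is reserved for everything other than weights. This is a rough sketch that assumes decimal gigabytes (10^9 bytes); the helper names are illustrative only.

```python
GB = 1e9  # assumption: the table uses decimal gigabytes

FILE_SIZE_GB = 57.8   # Q4_K_M file size from the table
VRAM_GB = 74.0        # Q4_K_M VRAM requirement from the table
TOTAL_PARAMS = 105e9  # total parameter count


def bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average number of bits stored per parameter in the quantized file."""
    return file_size_gb * GB * 8 / n_params


def runtime_headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM left over after the weights are loaded (KV cache, buffers, etc.)."""
    return vram_gb - file_size_gb


bpw = bits_per_weight(FILE_SIZE_GB, TOTAL_PARAMS)
headroom = runtime_headroom_gb(VRAM_GB, FILE_SIZE_GB)
print(f"{bpw:.2f} bits per weight, {headroom:.1f} GB of non-weight headroom")
```

The result is about 4.4 bits per weight, with about 16.2 GB budgeted beyond the weights themselves.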

Get the model

HuggingFace

Original weights

huggingface.co/sarvamai/sarvam-105b

Source repository with the original weights. No pre-quantized files are linked here, so you need to quantize the weights yourself.
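One way to fetch the original weights is the `huggingface_hub` client. The sketch below is a minimal example, assuming that package is installed and that the repository is publicly downloadable; the function name and target directory are hypothetical, and the repository's file layout is not verified here.

```python
from pathlib import Path

REPO_ID = "sarvamai/sarvam-105b"  # repository linked on this page


def download_weights(target_dir: str) -> Path:
    """Download a full snapshot of the repository into target_dir.

    Needs `pip install huggingface_hub`. The import is kept local so that
    this module can be loaded without the package installed.
    """
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)


# Example call (not run here, it downloads the whole repository):
# download_weights("./sarvam-105b")
```

Check the repository's file listing for the actual download size before starting. If the weights are stored at 16-bit precision, 105B parameters would come to roughly 210 GB.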

Hardware that runs this

Cards with enough VRAM for at least one quantization of Sarvam 105B.
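A quick way to screen a setup against that requirement is to compare total VRAM with the Q4_K_M figure. The sketch below is deliberately naive: it assumes VRAM pools perfectly across cards, which real multi-GPU runtimes only approximate, and the card sizes in the examples are hypothetical.

```python
REQUIRED_VRAM_GB = 74.0  # Q4_K_M requirement quoted on this page


def fits(cards_gb: list[float], required_gb: float = REQUIRED_VRAM_GB) -> bool:
    """True if the combined VRAM of all cards meets the requirement."""
    return sum(cards_gb) >= required_gb


# Hypothetical setups, listed as VRAM per card in GB.
setups = {
    "1 x 80 GB": [80.0],
    "2 x 24 GB": [24.0, 24.0],
    "4 x 24 GB": [24.0, 24.0, 24.0, 24.0],
}
for name, cards in setups.items():
    print(f"{name}: {'fits' if fits(cards) else 'does not fit'}")
```

Treat a pass as a necessary condition only. Splitting a model across cards adds overhead, and the inference stack must support the split.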

Frequently asked

What's the minimum VRAM to run Sarvam 105B?

74 GB of VRAM is enough to run Sarvam 105B at the Q4_K_M quantization (file size 57.8 GB). Higher-quality quantizations need more.

Can I use Sarvam 105B commercially?

Yes. Sarvam 105B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam 105B?

Sarvam 105B supports a context window of 128,000 tokens (128K).
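In practice the window is shared between the prompt and the generated output. A minimal budget check, assuming token counts come from the model's own tokenizer (not shown) and using hypothetical request sizes:

```python
CONTEXT_LIMIT = 128_000  # context window quoted on this page


def remaining_tokens(prompt_tokens: int, max_new_tokens: int,
                     limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left in the window; a negative value means the request overflows."""
    return limit - prompt_tokens - max_new_tokens


# Hypothetical request: a 120,000-token prompt plus 4,000 generated tokens.
left = remaining_tokens(prompt_tokens=120_000, max_new_tokens=4_000)
print(f"Remaining budget: {left} tokens")
```

A negative result means the prompt has to be trimmed or the output cap lowered before sending the request.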

Source: huggingface.co/sarvamai/sarvam-105b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify Sarvam 105B runs on your specific hardware before committing money.