Sarvam 105B
Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks with coverage across 22 Indian languages. Apache 2.0 licensed and commercially usable.
If your use case is genuinely Hindi or another Indian language, this is the most capable open model in that space right now, and the Apache 2.0 license makes commercial deployment straightforward. The MoE architecture cuts compute per token, but you still need the hardware to load all 105B weights and the willingness to set up a non-standard inference stack. For general English workloads there are better-supported alternatives at this active-parameter count. Worth a trial if Indian-language quality is your bottleneck, but verify the inference setup before committing.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. License is explicitly Apache 2.0 on the HF card, matching the row. Metadata (105B total, 10.3B active, 128K context via YaRN, MoE architecture) is accurate per the model card. Description and verdict are honest, concrete, and operator-voiced — they correctly flag the VRAM trap (full 105B must load despite low active params) and the non-standard inference stack requirement. Best use case is sharp (Indian-language reasoning/agentic). Weaknesses are appropriately honest about thin community traction. Minor nit: family could arguably be 'sarvam' or noted as MoE, but 'other' is defensible given no established family. Solid pass.
Strengths
- MoE design keeps active params at 10.3B, reducing inference cost relative to total parameter count
- 128K context window
- State-of-the-art across 22 Indian languages at this model size per vendor benchmarks
- Apache 2.0 license, which permits commercial use
Weaknesses
- Full 105B weights still need to be loaded; VRAM requirements are substantial despite low active params
- Efficient inference requires a custom vLLM fork or SGLang with specific configs — stock setups may not work
- Indian-language focus means limited evidence of quality outside that language family
- Low community traction so far (15K downloads, 269 likes) — real-world reports are thin
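The gap between total and active parameters can be made concrete with a rough calculation. The sketch below estimates weight-only memory at a few common precisions. The bytes-per-parameter values are standard sizes for those formats, not figures from the model card, and the estimate ignores KV cache, activations, and runtime overhead, so treat the results as lower bounds.

```python
# Rough weight-only memory estimate for an MoE model.
# Assumptions (not from the model card): 1 GB = 1e9 bytes, and every
# parameter is stored at the same precision.

TOTAL_PARAMS = 105e9    # all experts must be resident in memory
ACTIVE_PARAMS = 10.3e9  # parameters actually used per token

# Nominal storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB."""
    return num_params * bytes_per_param / 1e9


def footprint_table(num_params: float) -> dict:
    """Weight memory at each precision in BYTES_PER_PARAM."""
    return {
        name: weight_memory_gb(num_params, size)
        for name, size in BYTES_PER_PARAM.items()
    }


def active_fraction(active: float, total: float) -> float:
    """Share of parameters that participate in each forward pass."""
    return active / total


if __name__ == "__main__":
    for name, gb in footprint_table(TOTAL_PARAMS).items():
        print(f"{name}: {gb:.1f} GB for weights alone")
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active share per token: {share:.1%}")
```

Compute per token scales with the active share (roughly a tenth of the model), while memory scales with the total, which is why the VRAM requirement stays high.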
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 57.8 GB | 74 GB |
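As a sanity check on the table, the sketch below derives the effective bits per weight from the listed file size, the headroom between file size and required VRAM, and a minimum card count for a given per-card VRAM. It assumes 1 GB = 1e9 bytes and that the file is almost entirely weights. The card sizes in the example (24, 48, 80 GB) are illustrative, and the card count is a lower bound because splitting a model across GPUs adds its own overhead.

```python
import math

# Figures taken from the quantization table above.
FILE_SIZE_GB = 57.8
VRAM_REQUIRED_GB = 74.0
TOTAL_PARAMS = 105e9


def bits_per_weight(file_size_gb: float, num_params: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return file_size_gb * 1e9 * 8 / num_params


def runtime_headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM budgeted beyond the weights (KV cache, buffers, and so on)."""
    return vram_gb - file_size_gb


def min_cards(vram_required_gb: float, vram_per_card_gb: float) -> int:
    """Lower bound on identical cards needed to reach the VRAM target."""
    return math.ceil(vram_required_gb / vram_per_card_gb)


if __name__ == "__main__":
    bpw = bits_per_weight(FILE_SIZE_GB, TOTAL_PARAMS)
    print(f"effective bits per weight: {bpw:.2f}")
    headroom = runtime_headroom_gb(VRAM_REQUIRED_GB, FILE_SIZE_GB)
    print(f"headroom beyond weights: {headroom:.1f} GB")
    for card_gb in (24, 48, 80):  # illustrative card sizes
        count = min_cards(VRAM_REQUIRED_GB, card_gb)
        print(f"{card_gb} GB cards: at least {count}")
```

The effective rate comes out a little above 4 bits per weight, which is consistent with a 4-bit scheme that keeps some tensors and scaling data at higher precision.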
Get the model
HuggingFace (original weights): the source repository. You need to quantize the weights yourself.
Frequently asked
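What's the minimum VRAM to run Sarvam 105B?
About 74 GB for Q4_K_M, the only quantization listed in the table above.
Can I use Sarvam 105B commercially?
Yes. It is released under Apache 2.0, which permits commercial use.
What's the context length of Sarvam 105B?
128K tokens.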
Source: huggingface.co/sarvamai/sarvam-105b
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.