Sarvam 105B

Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks with coverage across 22 Indian languages. Apache 2.0 licensed and commercially usable.

License: apache-2.0 · Context: 128,000 tokens


Our verdict

Fredoline Eruo · Verified May 28, 2026

9.3/10

If your use case is genuinely Hindi or another Indian language, this is the most capable open model in that space right now, and the Apache 2.0 license makes commercial deployment straightforward. The MoE architecture helps, but you still need the hardware to load all 105B weights and the willingness to set up a non-standard inference stack. For general English workloads, there are better-supported alternatives at this active-parameter count. Bottom line: worth a trial if Indian-language quality is your bottleneck, but verify the inference setup before committing.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. The license is explicitly Apache 2.0 on the HF card, matching the listing. Metadata (105B total, 10.3B active, 128K context via YaRN, MoE architecture) is accurate per the model card. The description and verdict are honest, concrete, and operator-voiced: they correctly flag the VRAM trap (the full 105B must load despite low active params) and the non-standard inference stack requirement. The best use case is sharp (Indian-language reasoning and agentic work). The weaknesses are appropriately honest about thin community traction. Minor nit: the family could arguably be 'sarvam' or noted as MoE, but 'other' is defensible given no established family. Solid pass.

Overview

Strengths

MoE design keeps active params at 10.3B, reducing inference cost relative to total parameter count
128K context window (extended via YaRN)
State-of-the-art across 22 Indian languages at this model size, according to vendor benchmarks
Apache 2.0, with no commercial restrictions
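To put the MoE advantage in rough numbers, here is a back-of-the-envelope sketch comparing per-token compute for 10.3B active parameters against a hypothetical dense model with all 105B parameters active. It assumes the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass and ignores attention costs that grow with context length, so treat the result as an order-of-magnitude estimate, not a measured figure.

```python
# Rough per-token compute: MoE (10.3B active) vs. a hypothetical dense 105B model.
# Assumption: ~2 FLOPs per active parameter per token for a forward pass.
# Attention-over-context cost (which grows with sequence length) is ignored.

TOTAL_PARAMS = 105e9    # total parameters, per the model card
ACTIVE_PARAMS = 10.3e9  # parameters used per token, per the model card


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token using the 2 * N rule of thumb."""
    return 2 * active_params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction:           {active_fraction:.1%}")
print(f"MoE FLOPs/token:           {moe_flops / 1e9:.1f} GFLOPs")
print(f"dense-105B FLOPs/token:    {dense_flops / 1e9:.1f} GFLOPs")
print(f"compute ratio (dense/MoE): {dense_flops / moe_flops:.1f}x")
```

Under this approximation, only about a tenth of the parameters do work on each token, which is where the inference-cost savings come from. Memory is a separate question, because the inactive experts still have to be resident.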

Weaknesses

Full 105B weights still need to be loaded; VRAM requirements are substantial despite low active params
Efficient inference requires a custom vLLM fork or SGLang with specific configs; stock setups may not work
The focus on Indian languages means there is limited evidence of quality in other languages
Low community traction so far (15K downloads, 269 likes), so real-world reports are thin
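The VRAM weakness is easiest to see with arithmetic. The sketch below estimates the memory needed just to hold the weights at a few common precisions, and compares it with what the 10.3B active parameters alone would need. The bytes-per-parameter values are nominal assumptions. Real checkpoints add overhead (quantization scales, layers kept at higher precision), and the KV cache and activations need memory on top of this, so these figures are lower bounds, not deployment requirements.

```python
# Back-of-the-envelope weight memory for a 105B-parameter MoE model.
# Every expert must be loaded, even though only ~10.3B params are active per token.
# Assumption: nominal bytes per parameter; quantization overhead, KV cache,
# and activations are NOT included, so real requirements are higher.

TOTAL_PARAMS = 105e9
ACTIVE_PARAMS = 10.3e9

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8/int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for fmt, bpp in BYTES_PER_PARAM.items():
    total = weight_memory_gb(TOTAL_PARAMS, bpp)
    active_only = weight_memory_gb(ACTIVE_PARAMS, bpp)
    print(f"{fmt:>9}: load {total:6.1f} GB of weights "
          f"(vs {active_only:5.1f} GB if only active params had to be resident)")
```

At BF16, the weights alone come to roughly 210 GB, more than a single 80 GB GPU holds, so even a 4-bit build (about 52.5 GB before overhead) leaves limited headroom for a 128K-token KV cache.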