Sarvam M
Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode for reasoning tasks alongside a standard chat mode. Benchmark numbers show meaningful gains over the base model in Indian languages, math, and code.
If you need solid Hindi (or broader Indic language) coverage and have the VRAM to run a 24B model, Sarvam M is a credible option: the benchmark improvements over its base are real, and the Apache-2.0 license keeps commercial use straightforward. The listed 4096-token context is a genuine constraint, so rule it out if your workload involves long documents. For pure Hindi chat without heavy reasoning, a smaller fine-tuned model may stretch your hardware further. It is worth testing if Indic language quality is your priority, but verify that the context limit won't bottleneck your use case before committing.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is explicitly apache-2.0 on the card and correctly flagged commercial-OK. Params (24B), family (mistral), and vendor (Sarvam AI) are accurate and verifiable. The 4096 context claim is the weakest metadata point — Mistral-Small-3.1-24B-Base supports much longer context (128K), and the card doesn't explicitly state 4096 as a hard limit; this is a flag worth verifying. Editorial voice is honest and operator-grade, weaknesses are concrete (VRAM, context, thin community), and the verdict properly hedges. Use case is sharp (Indic-language reasoning) and brand fit is strong for local-AI builders working with Indian languages.
Flags:
- contextLength=4096 is not clearly substantiated by the card excerpt; base model supports 128K, so verify before publishing
- useCases list includes 'reasoning', which is fair given the hybrid thinking mode, but Telugu (te) is in the language list while the description says '11 Indian languages including Hindi'; minor consistency check
Strengths
- 20% average improvement over base model on Indian language benchmarks
- 21.6% gain on math benchmarks, 17.6% on programming benchmarks
- Switchable thinking / non-thinking mode — useful for both reasoning and fast conversational responses
- Apache-2.0 license, commercial use allowed
Weaknesses
- Listed 4096-token context window is short; long documents or multi-turn conversations will hit limits fast (the base model supports 128K, so verify this figure on the model card)
- 24B parameters need serious VRAM; not a laptop-friendly model
- Benchmark gains are vs. its own base model, not vs. broader Hindi-capable competitors
- Low download count (under 5k) means community troubleshooting resources are thin
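The context constraint in the list above can be made concrete with a small budgeting sketch. It assumes you already have per-message token counts from the model's own tokenizer, and it takes the 4096 figure from this page, which is flagged as unverified. The names `trim_history` and `reserve_for_reply`, and the 512-token reply reserve, are hypothetical choices for illustration, not part of any Sarvam or Mistral API.

```python
from typing import List, Tuple

CONTEXT_LIMIT = 4096  # value listed on this page; verify against the model card


def trim_history(
    messages: List[Tuple[str, int]],
    context_limit: int = CONTEXT_LIMIT,
    reserve_for_reply: int = 512,  # assumed headroom for the model's answer
) -> List[Tuple[str, int]]:
    """Keep the most recent messages whose token counts fit the budget.

    Each message is (text, token_count). Token counts are assumed to come
    from the model's tokenizer; this sketch does not compute them.
    """
    budget = context_limit - reserve_for_reply
    kept: List[Tuple[str, int]] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for text, tokens in reversed(messages):
        if used + tokens > budget:
            break
        kept.append((text, tokens))
        used += tokens
    kept.reverse()  # restore chronological order
    return kept
```

With a 512-token reserve the usable prompt budget is 3584 tokens, so a conversation whose turns run to 1000 tokens or more each keeps only its last few turns. If the true limit turns out to be larger, only `CONTEXT_LIMIT` needs to change.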
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 13.2 GB | 17 GB |
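As a rough sanity check on the table row above, the sketch below derives the average bits stored per weight from the file size and compares a card's VRAM against the listed requirement. It assumes 1 GB means 1e9 bytes and a nominal 24 billion parameters. The function names are hypothetical, and a simple comparison ignores anything else already occupying the GPU.

```python
def implied_bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, treating 1 GB as 1e9 bytes."""
    return file_size_gb * 8 / params_billion


def fits_in_vram(card_vram_gb: float, required_gb: float) -> bool:
    """True when the card has at least the listed VRAM requirement."""
    return card_vram_gb >= required_gb


# Figures taken from the quantization table on this page.
Q4_K_M_FILE_GB = 13.2
Q4_K_M_VRAM_GB = 17.0
PARAMS_BILLION = 24.0  # nominal parameter count

bits = implied_bits_per_weight(Q4_K_M_FILE_GB, PARAMS_BILLION)
headroom_gb = Q4_K_M_VRAM_GB - Q4_K_M_FILE_GB

print(f"Implied bits per weight: {bits:.2f}")
print(f"Headroom beyond the weights: {headroom_gb:.1f} GB")
print(f"Fits on a 16 GB card: {fits_in_vram(16, Q4_K_M_VRAM_GB)}")
print(f"Fits on a 24 GB card: {fits_in_vram(24, Q4_K_M_VRAM_GB)}")
```

The listed numbers work out to about 4.4 bits per weight, with about 3.8 GB of the VRAM requirement left over beyond the weights themselves. That remainder presumably covers the KV cache and runtime buffers, though the page does not say how it was derived.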
Get the model
HuggingFace
Original weights
Source repository; you need to quantize the weights yourself.
Frequently asked
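What's the minimum VRAM to run Sarvam M?
About 17 GB for the Q4_K_M quantization listed above.
Can I use Sarvam M commercially?
Yes. The model card lists an Apache-2.0 license, which allows commercial use.
What's the context length of Sarvam M?
4096 tokens as listed here. The base model supports 128K, so verify the limit on the model card.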
Source: huggingface.co/sarvamai/sarvam-m
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.