mistral
24B parameters
Commercial OK
Reviewed May 2026

Sarvam M

Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode for reasoning tasks alongside a standard chat mode. Benchmark numbers show meaningful gains over the base model in Indian languages, math, and code.

License: Apache-2.0 · Context: 4,096 tokens

Our verdict

OP · Fredoline Eruo | Verified May 28, 2026
9.0/10

If you need solid Hindi (or broader Indic language) coverage and have the VRAM to run a 24B model, Sarvam M is a credible option: the benchmark improvements over its base are real, and the Apache-2.0 license keeps commercial use straightforward. The listed 4,096-token context is a genuine constraint, so rule it out if your workload involves long documents. For pure Hindi chat without heavy reasoning, a smaller fine-tuned model may stretch your hardware further. Bottom line: worth testing if Indic language quality is your priority, but confirm the context limit won't bottleneck your use case before committing.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is explicitly apache-2.0 on the card and correctly flagged commercial-OK. Params (24B), family (mistral), and vendor (Sarvam AI) are accurate and verifiable. The 4096 context claim is the weakest metadata point — Mistral-Small-3.1-24B-Base supports much longer context (128K), and the card doesn't explicitly state 4096 as a hard limit; this is a flag worth verifying. Editorial voice is honest and operator-grade, weaknesses are concrete (VRAM, context, thin community), and the verdict properly hedges. Use case is sharp (Indic-language reasoning) and brand fit is strong for local-AI builders working with Indian languages.

Flags:

  • contextLength=4096 is not clearly substantiated by the card excerpt; the base model supports 128K. Verify before publishing.
  • The useCases list includes 'reasoning', which is fair given the hybrid thinking mode. Telugu (te) is in the language list, yet the description says only '11 Indian languages including Hindi'. Minor consistency check.

Strengths

  • 20% average improvement over the base model on Indian language benchmarks
  • 21.6% gain on math benchmarks, 17.6% on programming benchmarks
  • Switchable thinking / non-thinking mode — useful for both reasoning and fast conversational responses
  • Apache-2.0 license, commercial use allowed

Weaknesses

  • 4096-token context window is short — long documents or multi-turn conversations will hit limits fast
  • 24B parameters need serious VRAM; not a laptop-friendly model
  • Benchmark gains are vs. its own base model, not vs. broader Hindi-capable competitors
  • Low download count (under 5k) means community troubleshooting resources are thin
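
To make the context constraint concrete, here is a minimal sketch of trimming chat history to fit a 4,096-token window while leaving room for the reply. It rests on several assumptions: messages are plain strings, `rough_token_count` is a crude stand-in (about one token per four characters) rather than the model's real tokenizer, and the 512-token reply budget is an arbitrary illustration. Indic scripts often tokenize less efficiently than this heuristic suggests, so swap in the actual tokenizer before trusting the numbers.

```python
from typing import Callable, List

CONTEXT_LIMIT = 4096  # tokens, as listed on this page (worth verifying)


def rough_token_count(text: str) -> int:
    """Crude placeholder: ~1 token per 4 characters. NOT the model's tokenizer."""
    return max(1, len(text) // 4)


def trim_history(
    messages: List[str],
    reply_budget: int = 512,  # hypothetical reserve for the model's answer
    limit: int = CONTEXT_LIMIT,
    count: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Keep the newest messages whose combined cost fits in limit - reply_budget."""
    budget = limit - reply_budget
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so recent turns survive.
    for msg in reversed(messages):
        cost = count(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["x" * 8000, "y" * 4000, "z" * 4000]
    print(len(trim_history(history)))
```

Dropping whole messages from the oldest end is the simplest policy; summarizing older turns is an alternative if early context matters.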

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          13.2 GB      17 GB
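
As a rough sanity check on these figures, the sketch below works out the average bits stored per weight and the headroom between the file size and the VRAM requirement. It assumes decimal gigabytes (1 GB = 10^9 bytes) and a flat 24B parameter count; the function names are illustrative and not from any library.

```python
def implied_bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits per parameter, assuming 1 GB = 1e9 bytes."""
    return file_size_gb * 8 / params_billions


def runtime_overhead_gb(vram_required_gb: float, file_size_gb: float) -> float:
    """VRAM needed beyond the weights file (KV cache, buffers, and so on)."""
    return vram_required_gb - file_size_gb


def fits(gpu_vram_gb: float, vram_required_gb: float) -> bool:
    """True if the card meets the listed VRAM requirement."""
    return gpu_vram_gb >= vram_required_gb


if __name__ == "__main__":
    # Figures for Q4_K_M taken from the table on this page.
    print(f"{implied_bits_per_weight(13.2, 24):.1f} bits per weight")
    print(f"{runtime_overhead_gb(17, 13.2):.1f} GB overhead")
    print(fits(16, 17), fits(24, 17))
```

With the listed numbers this comes to about 4.4 bits per weight and about 3.8 GB of overhead, so a 16 GB card falls short of the stated requirement while a 24 GB card clears it.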

Get the model

HuggingFace

Original weights

huggingface.co/sarvamai/sarvam-m

Source repository with the original weights. No quantized files are linked here, so you will need to quantize the weights yourself.
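
One way to fetch the original weights is the `huggingface_hub` Python package, sketched below. The repository ID comes from the link above; the `local_dir` default and the `download_weights` helper name are assumptions for illustration. The unquantized download is considerably larger than the 13.2 GB Q4_K_M file, so check free disk space first.

```python
REPO_ID = "sarvamai/sarvam-m"  # from the HuggingFace link on this page


def download_weights(local_dir: str = "sarvam-m") -> str:
    """Download the full repository snapshot and return its local path.

    Requires `pip install huggingface_hub`; imported lazily so this file
    can be loaded without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_weights())
```

Quantizing the downloaded weights is a separate step that depends on the toolchain you choose and is not shown here.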

Frequently asked

What's the minimum VRAM to run Sarvam M?

17 GB of VRAM is enough to run Sarvam M at the Q4_K_M quantization (file size 13.2 GB). Higher-quality quantizations need more.

Can I use Sarvam M commercially?

Yes. Sarvam M ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam M?

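This listing gives Sarvam M a context window of 4,096 tokens. The base model, Mistral-Small-3.1-24B-Base, supports 128K, so check the model card before relying on this figure.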

Source: huggingface.co/sarvamai/sarvam-m

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify Sarvam M runs on your specific hardware before committing money.