mistral

24B parameters

Commercial OK

Reviewed May 2026

Sarvam M

Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode for reasoning tasks alongside a standard chat mode. Benchmark numbers show meaningful gains over the base model in Indian languages, math, and code.

License: apache-2.0·Context: 4,096 tokens

BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED MAY 28, 2026

9.0/10

If you need solid Hindi (or broader Indic language) coverage and have the VRAM to run a 24B model, Sarvam M is a credible option — the benchmark improvements over its base are real and the Apache-2.0 license keeps commercial use straightforward. The 4096-token context is a genuine constraint, so rule it out if your workload involves long documents. For pure Hindi chat without heavy reasoning, a smaller fine-tuned model may stretch your hardware further. Hedge: worth testing if Indic language quality is your priority, but verify the context limit won't bottleneck your use case before committing.

›Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License is explicitly apache-2.0 on the card and correctly flagged commercial-OK. Params (24B), family (mistral), and vendor (Sarvam AI) are accurate and verifiable. The 4096 context claim is the weakest metadata point — Mistral-Small-3.1-24B-Base supports much longer context (128K), and the card doesn't explicitly state 4096 as a hard limit; this is a flag worth verifying. Editorial voice is honest and operator-grade, weaknesses are concrete (VRAM, context, thin community), and the verdict properly hedges. Use case is sharp (Indic-language reasoning) and brand fit is strong for local-AI builders working with Indian languages.

Flags: - contextLength=4096 is not clearly substantiated by the card excerpt; base model supports 128K — verify before publishing - useCases list includes 'reasoning' which is fair given hybrid thinking mode, but Telugu (te) is in the language list yet description says '11 Indian languages including Hindi' — minor consistency check

Overview

Sarvam M is a 24B text-only model fine-tuned from Mistral-Small-3.1-24B-Base for 11 Indian languages including Hindi. It supports a switchable thinking mode for reasoning tasks alongside a standard chat mode. Benchmark numbers show meaningful gains over the base model in Indian languages, math, and code.

Strengths

20% average improvement over base model on Indian language benchmarks
21.6% gain on math benchmarks, 17.6% on programming benchmarks
Switchable thinking / non-thinking mode — useful for both reasoning and fast conversational responses
Apache-2.0 license, commercial use allowed

Weaknesses

4096-token context window is short — long documents or multi-turn conversations will hit limits fast
24B parameters need serious VRAM; not a laptop-friendly model
Benchmark gains are vs. its own base model, not vs. broader Hindi-capable competitors
Low download count (under 5k) means community troubleshooting resources are thin

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	13.2 GB	17 GB

Get the model

HuggingFace

Original weights

huggingface.co/sarvamai/sarvam-m

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Sarvam M.

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · nvidia

NVIDIA H100 NVL

188GB · nvidia

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier

Models in the same parameter band as this one

Step up

More capable — bigger memory footprint

Step down

Smaller — faster, runs on weaker hardware

Frequently asked

What's the minimum VRAM to run Sarvam M?

17GB of VRAM is enough to run Sarvam M at the Q4_K_M quantization (file size 13.2 GB). Higher-quality quantizations need more.

Can I use Sarvam M commercially?

Yes — Sarvam M ships under the apache-2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam M?

Sarvam M supports a context window of 4,096 tokens (about 4K).

Source: huggingface.co/sarvamai/sarvam-m

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Compare hardware

Buyer guides

When it doesn't work

Recommended hardware

Before you buy

Verify Sarvam M runs on your specific hardware before committing money.

Will it run on my hardware? →Custom hardware comparison →GPU recommender (4 questions) →