105B parameters
Commercial OK
Reviewed May 2026

Sarvam 105B FP8

Sarvam-105B is a Mixture-of-Experts model with 10.3B active parameters built for Indian-language tasks, reasoning, and coding. It covers 22 Indian languages and supports a 128K context window via YaRN scaling. This repo ships FP8-quantized weights intended for deployment with SGLang or patched vLLM.

License: apache-2.0 · Context: 131,072 tokens

Our verdict

OP · Fredoline Eruo | Verified May 28, 2026
9.2/10

If you run inference infrastructure and need strong Hindi (or broader Indic-language) coverage at long context, this is currently one of the few serious options at this scale. The FP8 weights help, but you still need a multi-GPU server and a non-standard inference stack (SGLang or a patched vLLM build); this is not a plug-and-play download. With only 546 downloads and 5 likes on Hugging Face, expect rough edges and little community help. Bottom line: worth evaluating if Indic-language quality is your primary requirement and you have the hardware; skip it if you want easy deployment or active community support.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.15/10, displayed rounded as 9.2/10. The license is explicitly apache-2.0 in the card and matches the claim. Metadata is accurate: 105B total parameters with 10.3B active (MoE), 128K context via YaRN, FP8 weights, and an Indian-language focus are all verifiable from the card. Editorial voice is honest and operator-grade: it explicitly flags the non-standard inference stack, VRAM requirements, and weak community signal. The row's '22 Indian languages' wording slightly compresses the card, which claims state-of-the-art results across 22 Indian languages for its size; the language count itself matches. parameterCountB=105 is correct as the total parameter count even though only 10.3B are active, which is acceptable as the standard convention. Brand fit is moderate since this is a datacenter-scale model, not a local-laptop one, but Indic coverage at this scale is genuinely useful to the audience. Clears the 9.0 bar.

Flags:

  • Datacenter-scale model on a 'local AI' catalog: ensure framing makes hardware requirements obvious (the row does this adequately)
  • parameterCountB=105 reflects total params, not the 10.3B active; the convention is fine, but readers may need the MoE distinction surfaced in the UI

Strengths

  • Covers 22 Indian languages including Hindi — broadest regional coverage in its class
  • 128K context via YaRN scaling
  • Only 10.3B parameters are active at inference time despite 105B total, reducing compute per token
  • Apache-2.0 license — commercial use permitted
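
To make the active-parameter point concrete, the sketch below computes what share of the weights participates in each token, using only the two parameter counts quoted above. It is a rough proxy for per-token compute under the stated assumption, not a measured speedup, and the variable names are illustrative.

```python
# Parameter counts quoted in this listing (in billions).
TOTAL_PARAMS_B = 105.0
ACTIVE_PARAMS_B = 10.3

# Share of the weights that participates in each forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# How many times fewer parameters are used per token than a dense 105B model.
# Assumption: per-token compute scales roughly with active parameters.
dense_to_active_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

print(f"Active share per token: {active_fraction:.1%}")
print(f"Dense-to-active ratio: {dense_to_active_ratio:.1f}x")
```

The saving applies to compute only: all 105B parameters still have to be loaded, which is why the memory requirements below remain high.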

Weaknesses

  • Even with FP8 quantization, loading 105B weights demands serious VRAM — not a consumer-GPU model
  • Requires SGLang or a patched vLLM build; stock inference stacks won't work out of the box
  • MoE routing can introduce latency spikes compared to equivalently sized dense models
  • 546 HF downloads and 5 likes — very limited community testing or troubleshooting resources
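
The VRAM point can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes FP8 stores exactly one byte per parameter and ignores quantization scales, KV cache, and activation memory, so the result is a lower bound for the weights alone. The 80 GB per-GPU figure is an illustrative assumption, not a recommendation from the model card.

```python
import math

TOTAL_PARAMS = 105e9      # total parameters, from this listing
BYTES_PER_PARAM_FP8 = 1   # assumption: 8 bits per weight, no overhead
GPU_VRAM_GB = 80          # hypothetical accelerator size

# Weight footprint in decimal gigabytes.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_FP8 / 1e9

# Fewest GPUs whose combined memory can hold the weights at all.
min_gpus_for_weights = math.ceil(weights_gb / GPU_VRAM_GB)

print(f"FP8 weights alone: ~{weights_gb:.0f} GB")
print(f"{GPU_VRAM_GB} GB GPUs needed just for weights: {min_gpus_for_weights}")
```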

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          57.8 GB      74 GB
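
The table's figures can be read back into an average bits-per-weight value and a memory headroom. This is a sketch under the assumption that file size is simply parameters times bits per weight, in decimal gigabytes; the listing does not state how its numbers were derived.

```python
TOTAL_PARAMS = 105e9      # total parameters, from this listing
FILE_SIZE_GB = 57.8       # Q4_K_M row
VRAM_REQUIRED_GB = 74.0   # Q4_K_M row

# Implied average bits per weight, assuming 1 GB = 1e9 bytes.
bits_per_weight = FILE_SIZE_GB * 1e9 * 8 / TOTAL_PARAMS

# Headroom the listing budgets beyond the weight file itself
# (presumably KV cache, activations, and runtime buffers).
overhead_gb = VRAM_REQUIRED_GB - FILE_SIZE_GB

print(f"Implied bits per weight: {bits_per_weight:.2f}")
print(f"Headroom over file size: {overhead_gb:.1f} GB")
```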

Get the model

HuggingFace

Original weights

huggingface.co/sarvamai/sarvam-105b-fp8

Source repository — direct quantization required.
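
Fetching the weights can be scripted with the huggingface_hub library. This is a minimal sketch, assuming the package is installed and the repository is publicly readable; the function name and the local directory are hypothetical choices, not part of the model card.

```python
REPO_ID = "sarvamai/sarvam-105b-fp8"  # taken from the URL above


def download_weights(local_dir: str = "sarvam-105b-fp8") -> str:
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Usage (starts a very large download, so it is left commented out):
# path = download_weights()
```

Check free disk space first: FP8 weights for a 105B-parameter model are on the order of 100 GB if each parameter takes one byte.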

Frequently asked

What's the minimum VRAM to run Sarvam 105B FP8?

74 GB of VRAM is enough to run Sarvam 105B FP8 at the Q4_K_M quantization (file size 57.8 GB). Higher-quality quantizations need more.
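
A quick way to compare a GPU setup against that figure is sketched below. It assumes VRAM pools perfectly across cards, which real multi-GPU runtimes only approximate, so a passing result is a necessary condition rather than a guarantee. The helper name `fits` is illustrative.

```python
REQUIRED_VRAM_GB = 74.0  # Q4_K_M figure from this listing


def fits(gpu_vram_gb: float, gpu_count: int = 1) -> bool:
    """Return True if pooled VRAM meets the listed requirement.

    Assumption: memory pools perfectly across GPUs with no per-card overhead.
    """
    return gpu_vram_gb * gpu_count >= REQUIRED_VRAM_GB


print(fits(24))      # single 24 GB card
print(fits(80))      # single 80 GB card
print(fits(24, 4))   # four 24 GB cards
```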

Can I use Sarvam 105B FP8 commercially?

Yes. Sarvam 105B FP8 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam 105B FP8?

Sarvam 105B FP8 supports a context window of 131,072 tokens (128K, since 131,072 = 128 × 1,024).
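
The 131,072-token figure is the same window the listing elsewhere calls 128K. The sketch below shows the conversion and a simple budget check; `fits_in_context` is a hypothetical helper, and the token counts passed to it are made-up examples.

```python
CONTEXT_TOKENS = 131_072  # from this listing

# "K" in context-window figures conventionally means 1,024 tokens.
context_k = CONTEXT_TOKENS // 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that prompt plus generation budget stays within the window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_TOKENS


print(f"{CONTEXT_TOKENS} tokens = {context_k}K")
print(fits_in_context(prompt_tokens=120_000, max_new_tokens=8_000))
print(fits_in_context(prompt_tokens=130_000, max_new_tokens=2_000))
```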

Source: huggingface.co/sarvamai/sarvam-105b-fp8

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
