Sarvam 105B FP8


Sarvam-105B is a Mixture-of-Experts (MoE) model with 105B total parameters, of which 10.3B are active per token. It is built for Indian-language tasks, reasoning, and coding, covers 22 Indian languages, and supports a 128K-token context window via YaRN scaling. This repo ships FP8-quantized weights intended for deployment with SGLang or a patched build of vLLM.
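The gap between total and active parameters is what makes an MoE model cheaper per token than a dense model of the same size. The sketch below puts numbers on that using the card's 105B and 10.3B figures and the common rule of thumb of roughly 2 FLOPs per parameter per token for a forward pass. The rule of thumb is an approximation we are assuming here, not a measured figure for this model, and it ignores attention cost over long contexts.

```python
# Figures from the model card: 105B total parameters, 10.3B active per token.
TOTAL_PARAMS = 105e9
ACTIVE_PARAMS = 10.3e9


def active_fraction(total: float, active: float) -> float:
    """Share of all parameters that the router activates for each token."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb forward cost: about 2 FLOPs per parameter per token.

    Assumption: ignores attention over the context, so it underestimates
    the cost of long prompts.
    """
    return 2 * params


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    moe = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print(f"MoE ~{moe / 1e9:.1f} GFLOPs/token vs dense 105B ~{dense / 1e9:.0f} GFLOPs/token")
```

Compute per token scales with the active count, but every expert still has to be resident in memory, which is why the VRAM figures track the 105B total rather than the 10.3B active parameters.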

License: apache-2.0 · Context: 131,072 tokens

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.15/10. License is explicitly apache-2.0 in the card and matches the claim. Metadata is accurate: 105B total with 10.3B active MoE, 128K context via YaRN, FP8 weights, and the Indian-language focus are all verifiable from the card. Editorial voice is honest and operator-grade: it explicitly flags the non-standard inference stack, VRAM requirements, and weak community signal. The row's '22 Indian languages' wording slightly compresses the card, which says SOTA across 22 Indian languages for its size, but the claim matches. parameterCountB=105 is correct as total params even though only 10.3B are active; this is acceptable as the standard convention. Brand fit is moderate since this is a datacenter-scale model, not a local-laptop one, but Indic coverage at this scale is genuinely useful to the audience. Clears the 9.0 bar.

Flags:
- Datacenter-scale model on a 'local AI' catalog; ensure framing makes hardware requirements obvious (the row does this adequately)
- parameterCountB=105 reflects total params, not the 10.3B active; the convention is fine, but readers may need the MoE distinction surfaced in the UI

Overview

Quantization	File size	VRAM required
Q4_K_M	57.8 GB	74 GB
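The table implies an average storage cost per weight and a memory budget beyond the weights themselves. The sketch below works both out from the listed figures. It assumes GB means 10^9 bytes and takes the 105B total parameter count from the card; the FP8 line is a rough estimate assuming one byte per weight, ignoring quantization scales and any layers kept at higher precision.

```python
# Figures from the table above; GB is assumed to mean 10**9 bytes.
TOTAL_PARAMS = 105e9
FILE_SIZE_GB = 57.8
VRAM_REQUIRED_GB = 74.0


def bits_per_param(file_size_gb: float, params: float) -> float:
    """Average storage cost per weight implied by the file size."""
    return file_size_gb * 1e9 * 8 / params


def headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM left after loading weights (KV cache, activations, runtime buffers)."""
    return vram_gb - file_size_gb


def fp8_weights_gb(params: float) -> float:
    """Rough FP8 footprint: one byte per weight (assumption: no scales or
    higher-precision layers counted)."""
    return params * 1 / 1e9


print(f"Q4_K_M: {bits_per_param(FILE_SIZE_GB, TOTAL_PARAMS):.2f} bits/param")
print(f"headroom: {headroom_gb(VRAM_REQUIRED_GB, FILE_SIZE_GB):.1f} GB")
print(f"FP8 weights alone: ~{fp8_weights_gb(TOTAL_PARAMS):.0f} GB")
```

At about 4.4 bits per weight, Q4_K_M is roughly half the size of 8-bit weights, which is why it fits where the FP8 checkpoint does not.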

Frequently asked

What's the minimum VRAM to run Sarvam 105B FP8?

74 GB of VRAM is enough to run Sarvam 105B FP8 at the Q4_K_M quantization (57.8 GB file). Higher-quality quantizations need more.
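A quick way to apply the 74 GB figure to a given machine is to compare it against total available VRAM. The sketch below does that for a few hypothetical GPU configurations. It assumes the runtime can split the model across all listed GPUs and ignores per-GPU overhead, so a configuration that only just passes may still fail in practice.

```python
REQUIRED_VRAM_GB = 74.0  # Q4_K_M figure from the FAQ answer above


def fits(gpu_vram_gb: list[float], required_gb: float = REQUIRED_VRAM_GB) -> bool:
    """True if combined VRAM meets the requirement.

    Assumption: the model can be split across every listed GPU;
    per-GPU overhead is ignored.
    """
    return sum(gpu_vram_gb) >= required_gb


# Hypothetical configurations for illustration only.
configs = {
    "1 x 80 GB": [80.0],
    "2 x 48 GB": [48.0, 48.0],
    "1 x 48 GB": [48.0],
    "3 x 24 GB": [24.0, 24.0, 24.0],
}

for name, gpus in configs.items():
    print(f"{name}: {'fits' if fits(gpus) else 'too small'}")
```

Three 24 GB cards total 72 GB, which falls just short of the 74 GB figure.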

Can I use Sarvam 105B FP8 commercially?

Yes. Sarvam 105B FP8 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Sarvam 105B FP8?

Sarvam 105B FP8 supports a context window of 131,072 tokens (128K), extended via YaRN scaling.
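The context window is shared between the prompt and the generated output, so a long prompt reduces how much the model can write back. The sketch below shows that budget arithmetic. Token counts depend on the model's own tokenizer, which this sketch does not call; the prompt length is assumed to be already known.

```python
CONTEXT_WINDOW = 131_072  # tokens, per the model card

# "128K" here is in binary units: 128 * 1024 = 131,072.
assert CONTEXT_WINDOW == 128 * 1024


def max_new_tokens(prompt_tokens: int, context: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation once the prompt occupies the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context - prompt_tokens, 0)


print(max_new_tokens(100_000))  # 31072
```

A prompt longer than the window leaves no room at all, so the helper clamps at zero rather than returning a negative budget.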
