mistral

24B parameters

Restricted

Reviewed June 2026

Mistral Saba 24B

Mistral's Arabic and South Asian language specialist at 24B. Research license.

License: Mistral Research License·Released Feb 17, 2025·Context: 32,768 tokens

BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026

unrated

Positioning

Mistral Saba 24B is a dense 24-billion-parameter model from Mistral AI, released under the Mistral Research License. It specializes in Arabic and South Asian languages, making it a niche entry in the open-weight landscape. With a 32,768-token context window, it is designed for multilingual tasks where these languages are central. Its dense architecture means inference cost scales linearly with parameter count, unlike Mixture-of-Experts models.

Strengths

Targeted multilingual capability: Mistral Saba 24B is explicitly optimized for Arabic and South Asian languages, a rare specialization among open-weight models.
Dense architecture simplicity: As a dense 24B model, it avoids the complexity of MoE routing and memory overhead, making it straightforward to deploy.
Permissive research license: The Mistral Research License allows free use for research and non-commercial applications, with clear terms for commercial use (requires separate agreement).
Consumer-friendly size: At Q4_K_M (13.5 GB) or Q3_K_M (11.7 GB), it fits on a single consumer GPU with 16-24 GB VRAM, after accounting for KV cache and overhead.

Limitations

Research-only license: Commercial deployment requires a separate license from Mistral AI, which may not be available or affordable for all operators.
No community benchmarks yet: We do not have independent measurements for this model. Published vendor metrics should be treated as best-case until verified.
Narrow language focus: Its specialization means it may underperform on general English tasks compared to similarly sized general-purpose models.
Dense 24B compute cost: Unlike MoE models with lower active parameters, this dense model requires full 24B parameter compute per token, limiting throughput on lower-end hardware.

What it takes to run this locally

At FP16, the model requires 48 GB of disk space, plus ~30-50% additional memory for KV cache and framework overhead at full context. Quantized versions reduce the footprint: Q8_0 (26 GB), Q6_K (19.8 GB), Q5_K_M (17.1 GB), Q4_K_M (13.5 GB), Q3_K_M (11.7 GB), and Q2_K (~7.8 GB). For consumer deployment, Q4_K_M or Q3_K_M on a single 16-24 GB GPU is feasible, while Q2_K may fit on 12 GB cards with reduced quality. Workstation-class hardware (48 GB) can run Q8_0 or FP16 with ample context overhead.

Should you run this locally?

Yes if your work focuses on Arabic or South Asian languages and you need a model that can run on consumer hardware with a research-friendly license. No if you require commercial deployment without a separate agreement, or if your primary use case is general English tasks where broader models may be more suitable.

Catalog cross-links

Mistral 7B
Mistral 8x7B
Mistral Large

Overview

Mistral's Arabic and South Asian language specialist at 24B. Research license.

How to run it

Mistral Saba 24B is Mistral AI's Arabic-specialized 24B dense model. Saba is Mistral's regional language model — optimized for Arabic language understanding, Middle Eastern cultural context, and Arabic+English bilingual tasks. Run at Q4_K_M via Ollama (ollama pull mistral-saba:24b) or llama.cpp with -ngl 999 -fa -c 8192. Q4_K_M file size ~14 GB on disk. Minimum VRAM: 12 GB — RTX 4070 (12GB) at Q4_K_M with KV offload for 4K context. RTX 4090 24GB: Q4_K_M comfortably at 16K+ context. Recommended: RTX 4090 24GB at Q4_K_M. Throughput: ~40-65 tok/s on RTX 4090 at Q4_K_M. Mistral architecture — well-supported. Saba is designed for: Arabic chat, Arabic content generation, Arabic document understanding, Arabic+English code-switching, Middle East context applications. Not for: non-Arabic languages (quality degrades significantly), general-purpose use (use Mistral Small 3.2 24B instead). Context: Mistral's 32K+; practical at Q4 on 24 GB is 16-32K. Mistral Saba is one of the few openly available Arabic-specialized LLMs at this size.

Hardware guidance

Minimum: RTX 3060 12GB at Q3_K_M with KV offload. Recommended: RTX 4090 24GB at Q4_K_M (16K+ context). VRAM math: 24B dense, Q4_K_M ≈ 14 GB. KV cache at 16K: ~5 GB. Total: ~19 GB at 16K. RTX 4090 24GB: comfortable on-GPU. RTX 3080 10GB: Q3_K_M with KV offload. RTX 4080 16GB: Q4 + 8K context on-GPU. MacBook Pro M4 Pro 24GB+: Q4 at 15-30 tok/s. Cloud: A10 24GB at Q4_K_M. AWQ-INT4 drops to ~12 GB. Arabic text has different tokenization efficiency than English — Arabic may be 1.2-1.5× more token-costly for equivalent semantic content. Budget slightly more tokens for Arabic prompts.

What breaks first

Arabic-only specialization. Saba is heavily optimized for Arabic. English is functional but lower quality. Non-Arabic languages (French, Spanish, etc.) degrade significantly. 2. Dialectal Arabic variance. Saba is trained on Modern Standard Arabic (MSA). Dialectal Arabic (Egyptian, Levantine, Gulf) may produce lower-quality results. Test your specific dialect. 3. Cultural context scope. Saba's cultural knowledge is Middle East-focused. North African cultural contexts may have gaps. 4. Smaller community quant coverage. As a regional-specialized model, Saba has fewer pre-converted GGUFs than general-purpose Mistral models. Verify quantization availability before provisioning.

Runtime recommendation

Ollama for quick-start (Saba should be available as a Mistral model). llama.cpp for production. vLLM for serving. Mistral architecture — first-class support everywhere. For Arabic RAG: pair with Arabic-specific embeddings (e.g., camel-bert, Arabic-T5) for document retrieval.

Common beginner mistakes

Mistake: Using Mistral Saba for non-Arabic tasks. Fix: Saba is Arabic-specialized. English is functional but lower quality. Use Mistral Small 3.2 24B for general-purpose tasks. Mistake: Expecting Saba to handle all Arabic dialects equally. Fix: Saba is trained on MSA. Test on your specific dialect (Egyptian, Levantine, Gulf, Maghrebi) — quality varies. Mistake: Assuming English tokenization is the same as Arabic. Fix: Arabic may produce 1.2-1.5× more tokens for equivalent semantic content. Adjust context budget accordingly. Mistake: Using Llama chat template with Saba. Fix: Mistral models use Mistral-specific templates. Verify on hf tokenizer_config.json.

Strengths

Arabic + South Asian language depth

Weaknesses

Research license
Specialized — not general

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	14.0 GB	18 GB

Get the model

HuggingFace

Original weights

huggingface.co/mistralai/Mistral-Saba-24B

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Mistral Saba 24B.

NVIDIA B300 (Blackwell Ultra)

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier

Models in the same parameter band as this one

Step up

More capable — bigger memory footprint

Step down

Smaller — faster, runs on weaker hardware

Frequently asked

What's the minimum VRAM to run Mistral Saba 24B?

18GB of VRAM is enough to run Mistral Saba 24B at the Q4_K_M quantization (file size 14.0 GB). Higher-quality quantizations need more.

Can I use Mistral Saba 24B commercially?

Mistral Saba 24B is released under the Mistral Research License, which has restrictions for commercial use. Review the license terms before using it in a product.

What's the context length of Mistral Saba 24B?

Mistral Saba 24B supports a context window of 32,768 tokens (about 33K).

Source: huggingface.co/mistralai/Mistral-Saba-24B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Compare hardware

Buyer guides

When it doesn't work

Recommended hardware

Before you buy

Verify Mistral Saba 24B runs on your specific hardware before committing money.

Will it run on my hardware? →Custom hardware comparison →GPU recommender (4 questions) →