Mistral Saba 24B
Mistral's Arabic and South Asian language specialist at 24B. Research license.
Positioning
Mistral Saba 24B is a dense 24-billion-parameter model from Mistral AI, released under the Mistral Research License. It specializes in Arabic and South Asian languages, which makes it a niche entry in the open-weight landscape. It has a 32,768-token context window and targets multilingual tasks where these languages are central. Because the architecture is dense, every token is processed by all 24B parameters, so per-token compute scales with the full parameter count. A Mixture-of-Experts model, by contrast, activates only a subset of its parameters per token.
Strengths
- Targeted multilingual capability: Mistral Saba 24B is explicitly optimized for Arabic and South Asian languages, a rare specialization among open-weight models.
- Dense architecture simplicity: As a dense 24B model, it avoids the complexity of MoE routing and memory overhead, making it straightforward to deploy.
- Research-friendly license: The Mistral Research License allows free use for research and non-commercial applications. Commercial use requires a separate agreement with Mistral AI.
- Consumer-friendly size: At Q4_K_M (13.5 GB) or Q3_K_M (11.7 GB), it fits on a single consumer GPU with 16-24 GB VRAM, after accounting for KV cache and overhead.
Limitations
- Research-only license: Commercial deployment requires a separate license from Mistral AI, which may not be available or affordable for all operators.
- No community benchmarks yet: We do not have independent measurements for this model. Published vendor metrics should be treated as best-case until verified.
- Narrow language focus: Its specialization means it may underperform on general English tasks compared to similarly sized general-purpose models.
- Dense 24B compute cost: Unlike MoE models with lower active parameters, this dense model requires full 24B parameter compute per token, limiting throughput on lower-end hardware.
What it takes to run this locally
At FP16, the weights take 48 GB on disk, and running at full context needs roughly 30-50% more memory for KV cache and framework overhead. Quantized versions reduce the footprint: Q8_0 (26 GB), Q6_K (19.8 GB), Q5_K_M (17.1 GB), Q4_K_M (13.5 GB), Q3_K_M (11.7 GB), and Q2_K (~7.8 GB). For consumer deployment, Q4_K_M or Q3_K_M on a single 16-24 GB GPU is feasible, while Q2_K may fit on 12 GB cards with reduced quality. Workstation-class hardware (48 GB) can run Q8_0 with ample room for context. FP16 weights alone fill 48 GB, so FP16 needs more memory than a single 48 GB card provides once KV cache and overhead are added.
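The sizing rule above can be sketched as a small lookup. This is an illustrative sketch only: the file sizes are the ones quoted in this section, while the function name and the 2 GB default headroom for KV cache and runtime buffers are assumptions, not measured values.

```python
from typing import Optional

# Weight file sizes in GB, as quoted in this section.
QUANT_SIZES_GB = {
    "FP16": 48.0,
    "Q8_0": 26.0,
    "Q6_K": 19.8,
    "Q5_K_M": 17.1,
    "Q4_K_M": 13.5,
    "Q3_K_M": 11.7,
    "Q2_K": 7.8,
}


def largest_quant_that_fits(vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed quantization whose weights fit in VRAM
    while leaving `headroom_gb` free. The headroom default is an assumption."""
    budget = vram_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None  # nothing fits fully on-GPU; CPU offload would be needed
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for vram in (12, 16, 24, 48):
        print(f"{vram} GB VRAM -> {largest_quant_that_fits(vram)}")
```

Longer contexts need more headroom than 2 GB, so raise `headroom_gb` when planning for 16K tokens or more.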
Should you run this locally?
Yes if your work focuses on Arabic or South Asian languages and you need a model that can run on consumer hardware with a research-friendly license. No if you require commercial deployment without a separate agreement, or if your primary use case is general English tasks where broader models may be more suitable.
Catalog cross-links
- Mistral 7B
- Mixtral 8x7B
- Mistral Large
How to run it
Mistral Saba 24B is Mistral AI's Arabic-specialized 24B dense model. Saba is Mistral's regional language model, optimized for Arabic language understanding, Middle Eastern cultural context, and Arabic+English bilingual tasks. It is one of the few openly available Arabic-specialized LLMs at this size.
Run it at Q4_K_M via Ollama (ollama pull mistral-saba:24b) or llama.cpp with -ngl 999 -fa -c 8192. The Mistral architecture is well supported.
- Q4_K_M file size: ~14 GB on disk.
- Minimum VRAM: 12 GB, for example an RTX 4070 (12GB) at Q4_K_M with KV offload for 4K context.
- Recommended: RTX 4090 24GB at Q4_K_M, which comfortably handles 16K+ context.
- Throughput: ~40-65 tok/s on an RTX 4090 at Q4_K_M.
- Context: Mistral's 32K+; practical at Q4 on 24 GB is 16-32K.
Saba is designed for: Arabic chat, Arabic content generation, Arabic document understanding, Arabic+English code-switching, and Middle East context applications.
Not for: non-Arabic languages (quality degrades significantly) or general-purpose use (use Mistral Small 3.2 24B instead).
Hardware guidance
- Minimum: RTX 3060 12GB at Q3_K_M with KV offload.
- Recommended: RTX 4090 24GB at Q4_K_M (16K+ context).
VRAM math: 24B dense at Q4_K_M ≈ 14 GB of weights. KV cache at 16K context: ~5 GB. Total: ~19 GB at 16K.
- RTX 4090 24GB: comfortable on-GPU.
- RTX 3080 10GB: Q3_K_M with KV offload.
- RTX 4080 16GB: Q4 + 8K context on-GPU.
- MacBook Pro M4 Pro 24GB+: Q4 at 15-30 tok/s.
- Cloud: A10 24GB at Q4_K_M.
- AWQ-INT4 drops the weights to ~12 GB.
Arabic text tokenizes less efficiently than English: Arabic may be 1.2-1.5× more token-costly for equivalent semantic content. Budget slightly more tokens for Arabic prompts.
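The VRAM math in this section can be extended to other context lengths with a rough linear estimate. The sketch below assumes the KV cache grows linearly with context length and scales from the ~5 GB at 16K figure quoted here. It ignores runtime buffers and KV-cache quantization, so treat the results as a planning estimate. The function and constant names are illustrative.

```python
WEIGHTS_GB = 14.0       # Q4_K_M weights, as quoted in this section
KV_GB_AT_16K = 5.0      # KV cache at 16K context, as quoted in this section
KV_GB_PER_TOKEN = KV_GB_AT_16K / 16384  # assumes linear growth with context


def total_vram_gb(context_tokens: int) -> float:
    """Estimate weights plus KV cache for a given context length."""
    return WEIGHTS_GB + KV_GB_PER_TOKEN * context_tokens


def fits_on_gpu(context_tokens: int, vram_gb: float) -> bool:
    """True if the estimate fits; leaves no margin for runtime overhead."""
    return total_vram_gb(context_tokens) <= vram_gb


if __name__ == "__main__":
    for ctx in (4096, 8192, 16384, 32768):
        need = total_vram_gb(ctx)
        print(f"{ctx:>6} tokens: ~{need:.1f} GB, fits in 24 GB: {fits_on_gpu(ctx, 24)}")
```

At 32K the estimate lands right at the 24 GB limit, which is why the upper end of the context range only works with no other VRAM consumers on the card.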
What breaks first
1. Arabic-only specialization. Saba is heavily optimized for Arabic. English is functional but lower quality. Non-Arabic languages (French, Spanish, etc.) degrade significantly.
2. Dialectal Arabic variance. Saba is trained on Modern Standard Arabic (MSA). Dialectal Arabic (Egyptian, Levantine, Gulf) may produce lower-quality results. Test your specific dialect.
3. Cultural context scope. Saba's cultural knowledge is Middle East-focused. North African cultural contexts may have gaps.
4. Smaller community quant coverage. As a regional-specialized model, Saba has fewer pre-converted GGUFs than general-purpose Mistral models. Verify quantization availability before provisioning.
Runtime recommendation
Common beginner mistakes
- Mistake: Using Mistral Saba for non-Arabic tasks. Fix: Saba is Arabic-specialized. English is functional but lower quality. Use Mistral Small 3.2 24B for general-purpose tasks.
- Mistake: Expecting Saba to handle all Arabic dialects equally. Fix: Saba is trained on MSA. Test on your specific dialect (Egyptian, Levantine, Gulf, Maghrebi); quality varies.
- Mistake: Assuming Arabic tokenizes as efficiently as English. Fix: Arabic may produce 1.2-1.5× more tokens for equivalent semantic content. Adjust your context budget accordingly.
- Mistake: Using a Llama chat template with Saba. Fix: Mistral models use Mistral-specific templates. Verify the chat template in the model's tokenizer_config.json on Hugging Face.
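The Arabic token-cost adjustment can be sketched as simple arithmetic. This example only applies the 1.2-1.5× range quoted in this section. The function names are illustrative, and real token counts depend on the tokenizer and the text, so measure with the actual tokenizer when precision matters.

```python
import math
from typing import Tuple

LOW_FACTOR = 1.2   # lower bound quoted in this section
HIGH_FACTOR = 1.5  # upper bound quoted in this section


def arabic_token_range(english_tokens: int) -> Tuple[int, int]:
    """Estimate the token count of Arabic text carrying the same content
    as an English text of `english_tokens` tokens."""
    return (
        math.ceil(english_tokens * LOW_FACTOR),
        math.ceil(english_tokens * HIGH_FACTOR),
    )


def worst_case_english_equivalent(context_tokens: int) -> int:
    """How much English-equivalent content fits in a context window
    if Arabic costs the full 1.5x."""
    return int(context_tokens / HIGH_FACTOR)


if __name__ == "__main__":
    low, high = arabic_token_range(1000)
    print(f"1000 English tokens -> roughly {low}-{high} Arabic tokens")
    print(f"8192-token context holds ~{worst_case_english_equivalent(8192)} English-equivalent tokens")
```

Planning against the worst case keeps Arabic prompts from overflowing a context window sized on English estimates.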
Strengths
- Arabic + South Asian language depth
Weaknesses
- Research license
- Specialized — not general
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 14.0 GB | 18 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Mistral Saba 24B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Mistral Saba 24B?
Can I use Mistral Saba 24B commercially?
What's the context length of Mistral Saba 24B?
Source: huggingface.co/mistralai/Mistral-Saba-24B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.