Mistral Small 3 24B
Re-release of Mistral Small under Apache 2.0. Competitive with Llama 3.3 70B at one-third the size for many tasks.
The model that proved Mistral could still ship competitive open weights post-2024. Mistral Small 3 24B is the cleanest-licensed option in the 20–32B class, with strong instruction polish and runtime stability. Right pick if Apache 2.0 is required and you have a 16 GB+ card.
Strengths
- True Apache 2.0: no MAU caps, no usage restrictions.
- Instruction following is excellent, among the most reliable in this size class.
- Tool-use format is clean and well-documented; Mistral's function-call convention is mature (see the sketch below).
- 32K context window.
Weaknesses
- Slightly weaker than Qwen 3 32B on hard reasoning tasks.
- No thinking-mode equivalent; it's a single-mode dense model.
- Multilingual coverage is European-focused; Asian languages are weaker than Qwen's.
- 32K context is smaller than what current Qwen and Llama peers offer.
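To make the tool-use point concrete, here is a minimal sketch of a function-call round trip through Ollama's /api/chat endpoint, which accepts OpenAI-style tool definitions. The get_weather tool, its schema, and the prompt are our own illustration, not something shipped with the model:

```
# Minimal tool-call sketch; get_weather is a hypothetical example tool
curl http://localhost:11434/api/chat -d '{
  "model": "mistral-small:24b-instruct-q4_K_M",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris right now?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'
```

If the model opts to call the tool, the response's message carries a tool_calls entry with the function name and JSON arguments; you execute the function yourself and send the result back as a tool-role message to get the final answer.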
Performance
- Q4_K_M (14.6 GB): 75–92 tok/s decode, TTFT ~110 ms, full GPU
- Q5_K_M (17.3 GB): 62–78 tok/s
- Q8_0 (26 GB): partial offload, 22–30 tok/s
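These figures vary with GPU, driver, and how full the context is, so treat them as ballpark. Ollama's --verbose flag prints your own prompt and decode rates after each reply:

```
# Prints timing stats after each response; "eval rate" is decode tok/s
ollama run mistral-small:24b-instruct-q4_K_M --verbose
```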
Should you run it?
Yes, for Apache-licensed deployments, RTX 4070 Ti 16 GB / 4080 / 5080 owners, or anyone who values instruction polish over raw capability. No, if you can run 32B+ and don't care about license terms; Qwen 3 32B is slightly stronger.
How it compares
- vs Qwen 3 32B → Qwen 3 32B is slightly smarter; Mistral Small 3 24B has the cleaner license and better instruction polish. Pick by priority.
- vs Mistral 7B v0.3 → Mistral Small 3 24B is the modern Mistral; 7B v0.3 is obsolete.
- vs Mistral Nemo 12B → Small 3 24B wins on capability; Nemo wins on VRAM (8 GB Q4 vs 14.6 GB).
- vs Mixtral 8x7B → Small 3 24B uses far less VRAM (14.6 GB vs 26 GB) and is comparable in quality.
ollama pull mistral-small:24b-instruct-q4_K_M
ollama run mistral-small:24b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4080 / 4090
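If you'd rather bake the 16K context into the model than pass it per session, a small Modelfile does it; the mistral-small-16k name below is our own choice:

```
# Create a local variant with num_ctx fixed at 16384
cat > Modelfile <<'EOF'
FROM mistral-small:24b-instruct-q4_K_M
PARAMETER num_ctx 16384
EOF
ollama create mistral-small-16k -f Modelfile
ollama run mistral-small-16k
```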
Why this rating
8.4/10. Mistral's return to relevance in the dense mid-tier: Apache 2.0, strong instruction following, and it fits comfortably on a 24 GB card. Loses points to Qwen 3 32B, a slightly bigger, slightly stronger sibling at similar VRAM.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 14.6 GB | 18 GB |
| Q5_K_M | 17.3 GB | ~21 GB |
| Q8_0 | 26.0 GB | 30 GB |
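To check whether a given quant actually fits your card, load it once and inspect the processor split; anything on CPU means partial offload and slower decode:

```
ollama run mistral-small:24b-instruct-q4_K_M "hello"  # load the model
ollama ps         # PROCESSOR column shows the GPU/CPU split
nvidia-smi        # cross-check actual VRAM usage (NVIDIA cards)
```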
Get the model
Ollama
One-line install
ollama run mistral-small:24b
Read our Ollama review →
HuggingFace
Original weights
Source repository; ships unquantized weights, so you quantize them yourself for local GGUF use.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Mistral Small 3 24B.
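Not sure what your card reports? On NVIDIA hardware, one quick check:

```
# Lists GPU name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```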
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Mistral Small 3 24B?
16 GB covers the Q4_K_M quant (14.6 GB file) at modest context; 18 GB or more gives comfortable headroom, and Q8_0 wants around 30 GB.
Can I use Mistral Small 3 24B commercially?
Yes. It ships under Apache 2.0, with no MAU caps or usage restrictions.
What's the context length of Mistral Small 3 24B?
32K tokens, smaller than current Qwen and Llama peers.
How do I install Mistral Small 3 24B with Ollama?
ollama run mistral-small:24b pulls and starts the default tag; use ollama pull mistral-small:24b-instruct-q4_K_M for the specific quant reviewed here.
Source: huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.