Mistral Small 3 24B

Positioning

The model that proved Mistral could still ship competitive open weights post-2024. Mistral Small 3 24B is the cleanest-licensed option in the 20–32B class, with strong instruction polish and runtime stability. Right pick if Apache 2.0 is required and you have a 16 GB+ card.

Strengths

True Apache 2.0 — no MAU caps, no usage restrictions.
Instruction following is excellent — among the most reliable in this size class.
Tool-use format is clean and well documented; Mistral's function-call convention is mature.

Limitations

Slightly weaker than Qwen 3 32B on hard reasoning tasks.
No thinking-mode equivalent — it's a single-mode dense model.
Multilingual coverage is European-focused; Asian languages are weaker than on Qwen.

Real-world performance on RTX 4090

Q4_K_M (14.6 GB): 75–92 tok/s decode, TTFT ~110 ms — full GPU
Q5_K_M (17.3 GB): 62–78 tok/s
Q8_0 (26 GB): exceeds the 4090's 24 GB, so partial offload, 22–30 tok/s
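To translate these decode rates into how long a reply actually takes, a rough estimate is TTFT plus output tokens divided by decode speed. The sketch below applies that to the Q4_K_M figures above. The 500-token reply length and the function name are illustrative assumptions, and the estimate ignores how prompt length affects TTFT.

```python
def estimate_response_seconds(
    output_tokens: int,
    decode_tok_per_s: float,
    ttft_s: float = 0.110,  # ~110 ms TTFT quoted for Q4_K_M above
) -> float:
    """Rough wall-clock time for one reply: time to first token plus decode time."""
    if decode_tok_per_s <= 0:
        raise ValueError("decode_tok_per_s must be positive")
    return ttft_s + output_tokens / decode_tok_per_s


# Q4_K_M on an RTX 4090, using the 75-92 tok/s range quoted above.
# The 500-token reply length is an assumed example, not a measured value.
slow = estimate_response_seconds(500, 75.0)
fast = estimate_response_seconds(500, 92.0)
print(f"500-token reply: {fast:.1f}-{slow:.1f} s")
```

Under these assumptions a 500-token answer lands at roughly 5.5 to 6.8 seconds, so TTFT is a small share of the total and decode speed dominates.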

Should you run this locally?

Yes, for Apache-licensed deployments, RTX 4070 Ti Super 16 GB / 4080 / 5080 owners, or anyone who values instruction polish over raw capability. No, for users who can run 32B+ and don't care about license terms; for them, Qwen 3 32B is slightly stronger.

How it compares

vs Qwen 3 32B → Qwen 3 32B is slightly smarter; Mistral Small 3 24B has a cleaner license and better instruction polish. Pick by priority.
vs Mistral 7B v0.3 → Mistral Small 3 24B is the modern Mistral; 7B v0.3 is obsolete.
vs Mistral Nemo 12B → Small 3 24B wins on capability; Nemo wins on VRAM (8 GB Q4 vs 14.6 GB).
vs Mixtral 8x7B → Small 3 24B uses far less VRAM (14.6 GB vs 26 GB) and is comparable in quality.

Run this yourself

ollama pull mistral-small:24b-instruct-q4_K_M
ollama run mistral-small:24b-instruct-q4_K_M

Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4080 / 4090
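Once the model is pulled, the same settings can be applied from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming Ollama is serving on its default port (11434) and that the model tag matches the one pulled above. The helper names `build_generate_request` and `generate` are hypothetical, and `num_ctx` is set to the 16384 context listed in the settings line.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(
    prompt: str,
    model: str = "mistral-small:24b-instruct-q4_K_M",  # tag from the pull command above
    num_ctx: int = 16384,  # context size from the settings line
) -> urllib.request.Request:
    """Build a non-streaming generate request with an explicit context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example call, left commented out because it requires a live Ollama server:
# print(generate("Summarize the Apache 2.0 license in one sentence."))
```

Keeping request construction separate from the network call makes it easy to check the payload without a GPU or a running server.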

Quantization	File size	VRAM required
Q4_K_M	14.0 GB	18 GB
Q8_0	26.0 GB	30 GB
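Both rows of the table put VRAM required at 4 GB above the file size. The sketch below treats that gap as a fixed headroom and checks whether a quantization fits fully on a given card. The fixed 4 GB figure is an inference from these two rows only; real overhead depends on context length and KV cache size, and the function names are hypothetical.

```python
# Assumption: headroom inferred from the table (18 - 14 and 30 - 26), not a measured constant.
HEADROOM_GB = 4.0

# File sizes as listed in the table above.
QUANT_FILE_GB = {"Q4_K_M": 14.0, "Q8_0": 26.0}


def vram_required_gb(quant: str) -> float:
    """File size plus the assumed fixed headroom for runtime buffers and KV cache."""
    return QUANT_FILE_GB[quant] + HEADROOM_GB


def fits_fully_on_gpu(quant: str, card_vram_gb: float) -> bool:
    """True when the card has at least the estimated VRAM requirement."""
    return card_vram_gb >= vram_required_gb(quant)


# Example: a 24 GB card such as the RTX 4090.
for quant in QUANT_FILE_GB:
    print(quant, vram_required_gb(quant), fits_fully_on_gpu(quant, 24.0))
```

On a 24 GB card this reports that Q4_K_M fits and Q8_0 does not, which matches the partial-offload result for Q8_0 in the RTX 4090 numbers.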

Frequently asked

What's the minimum VRAM to run Mistral Small 3 24B?

18 GB of VRAM is enough to run Mistral Small 3 24B at the Q4_K_M quantization (file size 14.0 GB). Higher-quality quantizations need more.

Can I use Mistral Small 3 24B commercially?

Yes. Mistral Small 3 24B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Mistral Small 3 24B?

Mistral Small 3 24B supports a context window of 32,768 tokens (32K).

How do I install Mistral Small 3 24B with Ollama?

Run `ollama pull mistral-small:24b` to download, then `ollama run mistral-small:24b` to start a chat session. The default quantization is Q4_K_M.
