Mistral Small 3 24B

A re-release of Mistral Small under Apache 2.0, competitive with Llama 3.3 70B at one-third the size for many tasks.

License: Apache 2.0 · Released Jan 30, 2025 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
8.4/10
Positioning

The model that proved Mistral could still ship competitive open weights post-2024. Mistral Small 3 24B is the cleanest-licensed option in the 20–32B class, with strong instruction polish and runtime stability. Right pick if Apache 2.0 is required and you have a 16 GB+ card.

Strengths
  • True Apache 2.0 — no MAU caps, no usage restrictions.
  • Instruction following is excellent — among the most reliable in this size class.
  • Tool-use format is clean and well-documented — Mistral's function-calling convention is mature.
Limitations
  • Slightly weaker than Qwen 3 32B on hard reasoning tasks.
  • No thinking-mode equivalent — it's a single-mode dense model.
  • Multilingual coverage is European-focused — Asian-language performance trails Qwen.
Real-world performance on RTX 4090
  • Q4_K_M (14.6 GB): 75–92 tok/s decode, TTFT ~110 ms — full GPU
  • Q5_K_M (17.3 GB): 62–78 tok/s
  • Q8_0 (26 GB): partial offload, 22–30 tok/s
Should you run this locally?

Yes — for Apache-licensed deployments, RTX 4070 Ti 16 GB / 4080 / 5080 owners, or anyone who values instruction polish over raw capability. No — if you can run 32B+ and don't care about license terms; Qwen 3 32B is slightly stronger.

How it compares
  • vs Qwen 3 32B → Qwen 3 32B is slightly smarter; Mistral Small 3 24B has cleaner license + better instruction polish. Pick by priority.
  • vs Mistral 7B v0.3 → Mistral Small 3 24B is the modern Mistral; 7B v0.3 is obsolete.
  • vs Mistral Nemo 12B → Small 3 24B wins on capability; Nemo wins on VRAM (8 GB Q4 vs 14.6 GB).
  • vs Mixtral 8x7B → Small 3 24B uses far less VRAM (14.6 GB vs 26 GB) and is comparable in quality.
Run this yourself
ollama pull mistral-small:24b-instruct-q4_K_M
ollama run mistral-small:24b-instruct-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4080 / 4090
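Ollama's default context window is smaller than the 16,384 tokens recommended above, so it's worth pinning `num_ctx` explicitly. A minimal Modelfile sketch (the `mistral-small-16k` name is illustrative):

```
FROM mistral-small:24b-instruct-q4_K_M
PARAMETER num_ctx 16384
```

Build and run it with `ollama create mistral-small-16k -f Modelfile`, then `ollama run mistral-small-16k`.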
Why this rating

8.4/10 — Mistral's return to relevance in the dense mid-tier. Apache 2.0, strong instruction following, and it fits comfortably on a 24 GB card. Loses points to Qwen 3 32B, a slightly larger, slightly stronger sibling at similar VRAM.

Overview

Mistral Small 3 24B is a re-release of Mistral Small under the Apache 2.0 license. On many tasks it is competitive with Llama 3.3 70B at roughly one-third the size.

Strengths

  • Apache 2.0
  • Strong instruction following
  • 32K context

Weaknesses

  • Smaller context than Qwen/Llama

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

  • Q4_K_M — 14.0 GB file, 18 GB VRAM required
  • Q8_0 — 26.0 GB file, 30 GB VRAM required
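The VRAM figures above follow a simple rule of thumb: GGUF file size plus a fixed overhead for KV cache and activations — about 4 GB in this table, though the real overhead grows with context length. A quick sketch of that estimate:

```shell
# Rule of thumb from the table above: VRAM ≈ GGUF file size + ~4 GB
# overhead for KV cache and activations (assumption; grows with context).
estimate_vram() {
  awk -v f="$1" 'BEGIN { printf "%.1f GB\n", f + 4.0 }'
}

estimate_vram 14.0   # Q4_K_M
estimate_vram 26.0   # Q8_0
```

Treat this as a floor, not a guarantee — running the full 32K context will push actual usage higher.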

Get the model

Ollama

One-line install

ollama run mistral-small:24b

HuggingFace

Original weights

huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501

Source repository — direct quantization required.
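If you start from the source repository, the usual path to a runnable GGUF goes through llama.cpp's conversion and quantization tools. A hedged sketch — script and binary names vary across llama.cpp versions, output filenames are illustrative, and the download is roughly 48 GB:

```shell
# Sketch only: assumes a current llama.cpp checkout is built and on PATH.
pip install -U "huggingface_hub[cli]"

# Fetch the original weights from the source repository.
huggingface-cli download mistralai/Mistral-Small-24B-Instruct-2501 \
  --local-dir ./Mistral-Small-24B-Instruct-2501

# Convert HF weights to a full-precision GGUF, then quantize to Q4_K_M.
python convert_hf_to_gguf.py ./Mistral-Small-24B-Instruct-2501 \
  --outfile mistral-small-24b-f16.gguf --outtype f16
llama-quantize mistral-small-24b-f16.gguf \
  mistral-small-24b-q4_k_m.gguf Q4_K_M
```

Community-quantized GGUFs exist for most popular models, so check before spending the bandwidth and disk space on a manual conversion.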

Hardware that runs this

Cards with enough VRAM for at least one quantization of Mistral Small 3 24B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Mistral Small 3 24B?

18 GB of VRAM is enough to run Mistral Small 3 24B at the Q4_K_M quantization (file size 14.0 GB). Higher-quality quantizations need more.

Can I use Mistral Small 3 24B commercially?

Yes — Mistral Small 3 24B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Mistral Small 3 24B?

Mistral Small 3 24B supports a context window of 32,768 tokens (32K).

How do I install Mistral Small 3 24B with Ollama?

Run `ollama pull mistral-small:24b` to download, then `ollama run mistral-small:24b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.