Codestral 22B
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
Codestral 22B is Mistral's dedicated coding model. Its headline feature is strong fill-in-the-middle (FIM) across a wide language matrix (80+ languages), which is still relevant, especially for non-mainstream language work where Qwen 2.5 Coder's Python/JS bias shows.
Strengths
- The Mistral Non-Production License is more permissive than Qwen's license for non-commercial, research, and internal use.
- Strong on long-tail languages — Lua, Erlang, Haskell, Clojure handled better than Qwen 2.5 Coder.
- Low VRAM for a coder — 13 GB at Q4_K_M fits comfortably on 16 GB cards.
Weaknesses
- License is not Apache — commercial deployment requires a separate Mistral commercial license.
- Qwen 2.5 Coder 32B is materially stronger on the mainstream language pairs (Python, JS, TS, Go).
- No FIM-via-chat ergonomics — works best with editor plugins that issue raw FIM requests.
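As a sketch of what a raw FIM request looks like, here is a minimal Python payload builder for Ollama's `/api/generate` endpoint, which accepts a `suffix` field for fill-in-the-middle (the default local URL, temperature, and token limit below are illustrative assumptions, not recommended settings):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_fim_request(prefix: str, suffix: str, model: str = "codestral:22b") -> dict:
    """Build an Ollama generate payload for fill-in-the-middle:
    the model completes the gap between `prompt` (code before the
    cursor) and `suffix` (code after it)."""
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "stream": False,
        "options": {"temperature": 0.1, "num_predict": 128},
    }

payload = build_fim_request(
    prefix="def fib(n):\n    ",
    suffix="\n\nprint(fib(10))\n",
)

# Uncomment to send against a running Ollama instance:
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```

Editor plugins that speak FIM natively are doing essentially this on every keystroke pause.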
Performance
- Q4_K_M (13 GB): 70–88 tok/s decode, TTFT ~110 ms — full GPU
- Q5_K_M (15.4 GB): 60–74 tok/s
- Q8_0 (23.3 GB): 40–50 tok/s
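At those decode rates, end-to-end latency for a typical autocompletion is easy to estimate; a rough back-of-envelope (ignoring prompt-processing variance) is time-to-first-token plus tokens divided by decode rate:

```python
def completion_latency_s(n_tokens: int, tok_per_s: float, ttft_s: float = 0.110) -> float:
    """Approximate wall-clock time for a completion:
    time-to-first-token plus decode time."""
    return ttft_s + n_tokens / tok_per_s

# A 50-token inline completion at the low end of the Q4_K_M range (70 tok/s):
print(f"{completion_latency_s(50, 70.0):.2f} s")  # → 0.82 s
```

Even at the slow end, short inline completions land well under a second, which is why this quantization is usable for keystroke-level autocompletion.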
Should you run it?
- Yes: for non-commercial coding work in long-tail languages, or for 16 GB GPU owners who want a dedicated coder without partial offload.
- No: for mainstream Python/JS coding, where Qwen 2.5 Coder 32B is materially stronger, or for commercial deployment without the separate Mistral commercial license.
How it compares
- vs Qwen 2.5 Coder 32B → Qwen wins on capability for Python/JS/TS; Codestral wins on long-tail language coverage and license simplicity for non-commercial use.
- vs DeepSeek Coder V2 Lite (16B) → Codestral 22B is stronger in absolute capability; DeepSeek Coder V2 Lite uses less VRAM.
- vs CodeGemma 7B → Codestral 22B is much more capable; CodeGemma is the right pick under 8 GB VRAM.
```
ollama pull codestral:22b-v0.1-q4_K_M
ollama run codestral:22b-v0.1-q4_K_M
```
Settings: Q4_K_M GGUF, 32768 ctx, full GPU on RTX 4080 / 4090
Editor: Continue.dev with FIM endpoint enabled
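A minimal Continue.dev autocomplete entry might look like the following. This is a sketch based on Continue's `config.json` format; the field names and the exact model tag should be checked against the current Continue documentation:

```json
{
  "tabAutocompleteModel": {
    "title": "Codestral 22B (local)",
    "provider": "ollama",
    "model": "codestral:22b-v0.1-q4_K_M"
  }
}
```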
Why this rating
7.9/10 — Mistral's coding specialist. Excellent fill-in-the-middle, license is the cleanest in the dedicated-coder space, but Qwen 2.5 Coder 32B has decisively overtaken it on raw capability. Loses points for being one tier behind on quality.
Overview
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
Strengths
- Fast FIM completion
- 80+ languages
Weaknesses
- Not for commercial use
- Behind Qwen 2.5 Coder on most benchmarks
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 13.0 GB | 16 GB |
| Q8_0 | 23.0 GB | 26 GB |
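As a rule of thumb, VRAM needed is roughly the GGUF file size plus headroom for KV cache, activations, and runtime buffers. A rough estimator (the ~20% overhead factor here is an assumption for moderate context lengths, not a measured constant):

```python
def estimate_vram_gb(file_size_gb: float, overhead_frac: float = 0.2) -> float:
    """Rough VRAM estimate: model weights plus ~20% for KV cache,
    activations, and runtime buffers at moderate context lengths."""
    return file_size_gb * (1.0 + overhead_frac)

for quant, size in [("Q4_K_M", 13.0), ("Q8_0", 23.0)]:
    print(f"{quant}: ~{estimate_vram_gb(size):.1f} GB")
# Q4_K_M lands around 15.6 GB (fits a 16 GB card);
# Q8_0 around 27.6 GB (needs a larger card or partial offload).
```

Longer contexts grow the KV cache, so treat the overhead fraction as a floor rather than a fixed cost.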
Get the model
Ollama
One-line install
```
ollama run codestral:22b
```
Read our Ollama review →
HuggingFace
Original weights
Source repository — you must quantize the weights yourself for local use.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Codestral 22B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Codestral 22B?
16 GB, which fits the Q4_K_M quantization (13 GB file) entirely on the GPU.
Can I use Codestral 22B commercially?
Not under the Mistral Non-Production License; commercial deployment requires a separate commercial license from Mistral.
What's the context length of Codestral 22B?
32k tokens — the settings above run it at 32768 ctx.
How do I install Codestral 22B with Ollama?
Run `ollama pull codestral:22b-v0.1-q4_K_M`, then `ollama run codestral:22b-v0.1-q4_K_M`.
Source: huggingface.co/mistralai/Codestral-22B-v0.1
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.