Codestral 22B
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
Codestral 22B is Mistral's dedicated coding model. Its headline feature is strong fill-in-the-middle (FIM) across a wide language matrix (80+ languages), which is still relevant, especially for non-mainstream language work where Qwen 2.5 Coder's Python/JS bias shows.
Strengths
- The Mistral Non-Production License is more permissive than Qwen's license for non-commercial, research, and internal use.
- Strong on long-tail languages — Lua, Erlang, Haskell, Clojure handled better than Qwen 2.5 Coder.
- Low VRAM for a coder — 13 GB at Q4_K_M fits comfortably on 16 GB cards.
Weaknesses
- License is not Apache — commercial deployment requires a separate Mistral commercial license.
- Qwen 2.5 Coder 32B is materially stronger on the mainstream language pairs (Python, JS, TS, Go).
- No FIM-via-chat ergonomics — works best with editor plugins that issue raw FIM requests.
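As a sketch of what a raw FIM request looks like, here is a minimal Python payload builder for Ollama's `/api/generate` endpoint, which accepts a `suffix` field for fill-in-the-middle (the default local URL, temperature, and token limit below are illustrative assumptions, not recommended settings):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_fim_request(prefix: str, suffix: str, model: str = "codestral:22b") -> dict:
    """Build an Ollama generate payload for fill-in-the-middle:
    the model completes the gap between `prompt` (code before the
    cursor) and `suffix` (code after it)."""
    return {
        "model": model,
        "prompt": prefix,   # code before the cursor
        "suffix": suffix,   # code after the cursor
        "stream": False,
        "options": {"temperature": 0.1, "num_predict": 128},
    }

payload = build_fim_request(
    prefix="def fib(n):\n    ",
    suffix="\n\nprint(fib(10))\n",
)

# Uncomment to send against a running Ollama instance:
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```

Editor plugins that speak FIM natively are doing essentially this on every keystroke pause.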
Performance
- Q4_K_M (13 GB): 70–88 tok/s decode, TTFT ~110 ms — full GPU
- Q5_K_M (15.4 GB): 60–74 tok/s
- Q8_0 (23.3 GB): 40–50 tok/s
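At those decode rates, end-to-end latency for a typical autocompletion is easy to estimate; a rough back-of-envelope (ignoring prompt-processing variance) is time-to-first-token plus tokens divided by decode rate:

```python
def completion_latency_s(n_tokens: int, tok_per_s: float, ttft_s: float = 0.110) -> float:
    """Approximate wall-clock time for a completion:
    time-to-first-token plus decode time."""
    return ttft_s + n_tokens / tok_per_s

# A 50-token inline completion at the low end of the Q4_K_M range (70 tok/s):
print(f"{completion_latency_s(50, 70.0):.2f} s")  # → 0.82 s
```

Even at the slow end, short inline completions land well under a second, which is why this quantization is usable for keystroke-level autocompletion.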
Should you run it?
- Yes: for non-commercial coding work in long-tail languages, or for 16 GB GPU owners who want a dedicated coder without partial offload.
- No: for mainstream Python/JS coding, where Qwen 2.5 Coder 32B is materially stronger, or for commercial deployment without the separate Mistral commercial license.
How it compares
- vs Qwen 2.5 Coder 32B → Qwen wins on capability for Python/JS/TS; Codestral wins on long-tail language coverage and license simplicity for non-commercial use.
- vs DeepSeek Coder V2 Lite (16B) → Codestral 22B is stronger in absolute capability; DeepSeek Coder V2 Lite uses less VRAM.
- vs CodeGemma 7B → Codestral 22B is much more capable; CodeGemma is the right pick under 8 GB VRAM.
```
ollama pull codestral:22b-v0.1-q4_K_M
ollama run codestral:22b-v0.1-q4_K_M
```
Settings: Q4_K_M GGUF, 32768 ctx, full GPU on RTX 4080 / 4090
Editor: Continue.dev with FIM endpoint enabled
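A minimal Continue.dev autocomplete entry might look like the following. This is a sketch based on Continue's `config.json` format; the field names and the exact model tag should be checked against the current Continue documentation:

```json
{
  "tabAutocompleteModel": {
    "title": "Codestral 22B (local)",
    "provider": "ollama",
    "model": "codestral:22b-v0.1-q4_K_M"
  }
}
```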
Why this rating
7.9/10 — Mistral's coding specialist. Excellent fill-in-the-middle, license is the cleanest in the dedicated-coder space, but Qwen 2.5 Coder 32B has decisively overtaken it on raw capability. Loses points for being one tier behind on quality.
Overview
Mistral's coding specialist. Strong fill-in-the-middle for IDE autocompletion. Personal/research use only.
Strengths
- Fast FIM completion
- 80+ languages
Weaknesses
- Not for commercial use
- Behind Qwen 2.5 Coder on most benchmarks
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 13.0 GB | 16 GB |
| Q8_0 | 23.0 GB | 26 GB |
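As a rule of thumb, VRAM needed is roughly the GGUF file size plus headroom for KV cache, activations, and runtime buffers. A rough estimator (the ~20% overhead factor here is an assumption for moderate context lengths, not a measured constant):

```python
def estimate_vram_gb(file_size_gb: float, overhead_frac: float = 0.2) -> float:
    """Rough VRAM estimate: model weights plus ~20% for KV cache,
    activations, and runtime buffers at moderate context lengths."""
    return file_size_gb * (1.0 + overhead_frac)

for quant, size in [("Q4_K_M", 13.0), ("Q8_0", 23.0)]:
    print(f"{quant}: ~{estimate_vram_gb(size):.1f} GB")
# Q4_K_M lands around 15.6 GB (fits a 16 GB card);
# Q8_0 around 27.6 GB (needs a larger card or partial offload).
```

Longer contexts grow the KV cache, so treat the overhead fraction as a floor rather than a fixed cost.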
Get the model
Ollama
One-line install
```
ollama run codestral:22b
```
Read our Ollama review →
HuggingFace
Original weights
Source repository — you must quantize the weights yourself for local use.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Codestral 22B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Codestral 22B?
16 GB, which fits the Q4_K_M quantization (13 GB file) entirely on the GPU.
Can I use Codestral 22B commercially?
Not under the Mistral Non-Production License; commercial deployment requires a separate commercial license from Mistral.
What's the context length of Codestral 22B?
32k tokens — the settings above run it at 32768 ctx.
How do I install Codestral 22B with Ollama?
Run `ollama pull codestral:22b-v0.1-q4_K_M`, then `ollama run codestral:22b-v0.1-q4_K_M`.
Source: huggingface.co/mistralai/Codestral-22B-v0.1
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.