Text & Reasoning
Mixed (open + closed variants)
Apache 2.0 (open) + Mistral Commercial (closed)

Mistral

by Mistral AI

Mistral AI's mixed open and closed family. Mistral 7B and Mixtral 8x22B are the open-weight standards; Mistral Large and Codestral are commercial. Codestral Mamba 7B introduced state-space models to production code workflows.

Best entry point for local use

Start with Mistral Small 22B at Q4_K_M via Ollama: it fits on a single RTX 4090 (24 GB), delivers strong European-language token efficiency (32K SentencePiece BPE vocabulary), and holds competitive reasoning scores (MMLU ~80%). The 22B sits at Mistral's best density-to-performance point: stronger than Llama 3.1 8B on multilingual tasks and easier to deploy than Mixtral's MoE overhead. If your VRAM budget is under 12 GB, use Mistral 7B v0.3 at Q4 (~5 GB); it runs on a MacBook Pro M4 Max at 30+ tok/s and remains competitive for general assistant workloads. Skip Mixtral 8x22B for a first deployment: the MoE complexity adds serving overhead without proportional quality gains over the 22B dense model. Skip Codestral Mamba unless you specifically need O(1) per-token long-context inference; the Mamba architecture has narrower runtime support.
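As a sanity check on the sizing claims above, a useful rule of thumb is bytes ≈ parameters × bits-per-weight ÷ 8, plus an overhead margin for KV cache and activations. A minimal sketch; the bits-per-weight figures are approximate effective averages for llama.cpp quant formats, and the 15% overhead is an assumption, not a measured value:

```python
# Rough VRAM estimate for a quantized model.
# Bits-per-weight values are approximate effective averages for
# llama.cpp quant formats (assumption, not exact spec values).
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def model_gb(params_b: float, quant: str, overhead: float = 0.15) -> float:
    """Estimated GB needed: weight bytes plus a flat overhead
    fraction for KV cache and activations."""
    weights_gb = params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weights_gb * (1 + overhead)

print(f"Mistral Small 22B @ Q4_K_M: ~{model_gb(22, 'Q4_K_M'):.1f} GB")
print(f"Mistral 7B @ Q4_0:          ~{model_gb(7, 'Q4_0'):.1f} GB")
```

The estimate puts the 22B at roughly 15 GB, comfortably inside a 24 GB RTX 4090, and the 7B at roughly 4.5 GB, consistent with the ~5 GB figure above.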

Deployment guidance

Single-user local: Ollama with mistral-small:22b Q4_K_M on an RTX 4090 (24 GB).
Multi-user serving: vLLM with AWQ 4-bit on 2× L40S; Mistral's sliding window attention (SWA) enables efficient KV-cache management at high concurrency.
Mixtral 8x7B (MoE): vLLM 0.5.4+ on 2× RTX 4090 with expert parallelism. Only 12.9B parameters are active per token, but all experts must stay resident, so plan ~30 GB of VRAM for the full model at Q4.
Mobile/edge: llama.cpp with Mistral 7B Q4_0 on Snapdragon X Elite, ~20 tok/s.
Codestral Mamba: requires Mamba-specific CUDA kernels (e.g. the mamba-ssm package); transformer-only serving engines do not support the Mamba architecture.
Note that Mistral's European-optimized 32K vocabulary splits English text into more tokens than larger vocabularies do, so effective English throughput is ~12% lower token-for-token vs Llama. See the GPU buyer guide (same GPU class applies).
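The SWA point above can be made concrete: with a sliding window, the per-sequence KV cache is capped at the window length instead of growing with the full context. A sketch using Mistral 7B v0.1's published architecture numbers (32 layers, 8 KV heads via grouped-query attention, head dim 128, 4096-token window); this is a per-sequence fp16 figure and ignores engine-specific paging overhead:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_elem: int = 2) -> float:
    """fp16 KV-cache size in GB for one sequence: 2 tensors (K and V)
    per layer, each of shape [tokens, kv_heads, head_dim]."""
    return 2 * layers * tokens * kv_heads * head_dim * bytes_per_elem / 1e9

# Mistral 7B v0.1: 32 layers, 8 KV heads, head dim 128, 4096-token window.
full_32k = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, tokens=32768)
windowed = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, tokens=4096)
print(f"32K-token dense cache: {full_32k:.2f} GB per sequence")
print(f"4096-token SWA cache:  {windowed:.2f} GB per sequence")
```

Capping the cache at the 4096-token window holds it near 0.5 GB per sequence instead of ~4.3 GB at a full 32K context, which is why high-concurrency serving benefits.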

Before you buy

Verify Mistral runs on your specific hardware before committing money.