Mistral 7B Instruct v0.3
The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.
The model that defined local LLMs in 2023. Today, it's a benchmark baseline more than a working choice — every newer 7–8B model is meaningfully better while sitting in the same VRAM bracket. The Apache 2.0 license is its remaining real strength.
Strengths
- True Apache 2.0 license: no usage caps, no naming restrictions, no DUA. The most legally clean 7B in active use.
- Mature fine-tune ecosystem: thousands of derivatives, well-tested LoRA recipes, strong tooling support.
- Predictable runtime behavior: every runner has stable, well-debugged Mistral support — no surprises.
Weaknesses
- Instruction following lags Llama 3.1 8B: more frequent hallucinations on multi-step prompts, weaker JSON adherence.
- No system-prompt support in the v0.3 chat template, which complicates the integration story for assistants and agent loops (see the workaround sketch after this list).
- Knowledge cutoff late 2023: noticeably stale on anything 2024+.
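A minimal sketch of the usual workaround for the missing system role: fold the system instructions into the first user turn before the prompt is templated. The `[INST] ... [/INST]` wrapping below assumes the standard Mistral instruct format, and the helper name is illustrative, not part of any official SDK; check the template your runner actually ships before relying on it.

```python
def build_mistral_prompt(system: str, user: str) -> str:
    """Fold a system prompt into the first user turn.

    Mistral 7B Instruct v0.3's chat template has no system role, so the
    common workaround is to prepend the system text to the first user
    message. Most runners add the BOS token themselves, so it is omitted here.
    """
    merged = f"{system}\n\n{user}" if system else user
    return f"[INST] {merged} [/INST]"


# Example: a persona that would normally live in a system prompt.
prompt = build_mistral_prompt(
    system="You are a terse assistant. Answer in one sentence.",
    user="What license is Mistral 7B Instruct v0.3 released under?",
)
print(prompt)
```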
Performance
- Q4_K_M (4.4 GB): 100–120 tok/s decode, TTFT under 70 ms
- Q5_K_M (5.1 GB): 90–105 tok/s
- Q8_0 (7.7 GB): 75–88 tok/s
Yes for Apache-license-bound commercial deployment, as a fine-tune base for novel domain adaptation, or as a regression baseline. No for any general chat or assistant work: Llama 3.1 8B and Qwen 2.5 7B both beat it.
How it compares
- vs Llama 3.1 8B → Llama wins on instruction reliability, system-prompt support, and recency. The only reason to prefer Mistral is licensing.
- vs Qwen 2.5 7B → Qwen wins on knowledge breadth and multilingual; Mistral has the simpler license. Almost always pick Qwen unless license is the gating concern.
- vs Mistral Nemo 12B → Nemo replaces Mistral 7B v0.3 in the modern Mistral lineup — same Apache license, materially stronger model for ~50% more VRAM.
- vs Phi-3.5 Mini → comparable capability, Mistral uses ~2× the VRAM. Phi wins on efficiency.
ollama pull mistral:7b-instruct-v0.3-q4_K_M
ollama run mistral:7b-instruct-v0.3-q4_K_M
Settings: Q4_K_M GGUF, 4096 ctx, llama.cpp/CUDA, RTX 4090
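Once pulled, the model can also be driven programmatically through Ollama's local HTTP API. A minimal sketch, assuming Ollama is serving on its default port (11434); error handling is omitted.

```python
import requests

# Chat with the locally served model via Ollama's /api/chat endpoint.
# The model tag matches the `ollama pull` command above.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "mistral:7b-instruct-v0.3-q4_K_M",
        "messages": [
            {"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}
        ],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```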
Why this rating
5.5/10 — historically important, currently obsolete. Llama 3.1 8B and Qwen 2.5 7B both surpass it across the board. Keep on disk only if you have a fine-tuned variant you depend on.
Overview
The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.
Strengths
- Apache 2.0
- Native function calling
- Battle-tested
Weaknesses
- Outpaced by Qwen 3 8B
- 32K context only
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.4 GB | 6 GB |
| Q5_K_M | 5.1 GB | 7 GB |
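As a rough way to sanity-check the VRAM column, total usage is approximately the GGUF file size plus the KV cache plus runtime overhead. The sketch below assumes Mistral 7B's published architecture (32 layers, 8 KV heads, head dim 128) and an fp16 KV cache; the overhead constant is a ballpark, not a measurement.

```python
def estimate_vram_gb(file_size_gb: float, ctx_tokens: int = 4096,
                     overhead_gb: float = 0.6) -> float:
    """Rough VRAM estimate: weights + KV cache + runtime overhead.

    KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * 2 bytes (fp16).
    Mistral 7B: 32 layers, 8 KV heads (GQA), head_dim 128 -> 128 KiB per token.
    """
    layers, kv_heads, head_dim = 32, 8, 128
    kv_bytes = 2 * layers * kv_heads * head_dim * 2 * ctx_tokens
    return file_size_gb + kv_bytes / 1024**3 + overhead_gb

# Q4_K_M at 4K context: ~4.4 + 0.5 + 0.6 ≈ 5.5 GB, consistent with the 6 GB row above.
print(round(estimate_vram_gb(4.4), 1))
```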
Get the model
Ollama
One-line install
ollama run mistral:7b
HuggingFace
Original weights
Source repository with the original weights; you'll need to quantize them yourself (e.g. convert to GGUF) before running them locally.
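A minimal sketch of pulling the original weights with the huggingface_hub client; converting and quantizing them afterwards is typically handled by llama.cpp's conversion tooling, which is outside this snippet.

```python
from huggingface_hub import snapshot_download

# Download the original (unquantized) weights from the source repository.
# Gated or license-accepted repos may require `huggingface-cli login` first.
local_dir = snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3")
print("Weights downloaded to:", local_dir)

# From here, conversion to GGUF and quantization are done with llama.cpp's
# convert/quantize tools (exact script names vary by llama.cpp version).
```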
Benchmarks
Real measurements on real hardware. Numbers ship with the runner version, quant, and date.
| Hardware | Conf. | Quant | Ctx | Tokens / sec | VRAM | TTFT | Date |
|---|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 4090 (Ollama) | M | Q4_K_M | 4K | 112.3 tok/s | 5.1 GB | 64 ms | Apr 22, 2026 |
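To reproduce a decode-speed figure like the row above on your own card, Ollama's generate endpoint reports token counts and timings in its final (non-streaming) response. A rough sketch, assuming a local Ollama server; results will vary with driver, runner version, and context length.

```python
import requests

# One non-streaming generation; the response carries eval counts and
# durations (in nanoseconds) that yield decode throughput and a TTFT proxy.
r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral:7b-instruct-v0.3-q4_K_M",
        "prompt": "Write a 200-word summary of the Apache 2.0 license.",
        "stream": False,
    },
    timeout=600,
).json()

decode_tps = r["eval_count"] / (r["eval_duration"] / 1e9)
ttft_ms = r["prompt_eval_duration"] / 1e6  # prompt processing time, a TTFT proxy
print(f"{decode_tps:.1f} tok/s decode, ~{ttft_ms:.0f} ms to first token")
```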
Hardware that runs this
Cards with enough VRAM for at least one quantization of Mistral 7B Instruct v0.3.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Mistral 7B Instruct v0.3?
About 6 GB for the Q4_K_M quantization (4.4 GB of weights plus KV cache and runtime overhead); larger quants need proportionally more.
Can I use Mistral 7B Instruct v0.3 commercially?
Yes. It is released under Apache 2.0, with no usage caps or additional use agreement.
What's the context length of Mistral 7B Instruct v0.3?
32K tokens.
How do I install Mistral 7B Instruct v0.3 with Ollama?
ollama pull mistral:7b-instruct-v0.3-q4_K_M, then ollama run the same tag; the shorter ollama run mistral:7b shown above also works.
Source: huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.