Ministral 8B Instruct
Ministral 8B with sliding-window attention and 128K context. Research license; Mistral 7B v0.3 is the commercial alternative.
Positioning
Ministral 8B Instruct is a dense 8-billion-parameter language model from Mistral AI, released under the Mistral Research License. It has a 131,072-token (128K) context window and uses sliding-window attention, which makes it one of the most accessible long-context models for consumer hardware. Unlike Mistral's Mixtral mixture-of-experts (MoE) models, it is a dense architecture: all 8B parameters are active for every token at inference. The research license restricts commercial use; for commercial deployment, the alternative is Mistral 7B v0.3 (Apache 2.0).
Strengths
- Long context on consumer hardware: With a 128K native context window and a dense 8B architecture, the model fits on a single consumer GPU at moderate quantizations such as Q4_K_M while still accepting very long documents, provided there is memory left for the KV cache.
- Efficient attention mechanism: With sliding-window attention, each token attends only to a fixed window of recent tokens rather than the entire sequence, so attention compute and memory stop scaling with total sequence length once the window is full (when the inference runtime implements the window). This makes long sequences cheaper than under full attention.
- Free for research use: The Mistral Research License allows free use for research and non-commercial projects, making the model accessible for academic experimentation with long-context tasks.
- Small disk footprint at lower quants: At Q4_K_M the model occupies about 5 GB on disk, and at Q2_K about 2.6 GB, leaving ample room for KV cache and overhead on a 12–24 GB GPU.
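To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a causal sliding-window attention mask: each query position may attend only to itself and the previous `window - 1` positions, so the number of keys per query stops growing once the window is full. The function names and the tiny window are illustrative assumptions; Ministral 8B's actual window size and how it assigns windows to layers are defined in its model config and are not modeled here.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True if query i may attend to key j.

    The mask is causal (j <= i) and limited to the most recent
    `window` positions (j > i - window). Illustrative sketch only.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_keys(seq_len: int, window: int) -> list[int]:
    """Number of keys each query attends to, i.e. min(i + 1, window)."""
    return [sum(row) for row in sliding_window_mask(seq_len, window)]


if __name__ == "__main__":
    # Print the mask: 'x' = allowed, '.' = masked out.
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
    # Full causal attention would attend to 1, 2, ..., 6 keys;
    # with a window of 3 the count stops growing at 3.
    print(attended_keys(6, 3))  # → [1, 2, 3, 3, 3, 3]
```

With a window at least as long as the sequence, the mask reduces to ordinary full causal attention.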
Limitations
- Research-only license: Commercial use is not permitted. Organizations needing a permissive license for production should consider Mistral 7B v0.3 (Apache 2.0) or other open-weight alternatives.
- No community benchmarks available: We do not have verified third-party benchmark results for this model. Published vendor metrics should be treated as best-case until independent measurements emerge.
- Dense architecture limits throughput: Unlike MoE models, which activate only a subset of their parameters, this model uses all 8B parameters for every token, so it may deliver lower throughput than MoE models of similar total size on the same hardware.
- KV cache overhead at full context: The KV cache grows linearly with sequence length, so the common 30–50% overhead rule of thumb only holds for typical context lengths. Near the full 128K window the cache can grow larger than the quantized weights themselves, so plan for substantially more memory than the weights alone suggest.
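The KV cache size can be estimated with the standard formula for grouped-query attention: two tensors (K and V) per layer, each holding `n_kv_heads × head_dim` values per cached token. The sketch below assumes 36 layers, 8 KV heads and a head dimension of 128, based on our reading of the model's published config.json; check those values against the repository before relying on the numbers. It also assumes every layer caches every token, which is what happens when a runtime does not exploit the sliding window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    # Assumed values; verify against config.json in the model repository.
    n_layers: int = 36
    n_kv_heads: int = 8
    head_dim: int = 128


def kv_cache_bytes(cfg: AttentionConfig, n_tokens: int,
                   bytes_per_elem: float = 2.0) -> float:
    """Bytes for the K and V tensors of every layer and every cached token.

    bytes_per_elem=2.0 corresponds to an FP16/BF16 cache; an 8-bit cache
    is roughly 1.0 (ignoring small per-block scale overhead).
    """
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * bytes_per_elem
    return per_token * n_tokens


GIB = 1024 ** 3

if __name__ == "__main__":
    cfg = AttentionConfig()
    for tokens in (8_192, 32_768, 131_072):
        size = kv_cache_bytes(cfg, tokens) / GIB
        print(f"{tokens:>7} tokens: {size:5.2f} GiB (FP16 cache)")
```

Under these assumptions the FP16 cache is 1.125 GiB at 8K tokens but 18 GiB for the full 131,072-token window, several times the size of the Q4_K_M weights.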
What it takes to run this locally
Ministral 8B is firmly in the consumer deployment class. At FP16 the weights take about 16 GB, which fits on a single 24 GB GPU (e.g., RTX 4090) with roughly 8 GB left for KV cache. For users with 12 GB GPUs, Q4_K_M (about 5 GB) or Q3_K_M (3.9 GB) are practical, leaving roughly 7–8 GB for cache and overhead. At Q2_K (2.6 GB), even 8 GB GPUs can run the model with modest context lengths. For typical context lengths, budget an extra 30–50% on top of the weight size for KV cache and framework overhead.
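The sizing rule of thumb above can be turned into a quick fit check. This sketch applies a flat overhead fraction on top of the weight sizes quoted on this page (Q4_K_M from the quantization table, the rest from this section). The dictionary and function names are illustrative, and the flat percentage ignores that the KV cache grows with context length, so treat the result as a rough guide for typical context lengths only.

```python
# Weight sizes in GB. Q4_K_M is taken from the quantization table on this page;
# the others are the figures quoted in this section.
WEIGHT_SIZES_GB = {"FP16": 16.0, "Q4_K_M": 5.0, "Q3_K_M": 3.9, "Q2_K": 2.6}


def estimated_vram_gb(weights_gb: float, overhead: float) -> float:
    """Weights plus a flat fractional allowance for KV cache and runtime overhead."""
    return weights_gb * (1.0 + overhead)


def fitting_quants(vram_gb: float, overhead: float = 0.5) -> list[str]:
    """Quantizations whose estimate fits in vram_gb, largest weights first."""
    fits = [
        name
        for name, size in WEIGHT_SIZES_GB.items()
        if estimated_vram_gb(size, overhead) <= vram_gb
    ]
    return sorted(fits, key=WEIGHT_SIZES_GB.__getitem__, reverse=True)


if __name__ == "__main__":
    # Conservative end of the 30-50% range.
    for vram in (8, 12, 24):
        print(f"{vram:>2} GB GPU:", ", ".join(fitting_quants(vram)) or "none")
```

Using the conservative 50% end of the range leaves some slack for longer prompts; for anything approaching the full context window, estimate the KV cache explicitly instead.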
Should you run this locally?
Yes if you need a long-context model for research or non-commercial projects and have a consumer GPU with at least 8 GB of VRAM. The small quantized sizes make it easy to experiment on modest hardware, though filling the full 128K window needs considerably more memory for the KV cache than the weights alone.
No if you require a commercial license, or if you need maximum throughput for production serving. In those cases, consider Mistral 7B v0.3 (Apache 2.0) or a larger MoE model for higher throughput per parameter.
Catalog cross-links
- Mistral 7B v0.3
- Mistral AI vendor page
- Consumer GPU guide
Strengths
- 128k context
- Sliding-window attention
Weaknesses
- Research license
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 5.0 GB | 8 GB |
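Dividing a file size by the parameter count gives the effective bits per weight of a quantization, a quick sanity check on table entries (and the inverse gives a rough size for a given bit width). The sketch assumes decimal gigabytes and the nominal 8 billion parameters; GGUF metadata and tensors kept at higher precision make this an approximation, and the function names are illustrative.

```python
def effective_bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """Average bits stored per parameter, using decimal GB (1 GB = 1e9 bytes)."""
    return file_size_gb * 1e9 * 8 / (n_params_billion * 1e9)


def file_size_gb(bits_per_weight: float, n_params_billion: float) -> float:
    """Inverse: approximate file size in GB for a given average bits per weight."""
    return bits_per_weight * n_params_billion * 1e9 / 8 / 1e9


if __name__ == "__main__":
    # Q4_K_M row from the table above, nominal 8B parameters.
    print(f"Q4_K_M: {effective_bits_per_weight(5.0, 8.0):.2f} bits/weight")
    # FP16 stores 16 bits per weight.
    print(f"FP16:   {file_size_gb(16, 8.0):.1f} GB")
```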
Get the model
HuggingFace: original weights. This is the source repository, so you need to quantize the weights yourself.
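Because the repository ships unquantized weights, a local GGUF has to be produced first. The sketch below shows one common route, assuming a local llama.cpp checkout with its `convert_hf_to_gguf.py` script and a built `llama-quantize` binary; the `LLAMA_CPP` and `QUANTIZE_BIN` paths, the output file names and `build_commands` are hypothetical, so adjust them to your setup. The download uses `huggingface_hub.snapshot_download`; if the repository is gated, accept the license on HuggingFace and log in first.

```python
import subprocess
import sys
from pathlib import Path

REPO_ID = "mistralai/Ministral-8B-Instruct-2410"
# Hypothetical locations; adjust to your own checkout and build of llama.cpp.
LLAMA_CPP = Path("llama.cpp")
QUANTIZE_BIN = LLAMA_CPP / "build" / "bin" / "llama-quantize"


def build_commands(model_dir: Path, out_dir: Path,
                   quant: str = "Q4_K_M") -> list[list[str]]:
    """Return argv lists: HF weights -> FP16 GGUF -> quantized GGUF."""
    f16 = out_dir / "ministral-8b-instruct-f16.gguf"
    quantized = out_dir / f"ministral-8b-instruct-{quant}.gguf"
    return [
        # Step 1: convert the HuggingFace checkpoint to an FP16 GGUF file.
        [sys.executable, str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(model_dir),
         "--outtype", "f16", "--outfile", str(f16)],
        # Step 2: quantize the FP16 GGUF to the requested type.
        [str(QUANTIZE_BIN), str(f16), str(quantized), quant],
    ]


if __name__ == "__main__":
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    model_dir = Path(snapshot_download(repo_id=REPO_ID,
                                       local_dir="Ministral-8B-Instruct-2410"))
    out_dir = Path("gguf")
    out_dir.mkdir(exist_ok=True)
    for cmd in build_commands(model_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Keeping the FP16 GGUF around lets you produce other quantizations from the table without converting again.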
Frequently asked
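What's the minimum VRAM to run Ministral 8B Instruct?
About 8 GB: the quantization table lists 8 GB for Q4_K_M, and Q2_K also runs on 8 GB GPUs at modest context lengths.
Can I use Ministral 8B Instruct commercially?
Not under the Mistral Research License, which restricts commercial use. Mistral 7B v0.3 (Apache 2.0) is the commercial alternative.
What's the context length of Ministral 8B Instruct?
131,072 tokens (128K).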
Source: huggingface.co/mistralai/Ministral-8B-Instruct-2410
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.