Ministral 8B Instruct

Positioning

Ministral 8B Instruct is a dense 8-billion-parameter language model from Mistral AI, released under the Mistral Research License. It has a 131,072-token (128K) context window and uses sliding-window attention, which makes it one of the more accessible long-context models for consumer hardware. Unlike Mistral's larger MoE models, it is a dense architecture: all 8B parameters are active for every token at inference. The research license restricts commercial use; for commercial deployment, consider Mistral 7B v0.3 (Apache 2.0).

Strengths

Long context on consumer hardware: At moderate quantization, the dense 8B weights fit on a single consumer GPU, and the native 128K context window supports very long documents, provided enough memory remains for the KV cache.
Efficient attention mechanism: Sliding-window attention reduces memory and compute requirements compared to full attention, enabling longer sequences without proportional resource scaling.
Free for research use: The Mistral Research License allows free use for research and non-commercial projects, making the model accessible for academic experimentation with long-context tasks.
Small disk footprint at lower quants: At Q4_K_M, the model occupies only ~5.0 GB on disk, and at Q2_K just ~2.6 GB, leaving room for KV cache and overhead on a 12–24 GB GPU.
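
To illustrate the sliding-window attention mentioned in the list above, here is a minimal sketch of a causal sliding-window mask in plain Python. The function names and the window size of 3 are hypothetical and chosen only for readability; they are not the model's actual configuration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Causal: a query only sees keys at or before its own position (j <= i).
    Sliding window: only the most recent `window` positions, including the
    query's own, stay visible (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_keys_per_query(seq_len: int, window: int) -> list[int]:
    """Count how many keys each query position attends to."""
    return [sum(row) for row in sliding_window_mask(seq_len, window)]


if __name__ == "__main__":
    # Illustrative values only: 6 tokens, window of 3.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Once the sequence is longer than the window, each query attends to a fixed number of keys, which is why per-token attention cost stops growing with sequence length in this scheme.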

Limitations

Research-only license: Commercial use is not permitted. Organizations needing a permissive license for production should consider Mistral 7B v0.3 (Apache 2.0) or other open-weight alternatives.
No community benchmarks available: We do not have verified third-party benchmark results for this model. Published vendor metrics should be treated as best-case until independent measurements emerge.
Dense architecture limits throughput: Unlike MoE models that activate only a subset of parameters, all 8B parameters are used per token, which may result in lower throughput compared to similarly sized MoE models on the same hardware.
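KV cache overhead at full context: The KV cache grows with the number of cached tokens, so long sequences add significant memory pressure. The 30–50% overhead beyond model weights is a reasonable budget for short and moderate contexts; near the full 128K window the cache can take considerably more, depending on how the runtime windows attention and at what precision it stores the cache.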

What it takes to run this locally

Ministral 8B is firmly in the consumer deployment class. At FP16, the weights take about 16 GB (8B parameters at 2 bytes each), which fits on a single 24 GB GPU (e.g., RTX 4090) with room for KV cache at moderate context lengths. For users with 12 GB GPUs, Q4_K_M (5.0 GB) or Q3_K_M (3.9 GB) are practical, leaving 7–8 GB for cache and overhead. At Q2_K (2.6 GB), even 8 GB GPUs can run the model with modest context lengths. For typical use, budget an additional 30–50% on top of the weights for KV cache and framework overhead.
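
The sizing rule in the paragraph above can be sketched as a small helper that applies the 30–50% overhead budget to each quantization and reports which ones fit a given GPU. The FP16, Q3_K_M and Q2_K sizes come from this page, and Q4_K_M uses the 5.0 GB figure from the quantization table. The function names are hypothetical, and the overhead fraction is the page's rule of thumb for typical use, not a measurement.

```python
# Weight sizes in GB as quoted on this page.
QUANT_SIZES_GB = {
    "FP16": 16.0,
    "Q4_K_M": 5.0,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}


def estimated_vram_gb(weights_gb: float, overhead_fraction: float) -> float:
    """Weights plus a fractional budget for KV cache and framework overhead."""
    return weights_gb * (1.0 + overhead_fraction)


def quants_that_fit(vram_gb: float, overhead_fraction: float = 0.5) -> list[str]:
    """List quantizations whose estimated footprint fits in `vram_gb`.

    Defaults to the conservative end (50%) of the 30-50% rule of thumb.
    """
    return [
        name
        for name, size_gb in QUANT_SIZES_GB.items()
        if estimated_vram_gb(size_gb, overhead_fraction) <= vram_gb
    ]


if __name__ == "__main__":
    for vram in (8, 12, 24):
        print(f"{vram} GB GPU: {quants_that_fit(vram)}")
```

This estimate covers typical context lengths only. Very long sequences need a separate KV cache calculation, because the cache grows with the number of tokens rather than with the size of the weights.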

Should you run this locally?

Yes if you need a long-context model for research or non-commercial projects and have a consumer GPU with at least 8 GB of VRAM. The small quantized sizes make it easy to experiment with long contexts on modest hardware, although the full 128K window needs considerably more memory for the KV cache than the weights alone.

No if you require a commercial license, or if you need maximum throughput for production serving. In those cases, consider Mistral 7B v0.3 (Apache 2.0) or a larger MoE model for higher throughput per parameter.

Catalog cross-links

Mistral 7B v0.3
Mistral AI vendor page
Consumer GPU guide

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (ministral)

Ministral 3B Instruct (3B): Edge

Ministral 8B Instruct (8B): this model

Quantization	File size	VRAM required
Q4_K_M	5.0 GB	8 GB

Frequently asked

What's the minimum VRAM to run Ministral 8B Instruct?

8 GB of VRAM is enough to run Ministral 8B Instruct at the Q4_K_M quantization (file size 5.0 GB). Higher-quality quantizations need more.

Can I use Ministral 8B Instruct commercially?

Ministral 8B Instruct is released under the Mistral Research License, which restricts commercial use. Review the license terms before using it in a product.

What's the context length of Ministral 8B Instruct?

Ministral 8B Instruct supports a context window of 131,072 tokens (128K).
