mistral
8B parameters
Restricted
Reviewed June 2026

Ministral 8B Instruct

Ministral 8B with sliding-window attention and 128K context. Research license; Mistral 7B v0.3 is the commercial alternative.

License: Mistral Research License · Released Oct 16, 2024 · Context: 131,072 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

Ministral 8B Instruct is a dense 8-billion-parameter language model from Mistral AI, released under the Mistral Research License. It pairs a 131,072-token (128K) context window with sliding-window attention, and its small size puts long-context work within reach of a single consumer GPU. Unlike Mistral's larger mixture-of-experts (MoE) models, it is dense: all 8B parameters are active for every token. The research license restricts commercial use. For commercial deployment, Mistral 7B v0.3 (Apache 2.0) is the alternative.
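To make "all 8B parameters are active" concrete, a widely used rule of thumb puts a transformer's forward pass at about 2 FLOPs per active parameter per token. The minimal sketch below applies that approximation to a rounded 8.0e9 parameter count. It compares the result with a hypothetical MoE of the same total size that activates a quarter of its parameters. The MoE figures are invented for illustration and do not describe any specific model.

```python
def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 per active parameter).

    This ignores attention-score compute, which also grows with context length.
    """
    return 2.0 * active_params


# Rounded parameter count for Ministral 8B. Dense, so every parameter is active.
DENSE_ACTIVE = 8.0e9

# Hypothetical MoE with the same total size that activates 25% of its
# parameters per token. Illustrative only, not a real model configuration.
MOE_TOTAL = 8.0e9
MOE_ACTIVE_FRACTION = 0.25

if __name__ == "__main__":
    dense = flops_per_token(DENSE_ACTIVE)
    moe = flops_per_token(MOE_TOTAL * MOE_ACTIVE_FRACTION)
    print(f"dense: {dense / 1e9:.1f} GFLOPs/token")
    print(f"MoE (hypothetical): {moe / 1e9:.1f} GFLOPs/token")
```

The approximation only captures compute. On consumer GPUs, token generation is often limited by memory bandwidth instead, but the direction of the comparison is the same.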

Strengths

  • Long context on consumer hardware: With a 128K native context window and only 8B parameters, the weights fit on a single consumer GPU at common quantization levels, with headroom left for long inputs.
  • Efficient attention mechanism: With sliding-window attention, each token attends only to a fixed window of recent tokens. In those layers, attention cost and cache size stop growing once the sequence exceeds the window, which makes longer sequences cheaper than with full attention.
  • Free research license: The Mistral Research License allows free use for research and non-commercial projects, so the model is accessible for academic experimentation with long-context tasks.
  • Small disk footprint at lower quants: At Q4_K_M, the model occupies only ~4.5 GB on disk, and at Q2_K just ~2.6 GB, leaving ample room for KV cache and overhead on a 12–24 GB GPU.
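The sketch below shows the general sliding-window idea from the list above as a causal attention mask in pure Python. It illustrates the technique only. The window sizes are toy values, and the sketch does not model Ministral 8B's actual window size or which layers use it.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: query i may attend to key j iff i - window < j <= i."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs the mask allows.

    With a fixed window this grows roughly linearly in seq_len. Full causal
    attention (window == seq_len) grows quadratically.
    """
    return sum(row.count(True) for row in sliding_window_mask(seq_len, window))


if __name__ == "__main__":
    # Visualize a 6-token sequence with a window of 3 ("x" = may attend).
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))

    # Compare sliding-window (window=4) against full causal attention.
    for n in (8, 16, 32):
        print(n, attended_pairs(n, 4), attended_pairs(n, n))
```

Each row is a query position. Past the first `window` tokens, every row allows exactly `window` keys, which is why the per-layer cost stops growing with sequence length.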

Limitations

  • Research-only license: Commercial use is not permitted. Organizations needing a permissive license for production should consider Mistral 7B v0.3 (Apache 2.0) or other open-weight alternatives.
  • No community benchmarks available: We do not have verified third-party benchmark results for this model. Published vendor metrics should be treated as best-case until independent measurements emerge.
  • Dense architecture limits throughput: MoE models activate only a subset of their parameters per token, but Ministral 8B uses all 8B parameters for every token. As a result, it may generate tokens more slowly than an MoE model of similar total size on the same hardware.
  • KV cache overhead at long context: The KV cache grows linearly with sequence length, so near the full 128K window it adds significant memory pressure. Budget at least 30–50% above the model weights for cache and framework overhead, and more as you approach the full window.
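The sketch below estimates how the KV cache scales with context. It assumes the standard formula: 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The architecture values (36 layers, 8 KV heads, head dimension 128) are our reading of the model card and should be checked against the repository's `config.json`. The result is an upper bound because it assumes every layer caches every token.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int = 36,       # assumed from the model card; verify in config.json
    n_kv_heads: int = 8,      # grouped-query attention KV heads (assumed)
    head_dim: int = 128,      # assumed
    bytes_per_elem: int = 2,  # FP16/BF16 cache; quantized caches use less
) -> int:
    """Upper-bound KV cache size in bytes if every layer caches every token.

    The leading factor of 2 accounts for storing both keys and values.
    Layers that use sliding-window attention would cache fewer tokens.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


GIB = 1024 ** 3

if __name__ == "__main__":
    for tokens in (8_192, 32_768, 131_072):
        print(f"{tokens:>7} tokens: {kv_cache_bytes(tokens) / GIB:5.1f} GiB")
```

Under these assumptions, the bound grows from about 1.1 GiB at 8,192 tokens to about 18 GiB at 131,072 tokens. A runtime that implements the sliding-window layers, or that quantizes the cache, can use less. Measure on your own stack before planning for very long inputs.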

What it takes to run this locally

Ministral 8B is firmly in the consumer deployment class. At FP16, the weights take about 16 GB (8B parameters × 2 bytes), which fits on a single 24 GB GPU (e.g., RTX 4090) with room for KV cache. For 12 GB GPUs, Q4_K_M (4.5 GB) or Q3_K_M (3.9 GB) is practical, leaving 7–8 GB for cache and overhead. At Q2_K (2.6 GB), even 8 GB GPUs can run the model with modest context lengths. For typical use, budget an extra 30–50% on top of the file size for KV cache and framework overhead.
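The sizing rule in this paragraph (file size plus 30–50% for cache and overhead) can be written as a quick fit check. This sketch uses the file sizes quoted above and the 50% upper allowance. The function names are our own, and a flat percentage is only a heuristic, because real cache size depends on context length rather than file size.

```python
# File sizes in GB as quoted in this review (approximate; check the actual files).
FILE_SIZES_GB = {
    "FP16": 16.0,
    "Q4_K_M": 4.5,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}


def estimated_vram_gb(file_size_gb: float, overhead: float = 0.5) -> float:
    """File size plus a fractional allowance for KV cache and framework overhead."""
    return file_size_gb * (1.0 + overhead)


def fits(file_size_gb: float, vram_gb: float, overhead: float = 0.5) -> bool:
    """True if the heuristic estimate fits within the given VRAM."""
    return estimated_vram_gb(file_size_gb, overhead) <= vram_gb


if __name__ == "__main__":
    for vram in (8, 12, 24):
        ok = [q for q, size in FILE_SIZES_GB.items() if fits(size, vram)]
        print(f"{vram:>2} GB GPU: {', '.join(ok)}")
```

With the 50% allowance, FP16 lands exactly at 24 GB. Treat that case as tight and keep contexts short, or use the lower 30% allowance only if you have measured your setup.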

Should you run this locally?

Yes if you need a long-context model for research or non-commercial projects and have a consumer GPU with at least 8 GB of VRAM. The small quantized sizes leave room to experiment with long contexts on modest hardware, though the full 128K window needs considerably more memory than the weights alone.

No if you require a commercial license, or if you need maximum throughput for production serving. In those cases, consider Mistral 7B v0.3 (Apache 2.0), or an MoE model that activates only a subset of its parameters per token.

Catalog cross-links

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
Ministral 3B Instruct (3B) · Edge
Family siblings (ministral)
Ministral 3B Instruct (3B) · Edge
Ministral 8B Instruct (8B) · You are here

Strengths

  • 128k context
  • Sliding-window attention

Weaknesses

  • Research license

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 5.0 GB | 8 GB |
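File sizes like the one in this table follow from a simple relation: size ≈ parameters × average bits per weight / 8. The sketch uses a rounded 8.0e9 parameter count. The ~5 bits per weight for Q4_K_M is an assumed, rounded average, since k-quant formats mix precisions across tensors.

```python
def file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight file size in GB (10**9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8.0e9  # rounded parameter count

if __name__ == "__main__":
    # 16 bits = FP16; ~5 bits is an assumed average for a Q4_K_M-style mix.
    for label, bpw in (("FP16", 16.0), ("~Q4_K_M", 5.0)):
        print(f"{label}: {file_size_gb(N_PARAMS, bpw):.1f} GB")
```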

Get the model

HuggingFace

Original weights

huggingface.co/mistralai/Ministral-8B-Instruct-2410

Source repository with unquantized weights. You will need to quantize them yourself to get the variants listed above.

Frequently asked

What's the minimum VRAM to run Ministral 8B Instruct?

8 GB of VRAM is enough to run Ministral 8B Instruct at the Q4_K_M quantization (file size 5.0 GB) with a modest context length. Longer contexts and higher-quality quantizations need more.

Can I use Ministral 8B Instruct commercially?

Ministral 8B Instruct is released under the Mistral Research License, which does not permit commercial use. Review the license terms before using it in a product. Mistral 7B v0.3 (Apache 2.0) is the commercial alternative.

What's the context length of Ministral 8B Instruct?

Ministral 8B Instruct supports a context window of 131,072 tokens (128K, i.e. 128 × 1,024).

Source: huggingface.co/mistralai/Ministral-8B-Instruct-2410

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.