mistral
3B parameters
Restricted
Reviewed June 2026

Ministral 3B Instruct

Mistral edge model at 3B. Designed for on-device inference with extended 128k context. Research license only.

License: Mistral Research License · Released Oct 16, 2024 · Context: 131,072 tokens

Our verdict

Fredoline Eruo | Verified Jun 12, 2026
unrated

Positioning

Ministral 3B Instruct is Mistral AI's edge-tier dense model, with 3 billion parameters in a footprint sized for on-device inference. It is released under the Mistral Research License, which limits it to research use and excludes commercial deployment. Its standout feature is a 131,072-token (128K) context window, unusually long for a model of this size, which makes it a useful option in the open-weight landscape for long-context experimentation on resource-constrained hardware.

Strengths

  • Extreme context length for its size: With 128K tokens of context, Ministral 3B offers a context-to-parameter ratio that is among the highest available, enabling tasks like long-document analysis or multi-turn conversations that would typically require much larger models.

  • Tiny quantized footprint: At Q4_K_M the model occupies only ~1.7 GB on disk, and at Q2_K just ~1.0 GB. This makes it feasible to run on devices with as little as 2–4 GB of RAM, including phones, Raspberry Pi-class hardware, or low-end laptops.

  • Designed for edge inference: As a dense 3B model, it avoids the memory overhead of MoE architectures and can be loaded entirely in CPU memory or a small GPU, making it a practical choice for offline or privacy-sensitive applications.

  • Research-friendly license: The Mistral Research License permits academic and non-commercial experimentation, allowing researchers to explore long-context techniques without licensing costs.

Limitations

  • Research-only license: The Mistral Research License explicitly prohibits commercial use. Any production deployment or integration into a commercial product is not permitted without a separate agreement.

  • No community benchmarks available: We do not yet have independently verified benchmark scores for this model. Operators considering it should treat any published vendor metrics as best-case and should conduct their own evaluations for their specific use cases.

  • Small parameter count limits raw capability: With only 3B parameters, the model's reasoning, factual recall, and instruction-following ability are inherently constrained compared to larger models. It is best suited for narrow, well-defined tasks rather than general-purpose reasoning.

  • KV cache overhead at full context: While the model itself is small, running at the full 128K context window requires significant additional memory for the KV cache. At FP16, the cache alone can exceed the model size, pushing total memory requirements beyond what typical edge devices can provide.
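
The KV cache point above can be made concrete with a rough estimate. The sketch below uses the standard size formula for a transformer KV cache (two tensors per layer, one for keys and one for values). The layer count, KV head count, and head dimension are hypothetical placeholders chosen for illustration, not published figures for this model, so substitute the values from the model's own config before relying on the numbers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size in bytes for a single sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# ASSUMPTION: hypothetical architecture values, for illustration only.
ASSUMED_LAYERS = 26
ASSUMED_KV_HEADS = 8
ASSUMED_HEAD_DIM = 128

FULL_CONTEXT = 131_072   # context window stated in this listing
FP16_WEIGHTS_GB = 6.0    # approximate FP16 weight size stated in this listing

for seq_len in (8_192, 32_768, FULL_CONTEXT):
    cache_gb = kv_cache_bytes(
        ASSUMED_LAYERS, ASSUMED_KV_HEADS, ASSUMED_HEAD_DIM, seq_len
    ) / 1e9
    note = "larger than FP16 weights" if cache_gb > FP16_WEIGHTS_GB else "smaller than FP16 weights"
    print(f"{seq_len:>7} tokens: {cache_gb:6.2f} GB of FP16 KV cache ({note})")
```

Because the cache grows linearly with sequence length, halving the context or storing the cache at 8 bits (`bytes_per_value=1`) halves the estimate.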

What it takes to run this locally

Ministral 3B is lightweight. Weight files range from ~6 GB unquantized (FP16) down to ~1.0 GB (Q2_K). For typical use with moderate context (e.g., 8K–32K tokens), add 30–50% on top of the weight size for KV cache and framework memory. This places the model firmly in the consumer/edge deployment class: it can run on a single consumer GPU with 4–8 GB VRAM, or even on CPU-only devices with 4+ GB RAM. No workstation or datacenter hardware is required.
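
The sizing rule in this section can be written out as a small calculation. This is a minimal sketch that assumes a nominal 3 billion parameters and applies the 30–50% overhead range quoted above. The function names are illustrative, and real usage depends on the runtime and context length.

```python
PARAMS = 3e9  # nominal parameter count from this listing


def weights_gb(params, bits_per_weight):
    """Weight size in GB for a given average number of bits per weight."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(file_gb, params):
    """Average bits per weight implied by a file size in GB."""
    return file_gb * 1e9 * 8 / params


def with_overhead(weight_size_gb, low=0.30, high=0.50):
    """Apply the 30-50% allowance for KV cache and framework memory."""
    return weight_size_gb * (1 + low), weight_size_gb * (1 + high)


fp16_gb = weights_gb(PARAMS, 16)
print(f"FP16 weights: {fp16_gb:.1f} GB")

# File sizes quoted in the Strengths section of this review.
for name, file_gb in (("Q4_K_M", 1.7), ("Q2_K", 1.0)):
    low_gb, high_gb = with_overhead(file_gb)
    bits = effective_bits(file_gb, PARAMS)
    print(f"{name}: {file_gb} GB file, ~{bits:.1f} bits/weight, "
          f"{low_gb:.2f}-{high_gb:.2f} GB with overhead")
```

The overhead range is a rule of thumb for moderate context only. At long context the KV cache dominates and a fixed percentage no longer applies.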

Should you run this locally?

Yes if you are a researcher exploring long-context inference on edge hardware, need a model that fits in under 2 GB of memory, or want to experiment with Mistral's architecture in a non-commercial setting. No if you need commercial deployment rights, require strong general-purpose reasoning, or plan to use the full 128K context on a device with less than 8 GB of unified memory.

Catalog cross-links

  • Mistral 7B
  • Mistral Research License overview

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (ministral)
  • Ministral 3B Instruct (3B): you are here
  • Ministral 8B Instruct (8B): Consumer

Strengths

  • 128k context at 3B
  • Edge deployable

Weaknesses

  • Research license blocks commercial use

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 1.9 GB    | 4 GB

Get the model

HuggingFace

Original weights

huggingface.co/mistralai/Ministral-3B-Instruct-2410

Source repository. Quantized files are not provided here, so you need to quantize the weights yourself.
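
Fetching the original weights could look roughly like the sketch below. It assumes the `huggingface_hub` package is installed, that your account has been granted access to the repository, and that a Hugging Face token is already configured. The helper names and the local directory are hypothetical, and the quantization step itself is not shown.

```python
REPO_ID = "mistralai/Ministral-3B-Instruct-2410"  # repository named in this listing


def repo_page_url(repo_id: str = REPO_ID) -> str:
    """Return the web page for the repository."""
    return f"https://huggingface.co/{repo_id}"


def fetch_weights(local_dir: str = "ministral-3b-instruct") -> str:
    """Download the repository files and return the local folder path.

    ASSUMPTION: access to the repository has been granted and a token is
    configured; otherwise the download fails with an authorization error.
    """
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `fetch_weights()` starts the download, which is several GB at full precision. The resulting folder is what you would pass to your quantization tool of choice.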

Frequently asked

What's the minimum VRAM to run Ministral 3B Instruct?

4 GB of VRAM is enough to run Ministral 3B Instruct at the Q4_K_M quantization (file size 1.9 GB). Higher-quality quantizations need more.

Can I use Ministral 3B Instruct commercially?

Not without a separate agreement. Ministral 3B Instruct is released under the Mistral Research License, which covers research and non-commercial use only. Review the license terms before using it in a product.

What's the context length of Ministral 3B Instruct?

Ministral 3B Instruct supports a context window of 131,072 tokens (128K, i.e. 128 × 1,024).

Source: huggingface.co/mistralai/Ministral-3B-Instruct-2410

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
