MiniMax-M3
MiniMax-M3 is a native multimodal Mixture-of-Experts model from MiniMax (Hugging Face `MiniMaxAI/MiniMax-M3`, 2026-06) with ~428B total and ~23B active parameters per token. It accepts text, image, and video input and outputs text, using a 60-layer MoE backbone with a CLIP-style vision encoder, and supports a 1,048,576-token (1M) context window. It uses MiniMax Sparse Attention (MSA), which the vendor reports delivers ~9x prefill and ~15x decode speedups versus M2 at 1M context. It is released under the MiniMax Community License: commercial use is permitted with "Built with MiniMax M3" attribution, plus notice/authorization above $20M annual revenue. Benchmark figures on the card are vendor-reported and not independently verified.
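To make the total-versus-active split concrete, here is a minimal sketch of the arithmetic. It assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, which ignores attention cost (and attention matters a great deal at 1M context). The function names are illustrative, not part of any MiniMax tooling.

```python
TOTAL_PARAMS = 428e9   # ~428B total parameters (from the model card)
ACTIVE_PARAMS = 23e9   # ~23B active parameters per token (from the model card)


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token.

    Assumption: attention cost is ignored, so this understates
    the real cost at long context lengths.
    """
    return 2.0 * active_params


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    moe = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"MoE forward pass: {moe / 1e9:.0f} GFLOPs per token")
    print(f"dense model of the same size: {dense / 1e9:.0f} GFLOPs per token")
```

Under these assumptions, only about 5.4% of the weights are used per token, so per-token compute is closer to that of a 23B dense model. Memory is a different story: all ~428B parameters still have to be loaded.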
Positioning
MiniMax-M3 is the first open-weight model to combine frontier-level coding, a 1M-token context, and native multimodality (text + image + video in, text out) in one ~428B / ~23B-active MoE. It was released in June 2026 under the MiniMax Community License (commercial use permitted with attribution).
What stands out
Its MiniMax Sparse Attention (MSA) is the standout: the vendor reports ~9x faster prefill and ~15x faster decode versus M2 at 1M context, and the technical report (arXiv 2606.13392) is accompanied by a usable kernel plus a cheap dense-to-sparse conversion route. For local long-context serving, that makes it a genuinely reusable advance rather than just another model release. Native video understanding combined with desktop computer use is also rare in an open model.
Honest caveats
The benchmark figures are vendor-reported (the card ships them only as an image), and we have not reproduced them; one noted weak spot is ARC-AGI-2 (under 12%). The license is permissive but conditional: it requires "Built with MiniMax M3" attribution, plus notice/authorization above $20M annual revenue. At ~428B total parameters, self-hosting via vLLM or SGLang requires a multi-GPU, server-class machine.
Verdict
Run it if you want one open model that handles long-context agentic coding and image/video understanding, and you can either serve a 428B MoE yourself or use a hosted endpoint. Skip it if you only need text: a smaller text-only MoE is cheaper to run. The MSA kernel alone is worth a look for anyone building local 1M-context pipelines.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
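As a rough guide to what those trade-offs mean at this scale, the sketch below estimates weights-only size for a ~428B-parameter model at a few precisions, and the minimum number of GPUs needed just to hold the weights. The bits-per-weight values are approximate averages for llama.cpp-style formats and the 80 GB GPU size is a hypothetical example. Both are assumptions, not figures from the model card, and no quantized files are confirmed to exist for this model. KV cache, activations, and runtime overhead are not included.

```python
import math

TOTAL_PARAMS = 428e9  # ~428B total parameters (from the model card)

# Approximate bits per weight (assumption; real files vary by tensor mix).
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def weight_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Weights-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def min_gpus(size_gb: float, gpu_vram_gb: float = 80.0) -> int:
    """Lower bound on GPU count: weights only, no KV cache or overhead."""
    return math.ceil(size_gb / gpu_vram_gb)


for name, bits in BITS_PER_WEIGHT.items():
    size = weight_size_gb(TOTAL_PARAMS, bits)
    print(f"{name:7s} ~{size:.0f} GB weights, at least {min_gpus(size)} x 80 GB GPUs")
```

Under these assumptions the weights alone come to roughly 856 GB at BF16, 455 GB at Q8_0, and 259 GB at Q4_K_M, so even a 4-bit build needs several 80 GB GPUs before any long-context KV cache is counted.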
Get the model
HuggingFace
Original weights
Source repository: no pre-quantized files are provided, so you must quantize the weights yourself.
Frequently asked
Can I use MiniMax-M3 commercially?
What's the context length of MiniMax-M3?
Does MiniMax-M3 support images?
Source: huggingface.co/MiniMaxAI/MiniMax-M3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.