MiniMax-M3

MiniMax-M3 is a natively multimodal Mixture-of-Experts model from MiniMax (Hugging Face `MiniMaxAI/MiniMax-M3`, released June 2026) with ~428B total and ~23B active parameters per token. It accepts text, image, and video input and outputs text, pairing a 60-layer MoE backbone with a CLIP-style vision encoder, and supports a 1,048,576-token (1M) context window. Its attention mechanism is MiniMax Sparse Attention (MSA), which the vendor reports delivers ~9x prefill and ~15x decode speedups versus M2 at 1M context. It is released under the MiniMax Community License: commercial use is permitted with "Built with MiniMax M3" attribution, plus notice/authorization above $20M annual revenue. Benchmark figures on the card are vendor-reported and not independently verified.

License: MiniMax Community License · Released Jun 2, 2026 · Context: 1,048,576 tokens

Positioning

MiniMax-M3 is the first open-weight model to combine frontier-level coding, a 1M-token context, and native multimodality (text + image + video in, text out) in one ~428B / ~23B-active MoE — released June 2026 under the MiniMax Community License (commercial use permitted with attribution).

What stands out

Its MiniMax Sparse Attention (MSA) is the standout: the vendor reports ~9x faster prefill and ~15x faster decode versus M2 at 1M context, and the technical report (arXiv 2606.13392) released a usable kernel plus a cheap dense-to-sparse conversion route. For local long-context serving that is a genuinely reusable advance, not just a model. Native video understanding plus desktop computer-use in an open model is rare.

Honest caveats

The benchmark figures are vendor-reported (the card ships them only as an image), and we have not reproduced them; a noted weak spot is ARC-AGI-2 (under 12%). The license is permissive but conditional: it requires "Built with MiniMax M3" attribution, plus notice/authorization above $20M annual revenue. At ~428B total parameters, self-hosting via vLLM or SGLang requires a multi-GPU, server-class machine.
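To put the "multi-GPU, server-class" point in rough numbers, here is a back-of-the-envelope sketch of the memory needed for the weights alone. The bytes-per-parameter values are generic assumptions about common weight formats, not published MiniMax-M3 variants, and the 80 GiB per-GPU figure is a hypothetical accelerator size. KV cache, activations, and runtime overhead are ignored, so real requirements will be higher.

```python
import math

# ~428B total parameters, as stated on the card (approximate).
TOTAL_PARAMS = 428e9

# Assumed storage cost per parameter for generic weight formats.
# These are illustrative, not a list of released quantizations.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gib(total_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (no KV cache, no overhead)."""
    return total_params * bytes_per_param / 2**30


def min_gpus(total_params: float, bytes_per_param: float,
             gpu_mem_gib: float = 80.0) -> int:
    """Lower bound on GPU count if every GiB could hold weights.

    gpu_mem_gib is a hypothetical per-GPU capacity.
    """
    return math.ceil(weight_gib(total_params, bytes_per_param) / gpu_mem_gib)


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gib = weight_gib(TOTAL_PARAMS, nbytes)
        gpus = min_gpus(TOTAL_PARAMS, nbytes)
        print(f"{fmt}: ~{gib:.0f} GiB of weights, at least {gpus} x 80 GiB GPUs")
```

Because all experts must be resident even though only ~23B parameters are active per token, the total parameter count, not the active count, drives this estimate.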

Verdict

Run it if you want one open model that does long-context agentic coding and image/video understanding, and you can serve a 428B MoE or use a host. Skip if you only need text — a smaller text MoE is cheaper to run. The MSA kernel alone is worth a look for anyone building local 1M-context pipelines.

Frequently asked

Can I use MiniMax-M3 commercially?

Yes — MiniMax-M3 ships under the MiniMax Community License, which permits commercial use. Always read the license text before deployment.

What's the context length of MiniMax-M3?

MiniMax-M3 supports a context window of 1,048,576 tokens (2^20, i.e. 1024K, or about 1.05 million).
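As a quick sanity check on that figure, the sketch below confirms that 1,048,576 is exactly 2^20 and shows a simple budget check. It assumes, as is common but not confirmed here for MiniMax-M3, that the prompt and the generated tokens share one window; `fits_in_context` is a hypothetical helper, not part of any runner's API.

```python
# Context window as stated on the card.
CONTEXT_TOKENS = 1_048_576

# 1,048,576 is exactly 2**20, i.e. 1024 * 1024 ("1M" in binary units).
assert CONTEXT_TOKENS == 2**20 == 1024 * 1024


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context: int = CONTEXT_TOKENS) -> bool:
    """Hypothetical helper: True if prompt plus output fits in the window.

    Assumes input and output tokens count against the same limit.
    """
    return prompt_tokens + max_new_tokens <= context


if __name__ == "__main__":
    print(fits_in_context(1_000_000, 32_768))  # 1,032,768 <= 1,048,576
    print(fits_in_context(1_040_000, 32_768))  # 1,072,768 >  1,048,576
```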

Does MiniMax-M3 support images?

Yes. MiniMax-M3 is multimodal and accepts text, image, and video inputs. Vision support requires a runner that handles its image-conditioning architecture.
