other
428B parameters
Commercial OK
Multimodal
Reviewed June 2026

MiniMax-M3

MiniMax-M3 is a native multimodal Mixture-of-Experts model from MiniMax (Hugging Face `MiniMaxAI/MiniMax-M3`, 2026-06), with ~428B total / ~23B active parameters per token. It accepts text, image, and video input and outputs text (a 60-layer MoE backbone with a CLIP-style vision encoder) and supports a 1,048,576-token (1M) context window. It is powered by MiniMax Sparse Attention (MSA), which the vendor reports delivers ~9x prefill and ~15x decode speedups versus M2 at 1M context (vendor-reported). Released under the MiniMax Community License — commercial use permitted with "Built with MiniMax M3" attribution (plus notice/authorization above $20M annual revenue). Benchmark figures on the card are vendor-reported and not independently verified.

License: MiniMax Community License·Released Jun 2, 2026·Context: 1,048,576 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 29, 2026
unrated

Positioning

MiniMax-M3 is the first open-weight model to combine frontier-level coding, a 1M-token context, and native multimodality (text + image + video in, text out) in one ~428B / ~23B-active MoE — released June 2026 under the MiniMax Community License (commercial use permitted with attribution).

What stands out

Its MiniMax Sparse Attention (MSA) is the standout: the vendor reports ~9x faster prefill and ~15x faster decode versus M2 at 1M context, and the technical report (arXiv 2606.13392) released a usable kernel plus a cheap dense-to-sparse conversion route. For local long-context serving that is a genuinely reusable advance, not just a model. Native video understanding plus desktop computer-use in an open model is rare.

Honest caveats

The benchmark figures are vendor-reported (the card ships them only as an image), and we have not reproduced them; a noted weak spot is ARC-AGI-2 (under 12%). The license is permissive but conditional — "Built with MiniMax M3" attribution, plus notice/authorization above $20M annual revenue. At ~428B total it is multi-GPU server-class to self-host via vLLM or SGLang.

Verdict

Run it if you want one open model that does long-context agentic coding and image/video understanding, and you can serve a 428B MoE or use a host. Skip if you only need text — a smaller text MoE is cheaper to run. The MSA kernel alone is worth a look for anyone building local 1M-context pipelines.

Overview

MiniMax-M3 is a native multimodal Mixture-of-Experts model from MiniMax (Hugging Face `MiniMaxAI/MiniMax-M3`, 2026-06), with ~428B total / ~23B active parameters per token. It accepts text, image, and video input and outputs text (a 60-layer MoE backbone with a CLIP-style vision encoder) and supports a 1,048,576-token (1M) context window. It is powered by MiniMax Sparse Attention (MSA), which the vendor reports delivers ~9x prefill and ~15x decode speedups versus M2 at 1M context (vendor-reported). Released under the MiniMax Community License — commercial use permitted with "Built with MiniMax M3" attribution (plus notice/authorization above $20M annual revenue). Benchmark figures on the card are vendor-reported and not independently verified.

Strengths

    Weaknesses

      Quantization variants

      Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

      QuantizationFile sizeVRAM required

      Get the model

      HuggingFace

      Original weights

      huggingface.co/MiniMaxAI/MiniMax-M3

      Source repository — direct quantization required.

      Hardware that runs this

      Cards with enough VRAM for at least one quantization of MiniMax-M3.

      Compare alternatives

      Models worth comparing

      Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

      Step up
      More capable — bigger memory footprint
      No verdicted models in the next tier up yet.

      Frequently asked

      Can I use MiniMax-M3 commercially?

      Yes — MiniMax-M3 ships under the MiniMax Community License, which permits commercial use. Always read the license text before deployment.

      What's the context length of MiniMax-M3?

      MiniMax-M3 supports a context window of 1,048,576 tokens (about 1049K).

      Does MiniMax-M3 support images?

      Yes — MiniMax-M3 is multimodal and accepts text + vision + video inputs. Vision support requires a runner that handles its image-conditioning architecture.

      Source: huggingface.co/MiniMaxAI/MiniMax-M3

      Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

      Related — keep moving

      Before you buy

      Verify MiniMax-M3 runs on your specific hardware before committing money.