Mixtral 8x22B Instruct

Positioning

Mixtral 8x22B is the heavyweight Mixtral release — 141B total parameters, 39B active per token. Closer to a real flagship than 8x7B was, but the disk and memory footprint pushes it past consumer rigs into workstation territory. Largely superseded by Llama 4 Scout for the same hardware tier.

Strengths

Apache 2.0 license — license-clean alternative to Llama 4 in the workstation-class MoE space.
39B active per token keeps tok/s competitive with dense ~40B models.
Strong multilingual — Mistral's European focus carries through.

Limitations

Workstation hardware required — 84 GB at Q4_K_M, partial-offload only on 24 GB cards.
Quality has been overtaken by Llama 4 Scout at similar memory, and by DeepSeek V3 at a much larger footprint (671B total parameters).
Long context is weaker than the 65,536-token spec implies; recall degrades past roughly 24K tokens.

Real-world performance on RTX 4090

Q4_K_M (84 GB) — heavy offload: 7–12 tok/s, ~64 GB+ system RAM required
Q5_K_M (97 GB) — workstation only
Q8_0 (141 GB) — multi-card workstation
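
As a rough sanity check on the sizes above, the sketch below backs out the effective bits per weight implied by each listed file size, and the share of the Q4_K_M file that a 24 GB card must push to system RAM. It assumes decimal gigabytes and weights-only files, and it ignores KV cache and runtime overhead; the function names are illustrative, not part of any tool.

```python
# Figures taken from this page: 141B total parameters and the listed file sizes.
TOTAL_PARAMS = 141e9
FILE_SIZES_GB = {"Q4_K_M": 84.0, "Q5_K_M": 97.0, "Q8_0": 141.0}


def bits_per_weight(file_size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Effective bits per parameter, assuming decimal GB and a weights-only file."""
    return file_size_gb * 1e9 * 8 / params


def system_ram_needed_gb(file_size_gb: float, vram_gb: float) -> float:
    """Weights left over for system RAM once VRAM is full (KV cache and OS ignored)."""
    return max(file_size_gb - vram_gb, 0.0)


if __name__ == "__main__":
    for name, size in FILE_SIZES_GB.items():
        print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
    spill = system_ram_needed_gb(FILE_SIZES_GB["Q4_K_M"], 24.0)
    print(f"Q4_K_M on a 24 GB card: about {spill:.0f} GB of weights in system RAM")
```

The 60 GB of spilled weights is consistent with the "~64 GB+ system RAM" note once some allowance is made for the operating system and cache.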

Should you run this locally?

Yes, for workstation rigs where Apache-license MoE matters more than absolute capability, and for legacy Mixtral fine-tunes already in use. No, for new deployments: Llama 4 Scout is the better pick at similar hardware investment, and DeepSeek V3 is stronger still if you can provision far more memory.

How it compares

vs Llama 4 Scout → similar memory footprint; Scout wins on multimodality + architecture sophistication. New work tilts toward Scout.
vs Mixtral 8x7B → 8x22B is a legitimate flagship where 8x7B was a tech demo. If MoE is the goal, 8x22B is the only Mixtral worth running today.
vs DeepSeek V3 → V3 has far more total parameters (671B) but a comparable active count (37B per token), so it needs much more memory at similar per-token compute; V3 wins on quality, Mixtral 8x22B wins on license clarity.

Run this yourself

ollama pull mixtral:8x22b-instruct-v0.1-q4_K_M
ollama run mixtral:8x22b-instruct-v0.1-q4_K_M

Settings: Q4_K_M GGUF, 16384 ctx, --n-gpu-layers ~30, RTX 4090 + 96 GB DDR5
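
The same settings can also be passed per request through Ollama's local HTTP API instead of the interactive CLI. The sketch below is a minimal example, assuming an Ollama server on its default local port; `num_ctx` and `num_gpu` are Ollama's option names for context length and number of GPU-offloaded layers, and the values mirror the settings line above. `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "mixtral:8x22b-instruct-v0.1-q4_K_M"


def build_request(prompt: str, num_ctx: int = 16384, num_gpu: int = 30) -> dict:
    """Build a non-streaming generate payload with the context and offload settings."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_gpu": num_gpu},
    }


def generate(prompt: str) -> str:
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, `generate("Hello")` returns the completion as a string. If the model fails to load or runs out of VRAM, lowering `num_gpu` is the first knob to try.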

Featured in this stack

The L3 execution stacks that pick this model as a recommended component, with the one-line note explaining the role it plays in each.

Stack · L3 · Homelab tier · Role: Large MoE model (39B active, 141B total)

Quad RTX 3090 workstation stack — the prosumer 100B-class ceiling

Mixtral 8x22B at AWQ-INT4 fits within the roughly 88 GB of usable VRAM across four 24 GB cards, with comfortable headroom. Routing tokens to experts across 4 cards needs less inter-GPU bandwidth than dense tensor parallelism, so the penalty for having no NVLink between card pairs shrinks for MoE.
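
The headroom claim can be checked with simple arithmetic. The sketch below assumes exactly 4 bits per weight for AWQ-INT4, which slightly understates the real footprint because group scales and zero points add overhead, and it takes the 88 GB usable figure from the note above. The function names are illustrative.

```python
TOTAL_PARAMS = 141e9    # total parameters, from this page
USABLE_VRAM_GB = 88.0   # "88 GB effective" across the four cards, from the stack note
NUM_CARDS = 4


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


def headroom_gb(bits_per_weight: float = 4.0) -> float:
    """VRAM left for KV cache and activations after loading the weights."""
    return USABLE_VRAM_GB - weight_footprint_gb(TOTAL_PARAMS, bits_per_weight)


if __name__ == "__main__":
    total = headroom_gb()
    print(f"weights: {weight_footprint_gb(TOTAL_PARAMS, 4.0):.1f} GB")
    print(f"headroom: {total:.1f} GB total, {total / NUM_CARDS:.2f} GB per card")
```

Under these assumptions the weights take about 70.5 GB, leaving roughly 17.5 GB across the four cards for KV cache and activations.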

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (mixtral)

Mixtral 8x7B Instruct (47B) · Workstation

Mixtral 8x22B Instruct (141B) · You are here

Quantization variants

Quantization	File size	VRAM required
Q4_K_M	84.0 GB	96 GB

Frequently asked

What's the minimum VRAM to run Mixtral 8x22B Instruct?

96 GB of VRAM holds the Q4_K_M quantization (file size 84.0 GB) entirely on GPU. Higher-quality quantizations need more. A single 24 GB card can still run Q4_K_M with heavy offload to system RAM, at reduced speed.

Can I use Mixtral 8x22B Instruct commercially?

Yes. Mixtral 8x22B Instruct ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Mixtral 8x22B Instruct?

Mixtral 8x22B Instruct supports a context window of 65,536 tokens (64K).
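
Context length also costs memory on top of the weights. The sketch below estimates KV cache size for a given context. The architecture numbers (56 layers, 8 key/value heads, head dimension 128) and the 16-bit cache are assumptions not stated on this page, so verify them against the model's config before relying on the result.

```python
# Assumed architecture values (not from this page; check the model's config.json).
NUM_LAYERS = 56
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # assumes an fp16 KV cache


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes for keys plus values across all layers at the given context length."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


def kv_cache_gib(context_tokens: int) -> float:
    """Same figure expressed in GiB."""
    return kv_cache_bytes(context_tokens) / 2**30


if __name__ == "__main__":
    for ctx in (16384, 65536):
        print(f"{ctx} tokens: {kv_cache_gib(ctx):.1f} GiB of KV cache")
```

Under these assumptions the cache takes about 3.5 GiB at 16,384 tokens and 14 GiB at the full 65,536, which is one reason a smaller context is a sensible default on offloaded setups.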

How do I install Mixtral 8x22B Instruct with Ollama?

Run `ollama pull mixtral:8x22b` to download, then `ollama run mixtral:8x22b` to start a chat session. The default quantization is Q4_K_M.
