qwen
235B parameters
Commercial OK
Reviewed June 2026

Qwen 3 235B-A22B

Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference time. Best open-weight reasoning model for many tasks.

License: Apache 2.0·Released Apr 29, 2025·Context: 131,072 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

Qwen 3 235B-A22B is the early-2025 frontier MoE that made the "frontier model on consumer-tier hardware" pitch credible. 235B total parameters with 22B active per token. The successor Qwen 3.5 235B-A17B shipped a few months later with marginal quality gains and a smaller per-token activation footprint — but Qwen 3 235B-A22B has more deployed-in-production weight, more battle-tested edge cases, and identical hardware footprint. The operator-grade question for our readers in 2026 isn't "is Qwen 3 235B-A22B good?" — yes — but "do I need to upgrade to 3.5, or is 3 still the right pick if I already have it loaded?"

Strengths

  • Apache 2.0 licensed open weights — same permissive license as Qwen 3.5. Commercial use allowed, no MAU clauses, no Llama-style restrictions. Critical for many team deployments.
  • Strong reasoning + coding combo at the frontier tier. Q3 235B-A22B benchmarks competitively against Llama 4 Scout, slightly behind Qwen 3.5 on most evals per Alibaba's own benchmarks; specific point spreads on MMLU-Pro / HumanEval / SWE-bench vary by run and are still being independently reproduced.
  • Excellent multilingual support — Chinese, English, plus 60+ languages. Same Qwen-team-strength as 3.5.
  • MoE architecture (~22B active per token) means decode is closer to a 22B dense model than a 235B dense one. Tok/s is reasonable on the Mac Studio M3 Ultra tier hardware.
  • Production-deployed weight. Many teams that adopted Q3 in early-2025 are still running it because the upgrade path to 3.5 isn't obviously worth the redeploy churn.

Limitations

  • Q3.5 supersedes it on quality. Marginal gains, but they exist. New deployments should pick 3.5; existing 3 deployments don't need to upgrade unless quality is the operative constraint.
  • Memory still substantial. ~140 GB at Q4, ~110 GB at Q3 — same hardware footprint as 3.5. No 24-GB-card path. Mac Studio M3 Ultra 192 GB handles Q4 + 32K context. 128-GB tier handles Q4 with tight context.
  • 22B activated per token (vs 17B on 3.5) means slightly slower decode than 3.5 at the same hardware footprint. Trade ~10-15% inference speed for the slightly older quality tier.
  • Training data cutoff is older. Q3 was trained earlier — some 2025 events / APIs / library versions aren't reflected. For knowledge-recency tasks 3.5 wins.

Real-world performance on Mac Studio M3 Ultra (192 GB)

  • Q4 (~140 GB): 10-15 tok/s decode, TTFT ~2-3s on 1K prompt. Slightly slower than Q3.5 235B-A17B (12-18 tok/s) at the same quant.
  • Q3 (~110 GB): ~13-18 tok/s decode, faster TTFT, slight quality dip vs Q4.
  • Q5 (~165 GB partial-offload): 7-~11 tok/s. Quality bump over Q4 is small; rarely worth the speed loss.
  • Compare with: rented H100 80GB ×4 datacenter setup runs FP8 Q3 235B-A22B at ~70-100 tok/s — production target.

Should you run this locally?

Yes, if you already have Q3 235B-A22B running in production and the quality is acceptable for your workload. The redeploy cost to switch to 3.5 isn't always justified by the marginal gains. Stay on Q3 unless you have a specific reason to upgrade.

No, for new deployments where you have free choice. Pick Qwen 3.5 235B-A17B instead — same hardware footprint, slightly better quality, slightly faster decode (17B active vs 22B active).

Probably not, for anyone running a single consumer GPU — the hardware requirements (128 GB+ unified memory or workstation-tier) are the same as 3.5. If you can't run 3.5, you can't run 3.

Probably not, for anyone whose primary workload is coding (Qwen 2.5 Coder 32B at 24 GB beats 3 235B-A22B on coding-specific benchmarks at 1/6 the hardware cost).

How it compares

  • vs Qwen 3.5 235B-A17B (successor) → 3.5 has marginal quality gains + 5B fewer active params per token (17B vs 22B) at the same hardware footprint. New deployments should pick 3.5; existing 3 deployments can stay until upgrade cost is justified. Pure incremental release.
  • vs DeepSeek V4 Pro (1.6T MoE) → V4 Pro has higher quality ceiling but needs 192 GB+ hardware just to run at usable quants. Qwen 3 235B-A22B fits 128 GB + has Apache 2.0 license. For accessibility + license preference, Qwen 3 wins.
  • vs Llama 4 Scout (Meta MoE) → similar quality tier. Llama 4 Scout has 128k effective context; Qwen 3 235B-A22B has 32k effective. Llama license has 700M MAU clause; Qwen 3 has Apache 2.0. Pick on context-length needs + license preference.
  • vs DeepSeek R1 (671B reasoning specialist) → R1 specializes in reasoning, Q3 is generalist. For reasoning-only workloads R1 wins; for mixed daily-driver tasks Q3 235B-A22B is more useful.
  • vs Qwen 3 30B-A3B (smaller MoE sibling) → 30B-A3B fits 24 GB consumer card — dramatically more accessible. Quality is meaningfully lower (closer to 8B-class than frontier). Pick the smaller MoE for "I want Qwen on a 4090"; pick 235B-A22B for "I have workstation hardware and want frontier."

Run this yourself

# Mac Studio M3 Ultra 192GB — Q4 fits comfortably
ollama pull qwen3:235b-a22b-q4_K_M
ollama run qwen3:235b-a22b-q4_K_M

# Or via llama.cpp directly:
llama-server -m qwen3-235b-a22b-Q4_K_M.gguf \
  --ctx-size 32768 -ngl 999 --temp 0.7
Quant: Q4_K_M GGUF Context: 32768 (KV cache f16, ~24 GB additional) Backend: llama.cpp Metal via Ollama Hardware: Mac Studio M3 Ultra 192 GB unified memory

Overview

Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference time. Best open-weight reasoning model for many tasks.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Distilled / fine-tuned from this

Strengths

  • Switchable thinking mode
  • Apache 2.0
  • Top-tier reasoning

Weaknesses

  • Needs 160GB+ VRAM at Q4
  • Multi-GPU only on consumer rigs

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M142.0 GB160 GB
Q5_K_M167.0 GB190 GB

Get the model

Ollama

One-line install

ollama run qwen3:235bRead our Ollama review →

HuggingFace

Original weights

huggingface.co/Qwen/Qwen3-235B-A22B

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Qwen 3 235B-A22B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step up
More capable — bigger memory footprint
No verdicted models in the next tier up yet.

Frequently asked

What's the minimum VRAM to run Qwen 3 235B-A22B?

160GB of VRAM is enough to run Qwen 3 235B-A22B at the Q4_K_M quantization (file size 142.0 GB). Higher-quality quantizations need more.

Can I use Qwen 3 235B-A22B commercially?

Yes — Qwen 3 235B-A22B ships under the Apache 2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen 3 235B-A22B?

Qwen 3 235B-A22B supports a context window of 131,072 tokens (about 131K).

How do I install Qwen 3 235B-A22B with Ollama?

Run `ollama pull qwen3:235b` to download, then `ollama run qwen3:235b` to start a chat session. The default quantization is Q4_K_M.

Compare against other models

Curated head-to-head decisions where Qwen 3 235B-A22B is one of the contenders. For arbitrary pairings use /model-battle.

Source: huggingface.co/Qwen/Qwen3-235B-A22B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify Qwen 3 235B-A22B runs on your specific hardware before committing money.