Qwen 3 235B-A22B

Positioning

Qwen 3 235B-A22B is the early-2025 frontier MoE that made the "frontier model on consumer-tier hardware" pitch credible. 235B total parameters with 22B active per token. The successor Qwen 3.5 235B-A17B shipped a few months later with marginal quality gains and a smaller per-token activation footprint — but Qwen 3 235B-A22B has more deployed-in-production weight, more battle-tested edge cases, and identical hardware footprint. The operator-grade question for our readers in 2026 isn't "is Qwen 3 235B-A22B good?" — yes — but "do I need to upgrade to 3.5, or is 3 still the right pick if I already have it loaded?"

Strengths

Apache 2.0 licensed open weights — same permissive license as Qwen 3.5. Commercial use allowed, no MAU clauses, no Llama-style restrictions. Critical for many team deployments.
Strong reasoning + coding combo at the frontier tier. Q3 235B-A22B benchmarks competitively against Llama 4 Scout, slightly behind Qwen 3.5 on most evals per Alibaba's own benchmarks; specific point spreads on MMLU-Pro / HumanEval / SWE-bench vary by run and are still being independently reproduced.
Excellent multilingual support — Chinese, English, plus 60+ languages. Same Qwen-team-strength as 3.5.
MoE architecture (~22B active per token) means decode is closer to a 22B dense model than a 235B dense one. Tok/s is reasonable on the Mac Studio M3 Ultra tier hardware.
Production-deployed weight. Many teams that adopted Q3 in early-2025 are still running it because the upgrade path to 3.5 isn't obviously worth the redeploy churn.

Limitations

Q3.5 supersedes it on quality. Marginal gains, but they exist. New deployments should pick 3.5; existing 3 deployments don't need to upgrade unless quality is the operative constraint.
Memory still substantial. ~140 GB at Q4, ~110 GB at Q3 — same hardware footprint as 3.5. No 24-GB-card path. Mac Studio M3 Ultra 192 GB handles Q4 + 32K context. 128-GB tier handles Q4 with tight context.
22B activated per token (vs 17B on 3.5) means slightly slower decode than 3.5 at the same hardware footprint. Trade ~10-15% inference speed for the slightly older quality tier.
Training data cutoff is older. Q3 was trained earlier — some 2025 events / APIs / library versions aren't reflected. For knowledge-recency tasks 3.5 wins.

Real-world performance on Mac Studio M3 Ultra (192 GB)

Q4 (~140 GB): 10-15 tok/s decode, TTFT ~2-3s on 1K prompt. Slightly slower than Q3.5 235B-A17B (12-18 tok/s) at the same quant.
Q3 (~110 GB): ~13-18 tok/s decode, faster TTFT, slight quality dip vs Q4.
Q5 (~165 GB partial-offload): 7-~11 tok/s. Quality bump over Q4 is small; rarely worth the speed loss.
Compare with: rented H100 80GB ×4 datacenter setup runs FP8 Q3 235B-A22B at ~70-100 tok/s — production target.

Should you run this locally?

Yes, if you already have Q3 235B-A22B running in production and the quality is acceptable for your workload. The redeploy cost to switch to 3.5 isn't always justified by the marginal gains. Stay on Q3 unless you have a specific reason to upgrade.

No, for new deployments where you have free choice. Pick Qwen 3.5 235B-A17B instead — same hardware footprint, slightly better quality, slightly faster decode (17B active vs 22B active).

Probably not, for anyone running a single consumer GPU — the hardware requirements (128 GB+ unified memory or workstation-tier) are the same as 3.5. If you can't run 3.5, you can't run 3.

Probably not, for anyone whose primary workload is coding (Qwen 2.5 Coder 32B at 24 GB beats 3 235B-A22B on coding-specific benchmarks at 1/6 the hardware cost).

How it compares

vs Qwen 3.5 235B-A17B (successor) → 3.5 has marginal quality gains + 5B fewer active params per token (17B vs 22B) at the same hardware footprint. New deployments should pick 3.5; existing 3 deployments can stay until upgrade cost is justified. Pure incremental release.
vs DeepSeek V4 Pro (1.6T MoE) → V4 Pro has higher quality ceiling but needs 192 GB+ hardware just to run at usable quants. Qwen 3 235B-A22B fits 128 GB + has Apache 2.0 license. For accessibility + license preference, Qwen 3 wins.
vs Llama 4 Scout (Meta MoE) → similar quality tier. Llama 4 Scout has 128k effective context; Qwen 3 235B-A22B has 32k effective. Llama license has 700M MAU clause; Qwen 3 has Apache 2.0. Pick on context-length needs + license preference.
vs DeepSeek R1 (671B reasoning specialist) → R1 specializes in reasoning, Q3 is generalist. For reasoning-only workloads R1 wins; for mixed daily-driver tasks Q3 235B-A22B is more useful.
vs Qwen 3 30B-A3B (smaller MoE sibling) → 30B-A3B fits 24 GB consumer card — dramatically more accessible. Quality is meaningfully lower (closer to 8B-class than frontier). Pick the smaller MoE for "I want Qwen on a 4090"; pick 235B-A22B for "I have workstation hardware and want frontier."

Run this yourself

# Mac Studio M3 Ultra 192GB — Q4 fits comfortably
ollama pull qwen3:235b-a22b-q4_K_M
ollama run qwen3:235b-a22b-q4_K_M

# Or via llama.cpp directly:
llama-server -m qwen3-235b-a22b-Q4_K_M.gguf \
  --ctx-size 32768 -ngl 999 --temp 0.7

Quant: Q4_K_M GGUF Context: 32768 (KV cache f16, ~24 GB additional) Backend: llama.cpp Metal via Ollama Hardware: Mac Studio M3 Ultra 192 GB unified memory

Quantization	File size	VRAM required
Q4_K_M	142.0 GB	160 GB
Q5_K_M	167.0 GB	190 GB

Quantization

File size

VRAM required

Q4_K_M

142.0 GB

160 GB

Q5_K_M

167.0 GB

190 GB

Frequently asked

What's the minimum VRAM to run Qwen 3 235B-A22B?

160GB of VRAM is enough to run Qwen 3 235B-A22B at the Q4_K_M quantization (file size 142.0 GB). Higher-quality quantizations need more.

Can I use Qwen 3 235B-A22B commercially?

Yes — Qwen 3 235B-A22B ships under the Apache 2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen 3 235B-A22B?

Qwen 3 235B-A22B supports a context window of 131,072 tokens (about 131K).

How do I install Qwen 3 235B-A22B with Ollama?

Run `ollama pull qwen3:235b` to download, then `ollama run qwen3:235b` to start a chat session. The default quantization is Q4_K_M.

Our verdict

Positioning

Strengths

Limitations

Real-world performance on Mac Studio M3 Ultra (192 GB)

Should you run this locally?

How it compares

Run this yourself

Overview

Family & lineage

Strengths

Weaknesses

Quantization variants

Get the model

Ollama

HuggingFace

Hardware that runs this

Models worth comparing

Frequently asked

What's the minimum VRAM to run Qwen 3 235B-A22B?

Can I use Qwen 3 235B-A22B commercially?

What's the context length of Qwen 3 235B-A22B?

How do I install Qwen 3 235B-A22B with Ollama?

Compare against other models

Related — keep moving