Qwen 3 235B-A22B
Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference time. Best open-weight reasoning model for many tasks.
Positioning
Qwen 3 235B-A22B is the early-2025 frontier MoE that made the "frontier model on consumer-tier hardware" pitch credible. 235B total parameters with 22B active per token. The successor Qwen 3.5 235B-A17B shipped a few months later with marginal quality gains and a smaller per-token activation footprint — but Qwen 3 235B-A22B has more deployed-in-production weight, more battle-tested edge cases, and identical hardware footprint. The operator-grade question for our readers in 2026 isn't "is Qwen 3 235B-A22B good?" — yes — but "do I need to upgrade to 3.5, or is 3 still the right pick if I already have it loaded?"
Strengths
- Apache 2.0 licensed open weights — same permissive license as Qwen 3.5. Commercial use allowed, no MAU clauses, no Llama-style restrictions. Critical for many team deployments.
- Strong reasoning + coding combo at the frontier tier. Q3 235B-A22B benchmarks competitively against Llama 4 Scout, slightly behind Qwen 3.5 on most evals per Alibaba's own benchmarks; specific point spreads on MMLU-Pro / HumanEval / SWE-bench vary by run and are still being independently reproduced.
- Excellent multilingual support — Chinese, English, plus 60+ languages. Same Qwen-team-strength as 3.5.
- MoE architecture (~22B active per token) means decode is closer to a 22B dense model than a 235B dense one. Tok/s is reasonable on the Mac Studio M3 Ultra tier hardware.
- Production-deployed weight. Many teams that adopted Q3 in early-2025 are still running it because the upgrade path to 3.5 isn't obviously worth the redeploy churn.
Limitations
- Q3.5 supersedes it on quality. Marginal gains, but they exist. New deployments should pick 3.5; existing 3 deployments don't need to upgrade unless quality is the operative constraint.
- Memory still substantial. ~140 GB at Q4, ~110 GB at Q3 — same hardware footprint as 3.5. No 24-GB-card path. Mac Studio M3 Ultra 192 GB handles Q4 + 32K context. 128-GB tier handles Q4 with tight context.
- 22B activated per token (vs 17B on 3.5) means slightly slower decode than 3.5 at the same hardware footprint. Trade ~10-15% inference speed for the slightly older quality tier.
- Training data cutoff is older. Q3 was trained earlier — some 2025 events / APIs / library versions aren't reflected. For knowledge-recency tasks 3.5 wins.
Real-world performance on Mac Studio M3 Ultra (192 GB)
- Q4 (~140 GB): 10-15 tok/s decode, TTFT ~2-3s on 1K prompt. Slightly slower than Q3.5 235B-A17B (12-18 tok/s) at the same quant.
- Q3 (~110 GB): ~13-18 tok/s decode, faster TTFT, slight quality dip vs Q4.
- Q5 (~165 GB partial-offload): 7-~11 tok/s. Quality bump over Q4 is small; rarely worth the speed loss.
- Compare with: rented H100 80GB ×4 datacenter setup runs FP8 Q3 235B-A22B at ~70-100 tok/s — production target.
Should you run this locally?
Yes, if you already have Q3 235B-A22B running in production and the quality is acceptable for your workload. The redeploy cost to switch to 3.5 isn't always justified by the marginal gains. Stay on Q3 unless you have a specific reason to upgrade.
No, for new deployments where you have free choice. Pick Qwen 3.5 235B-A17B instead — same hardware footprint, slightly better quality, slightly faster decode (17B active vs 22B active).
Probably not, for anyone running a single consumer GPU — the hardware requirements (128 GB+ unified memory or workstation-tier) are the same as 3.5. If you can't run 3.5, you can't run 3.
Probably not, for anyone whose primary workload is coding (Qwen 2.5 Coder 32B at 24 GB beats 3 235B-A22B on coding-specific benchmarks at 1/6 the hardware cost).
How it compares
- vs Qwen 3.5 235B-A17B (successor) → 3.5 has marginal quality gains + 5B fewer active params per token (17B vs 22B) at the same hardware footprint. New deployments should pick 3.5; existing 3 deployments can stay until upgrade cost is justified. Pure incremental release.
- vs DeepSeek V4 Pro (1.6T MoE) → V4 Pro has higher quality ceiling but needs 192 GB+ hardware just to run at usable quants. Qwen 3 235B-A22B fits 128 GB + has Apache 2.0 license. For accessibility + license preference, Qwen 3 wins.
- vs Llama 4 Scout (Meta MoE) → similar quality tier. Llama 4 Scout has 128k effective context; Qwen 3 235B-A22B has 32k effective. Llama license has 700M MAU clause; Qwen 3 has Apache 2.0. Pick on context-length needs + license preference.
- vs DeepSeek R1 (671B reasoning specialist) → R1 specializes in reasoning, Q3 is generalist. For reasoning-only workloads R1 wins; for mixed daily-driver tasks Q3 235B-A22B is more useful.
- vs Qwen 3 30B-A3B (smaller MoE sibling) → 30B-A3B fits 24 GB consumer card — dramatically more accessible. Quality is meaningfully lower (closer to 8B-class than frontier). Pick the smaller MoE for "I want Qwen on a 4090"; pick 235B-A22B for "I have workstation hardware and want frontier."
Run this yourself
# Mac Studio M3 Ultra 192GB — Q4 fits comfortably
ollama pull qwen3:235b-a22b-q4_K_M
ollama run qwen3:235b-a22b-q4_K_M
# Or via llama.cpp directly:
llama-server -m qwen3-235b-a22b-Q4_K_M.gguf \
--ctx-size 32768 -ngl 999 --temp 0.7
Quant: Q4_K_M GGUF
Context: 32768 (KV cache f16, ~24 GB additional)
Backend: llama.cpp Metal via Ollama
Hardware: Mac Studio M3 Ultra 192 GB unified memory
Overview
Qwen 3 flagship MoE. 235B total / 22B active per token, with built-in 'thinking' and 'non-thinking' modes that trade speed for reasoning depth at inference time. Best open-weight reasoning model for many tasks.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Switchable thinking mode
- Apache 2.0
- Top-tier reasoning
Weaknesses
- Needs 160GB+ VRAM at Q4
- Multi-GPU only on consumer rigs
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 142.0 GB | 160 GB |
| Q5_K_M | 167.0 GB | 190 GB |
Get the model
Ollama
One-line install
ollama run qwen3:235bRead our Ollama review →HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Qwen 3 235B-A22B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Qwen 3 235B-A22B?
Can I use Qwen 3 235B-A22B commercially?
What's the context length of Qwen 3 235B-A22B?
How do I install Qwen 3 235B-A22B with Ollama?
Compare against other models
Curated head-to-head decisions where Qwen 3 235B-A22B is one of the contenders. For arbitrary pairings use /model-battle.
Source: huggingface.co/Qwen/Qwen3-235B-A22B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify Qwen 3 235B-A22B runs on your specific hardware before committing money.