Qwen 3.6 (35B-A3B / 27B with MTP) vs Qwen 3 32B — should I upgrade?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Wait if you're on Ollama. Upgrade if you're on a current vLLM build that ships MTP support.
Qwen 3.6 ships with Multi-Token Prediction (MTP) — the model predicts multiple tokens per forward pass, materially boosting throughput. Combined with the 35B-A3B MoE architecture (3B activated parameters per token), it produces faster generation than Qwen 3 32B on supported runtimes.
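For the mechanics, here is a minimal runnable sketch of the draft-then-verify loop that MTP-style decoding uses. It illustrates the general technique, not Qwen 3.6's actual heads or any runtime's batching: `fake_forward`, the head count, and the toy vocabulary are all hypothetical stand-ins.

```python
import random
from typing import List, Tuple

# Toy stand-in for a model with MTP heads: one forward pass returns the
# next token (main head) plus K draft tokens (extra heads). A real model
# shares the transformer trunk across heads; a seeded RNG keeps this
# script self-contained and deterministic.
K_MTP_HEADS = 2

def fake_forward(context: Tuple[int, ...]) -> List[int]:
    rng = random.Random(sum((i + 1) * t for i, t in enumerate(context)))
    return [rng.randrange(50) for _ in range(1 + K_MTP_HEADS)]

def mtp_decode(prompt: List[int], n_tokens: int) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        main, *drafts = fake_forward(tuple(out))
        out.append(main)  # the main-head token is committed unconditionally
        # A draft is kept only if the main head, given the extended context,
        # agrees with it. Real runtimes verify all drafts in one batched
        # pass; this loop does it sequentially for clarity.
        for d in drafts:
            if fake_forward(tuple(out))[0] == d:
                out.append(d)   # accepted: an extra token for ~free
            else:
                break           # rejected: resume normal decoding
    return out[len(prompt):len(prompt) + n_tokens]

print(mtp_decode([3, 1, 4], 12))
```

The throughput win lives in the accepted drafts: each one is a token you got without paying for another full forward pass.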
The catch: MTP needs runtime support. Before upgrading, confirm against the actual release notes of the runtime you're on — version numbers move fast and we'd rather you check than trust a stale pin here:
- ✅ vLLM: current builds ship MTP support; community runs show real throughput gains.
- ✅ llama.cpp: recent builds (post-MTP merge) support MTP on both the CPU and GPU paths.
- ⏳ Ollama: wraps llama.cpp but historically lags upstream by weeks. Check Ollama's GitHub releases for "MTP" or "multi-token" before assuming you'll see the throughput uplift (a release-notes grep is sketched after this list).
- ✅ TensorRT-LLM: MTP is a first-class feature (NVIDIA's reference path).
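Since this matrix goes stale fast, it is worth checking programmatically rather than trusting the pins above. A minimal sketch against GitHub's public releases API: the repo paths are the real upstream repos, but the search terms are just the two suggested above, and llama.cpp cuts a release per build with terse notes, so a miss there proves little. Unauthenticated requests are rate-limited to 60/hour, which is plenty here.

```python
import json
import urllib.request

REPOS = ["vllm-project/vllm", "ggml-org/llama.cpp", "ollama/ollama"]
TERMS = ("mtp", "multi-token")

for repo in REPOS:
    url = f"https://api.github.com/repos/{repo}/releases?per_page=10"
    # GitHub's API rejects requests without a User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "mtp-check"})
    with urllib.request.urlopen(req) as resp:
        releases = json.load(resp)
    hits = [r["tag_name"] for r in releases
            if any(t in (r.get("body") or "").lower() for t in TERMS)]
    if hits:
        print(f"{repo}: MTP mentioned in {', '.join(hits)}")
    else:
        print(f"{repo}: no MTP mention in the last 10 releases")
```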
Without MTP, the comparison flips: Qwen 3 32B (dense) beats Qwen 3.6 35B-A3B (MoE) on raw quality at the same Q4_K_M quant. The MoE activated-param dance is a throughput optimization, not a quality improvement.
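To put numbers on "throughput optimization, not quality improvement", here is the back-of-envelope version. Decode is usually memory-bandwidth-bound, so tokens/sec scales roughly with bandwidth divided by bytes read per token: all weights for the dense model, only the activated experts for the MoE, with MTP then multiplying by the average tokens accepted per pass. Every constant below (bandwidth, quant width, acceptance rate) is an illustrative assumption, not a benchmark.

```python
# Bandwidth-bound decode estimate. All constants are assumptions.
BYTES_PER_PARAM = 0.56   # ~Q4_K_M: roughly 4.5 bits per weight
BANDWIDTH_GBS = 400.0    # assumed effective memory bandwidth, GB/s

def tok_per_sec(active_params_b: float, mtp_accept: float = 1.0) -> float:
    # Bytes streamed per decoded token = activated params * bytes/param.
    bytes_per_token = active_params_b * 1e9 * BYTES_PER_PARAM
    return BANDWIDTH_GBS * 1e9 / bytes_per_token * mtp_accept

dense = tok_per_sec(32.0)      # dense: all 32B weights read every token
moe = tok_per_sec(3.0)         # MoE: ~3B activated (all 35B must still fit in memory)
moe_mtp = tok_per_sec(3.0, mtp_accept=1.8)  # assumed ~1.8 tokens accepted per pass

print(f"Qwen 3 32B dense:    {dense:7.1f} tok/s")
print(f"Qwen 3.6 MoE:        {moe:7.1f} tok/s")
print(f"Qwen 3.6 MoE + MTP:  {moe_mtp:7.1f} tok/s")
```

The MoE advantage is large even before MTP; MTP stacks a further multiplier on top, which is why losing it on an unsupported runtime hurts so much.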
Decision rule (also transcribed as a small function after this list):
- vLLM with MTP → upgrade to Qwen 3.6 35B-A3B. Real throughput win.
- llama.cpp recent builds → upgrade if you're CPU-bound; the MTP gains are runtime-dependent.
- Ollama users → stay on Qwen 3 32B until your Ollama build explicitly lists MTP support in its release notes. The Qwen 3.6 GGUFs will load and run, but without MTP you're missing the headline feature.
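If you script your model rollouts, the same rule fits in one function. It is a transcription of the bullets above, nothing more; `has_mtp` is a flag you set yourself after checking release notes (see the grep sketch earlier), since nothing here auto-detects support.

```python
def should_upgrade(runtime: str, has_mtp: bool, cpu_bound: bool = False) -> str:
    """The decision rule above, verbatim. `has_mtp` comes from your
    build's release notes; nothing is auto-detected."""
    if runtime == "vllm":
        return ("upgrade: Qwen 3.6 35B-A3B, real throughput win" if has_mtp
                else "hold: without MTP, dense 32B wins on quality")
    if runtime == "llama.cpp":
        if has_mtp and cpu_bound:
            return "upgrade: MTP helps most when you're CPU-bound"
        return "measure first: MTP gains are runtime-dependent"
    if runtime == "ollama":
        return "stay on Qwen 3 32B until release notes list MTP"
    return "unknown runtime: check its release notes for MTP"

print(should_upgrade("ollama", has_mtp=False))
```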
Where we got the numbers
MTP support: vLLM v0.20.0 release notes; llama.cpp PR thread #5742 (multi-token prediction). Ollama support tracking via r/ollama and GitHub issues.
Also see
- Editorial verdict, runtime requirements, how-to-run guidance.
- The current support matrix across vLLM, llama.cpp, Ollama, MLX.
- The current workhorse, for comparison.
- Date-sorted model tracker: see what else dropped this week.
Other questions in this thread
Other /q/ landings on the same topic — same editorial discipline.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.