qwen
32B parameters
Commercial OK
Reviewed May 2026

Qwen3 Swallow 32B RL v0.2

A 32B Japanese-English model built on Qwen3, trained with continual pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards. The RL stage targets math, coding, and general reasoning. This is v0.2, so the training pipeline has had at least one revision.

License: apache-2.0 · Context: 32,768 tokens

Our verdict

Fredoline Eruo | Verified May 28, 2026
9.0/10

If you need a commercially licensed Japanese model that can handle math and code, not just chat, this is worth a look at the 32B tier. The RLVR training is a meaningful differentiator over plain SFT Swallow variants. That said, community adoption is near zero (about 3K downloads and 1 like), so you are largely on your own if something breaks. Skip it if you need function calling or want a quantized option with low quality loss.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.00/10. License (apache-2.0) is explicit in the card and commercial use is correctly flagged. Vendor, family, and 32B param count are verified. Context length of 32768 is reasonable for Qwen3 base but not explicitly stated in the excerpt — minor verification gap. Description is honest and operator-voiced, correctly noting RLVR focus on math/code, the GPTQ deprecation issue (directly from card), and weak community adoption. Use case is sharp (Japanese-English math/code reasoning). Weaknesses are concrete and useful for a local-AI operator deciding whether to deploy.

Flags:

  • contextLength 32768 not explicitly confirmed in visible card excerpt; inherited from Qwen3 base assumption
  • Tool/function calling claim ('not explicitly supported') is an inference, not directly stated in excerpt

Strengths

  • Bilingual Japanese-English coverage on a capable 32B base
  • RLVR training specifically targets math and coding — not just chat quality
  • Apache-2.0 license, commercial use permitted
  • 32K context window

Weaknesses

  • Function calling / tool use not explicitly supported
  • No reasoning toggle — you cannot switch chain-of-thought off
  • GPTQ quantization deprecated by the vendor; quantized variants may underperform
  • Very low community traction so far (3K downloads, 1 like) — limited real-world feedback

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M | 17.6 GB | 23 GB
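
As a rough sanity check on the table, the file size can be converted into an effective bits-per-weight figure, and the gap between file size and VRAM can be read as runtime overhead (KV cache, activations, buffers). The sketch below assumes exactly 32 billion parameters and decimal gigabytes (1 GB = 10^9 bytes); neither is stated on the model card, and the function names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the Q4_K_M row.
# Assumptions (not from the model card): exactly 32e9 parameters,
# and 1 GB = 1e9 bytes.

PARAMS = 32e9    # nominal parameter count
FILE_GB = 17.6   # Q4_K_M file size from the table
VRAM_GB = 23.0   # VRAM requirement from the table


def bits_per_weight(file_gb: float, params: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return file_gb * 1e9 * 8 / params


def overhead_gb(vram_gb: float, file_gb: float) -> float:
    """VRAM needed beyond the weights (KV cache, activations, buffers)."""
    return vram_gb - file_gb


if __name__ == "__main__":
    print(f"{bits_per_weight(FILE_GB, PARAMS):.2f} bits/weight")
    print(f"{overhead_gb(VRAM_GB, FILE_GB):.1f} GB overhead")
```

Under those assumptions the row works out to about 4.4 bits per weight with roughly 5.4 GB of headroom on top of the weights. Longer prompts grow the KV cache, so treat the 23 GB figure as a starting estimate, not a guarantee.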

Get the model

Hugging Face

Original weights

huggingface.co/tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2

Source repository. You will need to quantize the weights yourself.
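
One way to fetch the original weights is the `huggingface_hub` Python package. This is a minimal sketch, not the vendor's documented procedure: it assumes `huggingface_hub` is installed, and the local directory name is hypothetical.

```python
# Sketch: fetch the original weights from the Hugging Face Hub.
# Requires `pip install huggingface_hub`. The local directory name
# below is a hypothetical choice, not something the source specifies.

REPO_ID = "tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2"


def repo_url(repo_id: str = REPO_ID) -> str:
    """Build the model page URL for a Hub repository id."""
    return f"https://huggingface.co/{repo_id}"


def download_weights(local_dir: str = "qwen3-swallow-32b-rl") -> str:
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so the helpers above work without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(repo_url())
    # Uncomment to download. Unquantized 32B weights are a very large
    # download, so check free disk space first.
    # print(download_weights())
```

The download only gives you the unquantized checkpoint. Producing a Q4_K_M file from it is a separate conversion step with whichever quantization tool you prefer.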

Hardware that runs this

Cards with enough VRAM for at least one quantization of Qwen3 Swallow 32B RL v0.2.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Qwen3 Swallow 32B RL v0.2?

23 GB of VRAM is enough to run Qwen3 Swallow 32B RL v0.2 at the Q4_K_M quantization (file size 17.6 GB). Higher-quality quantizations need more.

Can I use Qwen3 Swallow 32B RL v0.2 commercially?

Yes. Qwen3 Swallow 32B RL v0.2 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen3 Swallow 32B RL v0.2?

Qwen3 Swallow 32B RL v0.2 supports a context window of 32,768 tokens (32K).
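
Since the rating notes flag the context length as not explicitly confirmed, it may be worth checking what the repository's own `config.json` declares. The sketch below assumes the repo ships a `config.json` with a `max_position_embeddings` field, which is common for Qwen3-family configs but not confirmed here. The helper names are illustrative.

```python
# Sketch: compare the context length listed on this page with what the
# model's config.json declares, and budget a request against it.
# Assumption: config.json exists and uses `max_position_embeddings`.
import json
from typing import Optional

REPO_ID = "tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2"
LISTED_CONTEXT = 32_768  # value listed on this page


def declared_context(config: dict) -> Optional[int]:
    """Return the declared maximum position count, or None if absent."""
    value = config.get("max_position_embeddings")
    return int(value) if value is not None else None


def fetch_config(repo_id: str = REPO_ID) -> dict:
    """Download and parse config.json (needs `pip install huggingface_hub`)."""
    from huggingface_hub import hf_hub_download  # lazy import

    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def fits(prompt_tokens: int, max_new_tokens: int,
         context: int = LISTED_CONTEXT) -> bool:
    """True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context


if __name__ == "__main__":
    # Network call; run only when you want to verify the listed value.
    print(declared_context(fetch_config()), "declared vs", LISTED_CONTEXT, "listed")
```

If the declared value differs from 32,768, treat the figure on this page as unverified and budget prompts against the smaller of the two.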

Source: huggingface.co/tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify Qwen3 Swallow 32B RL v0.2 runs on your specific hardware before committing money.
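
On a machine you already have, a quick first check is to compare each GPU's total memory with the 23 GB listed for Q4_K_M. The sketch below assumes an NVIDIA card with `nvidia-smi` on the PATH, and it reads "23 GB" conservatively as 23 GiB because the page does not say which unit it means. The function names are illustrative.

```python
# Sketch: list NVIDIA GPUs whose total memory meets the Q4_K_M figure.
# Assumptions: `nvidia-smi` is installed, and "23 GB" is treated as
# 23 GiB (23 * 1024 MiB), the stricter reading.
import subprocess
from typing import List

REQUIRED_MIB = 23 * 1024


def parse_total_mib(smi_output: str) -> List[int]:
    """Parse one integer MiB value per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_total_mib() -> List[int]:
    """Ask nvidia-smi for each GPU's total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_total_mib(result.stdout)


def gpus_that_fit(totals_mib: List[int],
                  required_mib: int = REQUIRED_MIB) -> List[int]:
    """Return the indices of GPUs with at least `required_mib` total memory."""
    return [i for i, total in enumerate(totals_mib) if total >= required_mib]


if __name__ == "__main__":
    totals = query_total_mib()
    print("GPU totals (MiB):", totals)
    print("GPUs meeting the listed requirement:", gpus_that_fit(totals))
```

This only compares total memory on a single card. Memory already in use by other processes, and multi-GPU splitting, are not accounted for.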