other
32B parameters
Commercial OK
Reviewed May 2026

llm-jp 4 32B A3B Thinking

A 32B MoE model from Japan's National Institute of Informatics, with only 3B parameters active per forward pass. Trained on 11.7T tokens across four stages: pre-training, mid-training, SFT, and DPO. Targets Japanese and English conversational tasks with a 65K context window.

License: apache-2.0·Context: 65,536 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED MAY 28, 2026
9.1/10

If you need a Japanese-capable MoE that runs leaner than a dense 32B, this is a legitimate option from a credible academic source. The 11.7T token training and multi-stage alignment give it a serious foundation. That said, the vendor explicitly flags incomplete safety tuning, so keep it out of customer-facing or sensitive workflows for now. Hedge — worth evaluating for internal Japanese NLP tasks, but not a drop-in production pick yet.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.05/10. License is explicitly apache-2.0 on the HF card, commercial-OK flag is correct. Metadata is accurate: 32B total / ~3B active MoE, 65,536 context, llm-jp vendor, Japanese/English all verified from the card. Description is honest and operator-voiced, correctly flags the Harmony-template-but-different-tokenizer trap and incomplete safety alignment. Use case is reasonably specific (Japanese-English long-context) though could be sharper. Family 'other' is acceptable since the architecture is qwen3_moe but the model is llm-jp's own; family=qwen could also be defended. Clears the 9.0 bar narrowly.

Flags: - family='other' is defensible but qwen3_moe tag suggests 'qwen' family could also apply — minor consistency question - 11.7T token training claim is not visible in the included README excerpt; verify it appears elsewhere on the card before publishing

Overview

A 32B MoE model from Japan's National Institute of Informatics, with only 3B parameters active per forward pass. Trained on 11.7T tokens across four stages: pre-training, mid-training, SFT, and DPO. Targets Japanese and English conversational tasks with a 65K context window.

Strengths

  • MoE efficiency: 32B total params, 3B active — lower inference cost than a dense 32B
  • Extensive training: 11.7T tokens across pre-training, mid-training, SFT, and DPO
  • 65,536-token context window supports long documents and multi-turn sessions
  • Benchmarked on Japanese-specific evals (MT-Bench, AnswerCarefully)

Weaknesses

  • Safety alignment is described as incomplete — not production-ready for sensitive use cases
  • Custom tokenizer and chat template; expect friction with OpenAI-compatible tooling
  • Low download count (2,262) means limited community testing and real-world feedback
  • Full 32B weight footprint still loads into VRAM even though only 3B params are active per pass

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M17.6 GB23 GB

Get the model

HuggingFace

Original weights

huggingface.co/llm-jp/llm-jp-4-32b-a3b-thinking

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of llm-jp 4 32B A3B Thinking.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run llm-jp 4 32B A3B Thinking?

23GB of VRAM is enough to run llm-jp 4 32B A3B Thinking at the Q4_K_M quantization (file size 17.6 GB). Higher-quality quantizations need more.

Can I use llm-jp 4 32B A3B Thinking commercially?

Yes — llm-jp 4 32B A3B Thinking ships under the apache-2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of llm-jp 4 32B A3B Thinking?

llm-jp 4 32B A3B Thinking supports a context window of 65,536 tokens (about 66K).

Source: huggingface.co/llm-jp/llm-jp-4-32b-a3b-thinking

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify llm-jp 4 32B A3B Thinking runs on your specific hardware before committing money.