NVIDIA Nemotron Nano 9B v2 Japanese
A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both reasoning and standard inference modes. Commercial use is permitted under the NVIDIA Nemotron Open Model License.
If you need a Japanese-capable model that fits in modest VRAM and handles long context, this is one of the more practical options at 9B. The hybrid architecture gives you real efficiency gains, and the Japanese tool-calling fine-tune is a meaningful differentiator over generic multilingual models. That said, keep reasoning mode on for anything non-trivial — the accuracy gap without it is real. Skip this if your workload is code-heavy or requires reliable complex reasoning; the SWE-Bench number and size ceiling will hurt you.
›Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.18/10. License claim is correct and verified against the HF card (NVIDIA Nemotron Open Model License, commercial use explicitly permitted in the README). Metadata is accurate: 9B params, hybrid Mamba2-Transformer/Nemotron-H architecture, Japanese specialization all match. Editorial voice is operator-grade — concrete weaknesses (SWE-Bench 0.025, reasoning-off accuracy drop, license caveat) and a sharp best-use-case (Japanese tool-calling + long context). One minor concern: the 131K context is plausible for Nemotron-Nano-9B-v2 base but isn't explicitly confirmed in the excerpt shown. Family 'other' is a reasonable choice given the hybrid Nemotron-H architecture. Clears the 9.0 bar.
Flags: - 131K context length not explicitly confirmed in the README excerpt — worth a quick double-check against the base model card
Overview
A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both reasoning and standard inference modes. Commercial use is permitted under the NVIDIA Nemotron Open Model License.
Strengths
- 131K token context window — usable for long documents and conversations
- Hybrid Mamba2-Transformer architecture reduces memory overhead vs. pure-attention models at this size
- Explicitly trained on Japanese tool-calling data; scores competitively on Nejumi Leaderboard
- Commercial use allowed out of the box
Weaknesses
- Accuracy drops noticeably on hard prompts when reasoning traces are disabled
- Weak on software engineering tasks — SWE-Bench score of 0.025 is low
- 9B parameters is a real ceiling; complex multi-step reasoning will hit limits
- NVIDIA Nemotron license is not Apache/MIT — read it before deploying
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 5.0 GB | 7 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of NVIDIA Nemotron Nano 9B v2 Japanese.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run NVIDIA Nemotron Nano 9B v2 Japanese?
Can I use NVIDIA Nemotron Nano 9B v2 Japanese commercially?
What's the context length of NVIDIA Nemotron Nano 9B v2 Japanese?
Source: huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify NVIDIA Nemotron Nano 9B v2 Japanese runs on your specific hardware before committing money.