other
9B parameters
Commercial OK
Reviewed May 2026

NVIDIA Nemotron Nano 9B v2 Japanese

A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both reasoning and standard inference modes. Commercial use is permitted under the NVIDIA Nemotron Open Model License.

License: NVIDIA Nemotron Open Model License·Context: 131,072 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED MAY 28, 2026
9.2/10

If you need a Japanese-capable model that fits in modest VRAM and handles long context, this is one of the more practical options at 9B. The hybrid architecture gives you real efficiency gains, and the Japanese tool-calling fine-tune is a meaningful differentiator over generic multilingual models. That said, keep reasoning mode on for anything non-trivial — the accuracy gap without it is real. Skip this if your workload is code-heavy or requires reliable complex reasoning; the SWE-Bench number and size ceiling will hurt you.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.18/10. License claim is correct and verified against the HF card (NVIDIA Nemotron Open Model License, commercial use explicitly permitted in the README). Metadata is accurate: 9B params, hybrid Mamba2-Transformer/Nemotron-H architecture, Japanese specialization all match. Editorial voice is operator-grade — concrete weaknesses (SWE-Bench 0.025, reasoning-off accuracy drop, license caveat) and a sharp best-use-case (Japanese tool-calling + long context). One minor concern: the 131K context is plausible for Nemotron-Nano-9B-v2 base but isn't explicitly confirmed in the excerpt shown. Family 'other' is a reasonable choice given the hybrid Nemotron-H architecture. Clears the 9.0 bar.

Flags: - 131K context length not explicitly confirmed in the README excerpt — worth a quick double-check against the base model card

Overview

A 9B hybrid Mamba2-Transformer model fine-tuned from Nemotron-Nano-9B-v2 on Japanese tool-calling data. Handles up to 131K tokens of context and supports both reasoning and standard inference modes. Commercial use is permitted under the NVIDIA Nemotron Open Model License.

Strengths

  • 131K token context window — usable for long documents and conversations
  • Hybrid Mamba2-Transformer architecture reduces memory overhead vs. pure-attention models at this size
  • Explicitly trained on Japanese tool-calling data; scores competitively on Nejumi Leaderboard
  • Commercial use allowed out of the box

Weaknesses

  • Accuracy drops noticeably on hard prompts when reasoning traces are disabled
  • Weak on software engineering tasks — SWE-Bench score of 0.025 is low
  • 9B parameters is a real ceiling; complex multi-step reasoning will hit limits
  • NVIDIA Nemotron license is not Apache/MIT — read it before deploying

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M5.0 GB7 GB

Get the model

HuggingFace

Original weights

huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of NVIDIA Nemotron Nano 9B v2 Japanese.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run NVIDIA Nemotron Nano 9B v2 Japanese?

7GB of VRAM is enough to run NVIDIA Nemotron Nano 9B v2 Japanese at the Q4_K_M quantization (file size 5.0 GB). Higher-quality quantizations need more.

Can I use NVIDIA Nemotron Nano 9B v2 Japanese commercially?

Yes — NVIDIA Nemotron Nano 9B v2 Japanese ships under the NVIDIA Nemotron Open Model License, which permits commercial use. Always read the license text before deployment.

What's the context length of NVIDIA Nemotron Nano 9B v2 Japanese?

NVIDIA Nemotron Nano 9B v2 Japanese supports a context window of 131,072 tokens (about 131K).

Source: huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify NVIDIA Nemotron Nano 9B v2 Japanese runs on your specific hardware before committing money.