550B parameters
Commercial OK
Reviewed June 2026

Nemotron 3 Ultra (550B-A55B)

NVIDIA Nemotron 3 Ultra (550B-A55B) is a frontier-scale open-weight reasoning model from NVIDIA (Hugging Face `nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16`, June 2026), with 550B total and 55B active parameters. Its "LatentMoE" architecture interleaves Mamba-2, MoE, and select attention layers with multi-token prediction, and was trained on an NVFP4 recipe. It supports up to 1M-token context and 10 languages. The model is text-only and released under the Linux Foundation OpenMDW License v1.1, which permits commercial use. Vendor-reported benchmarks (via NVIDIA's Nemo Evaluator SDK) include SWE-bench Verified 70.7, GPQA (no tools) 87.0, and RULER@1M 94.7; none of these are independently verified. The smaller Nemotron 3 Nano and Super tiers are far more practical to run on local hardware.
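To make the "550B total / 55B active" split concrete, here is a minimal arithmetic sketch. The parameter counts come from the model page. The rule of thumb of roughly 2 FLOPs per active parameter per token is a common approximation for dense matrix multiplies, not a figure from NVIDIA, and it may be loose for the Mamba-2 layers. The function names are illustrative.

```python
# Parameter counts as stated on the model page.
TOTAL_PARAMS = 550e9   # all weights, including every expert
ACTIVE_PARAMS = 55e9   # weights actually used per token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in a single token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter."""
    return 2.0 * active


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
flops = approx_forward_flops_per_token(ACTIVE_PARAMS)

print(f"active fraction: {frac:.0%}")
print(f"approx forward FLOPs per token: {flops:.2e}")
```

The practical reading: per-token compute tracks the 55B active parameters, but all 550B weights still have to be stored somewhere, which is why the model remains datacenter-class.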

License: OpenMDW-1.1 · Released Jun 4, 2026 · Context: 1,000,000 tokens

Our verdict

Fredoline Eruo · Verified Jun 29, 2026
unrated

Positioning

Nemotron 3 Ultra (550B-A55B) is NVIDIA's frontier-scale open-weight reasoning model, released in June 2026. It is notably the most capable open model from a US lab, and it ships fully open (weights and recipes) under the Linux Foundation's OpenMDW-1.1 license, which permits commercial use.

What stands out

The architecture is the interesting part: a "LatentMoE" hybrid interleaving Mamba-2, MoE, and select attention layers with multi-token prediction, trained on an NVFP4 recipe. The NVFP4 checkpoints make a 550B model unusually tractable on Blackwell hardware. The 1M-token context is supported with NVFP4 on Blackwell; in BF16 the limit is 262K. A fully open US-lab model with permissive licensing is rare and valuable for teams that want provenance.
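A rough weight-memory estimate shows why the 4-bit checkpoints matter at this scale. This is a sketch under stated assumptions: the 4-bit figure is nominal and ignores the per-block scale factors a real NVFP4 checkpoint carries, the 80 GB device size is a hypothetical example rather than a recommendation, and the estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
import math

TOTAL_PARAMS = 550e9  # from the model page

# Bits stored per weight. The 4-bit entry is nominal (assumption):
# real NVFP4 files are somewhat larger because of scale factors.
BITS_PER_WEIGHT = {"BF16": 16.0, "4-bit (nominal)": 4.0}

HYPOTHETICAL_DEVICE_GB = 80.0  # illustrative accelerator memory size


def weight_gb(params: float, bits: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def min_devices(total_gb: float, device_gb: float) -> int:
    """Lower bound on device count needed just to hold the weights."""
    return math.ceil(total_gb / device_gb)


for name, bits in BITS_PER_WEIGHT.items():
    gb = weight_gb(TOTAL_PARAMS, bits)
    n = min_devices(gb, HYPOTHETICAL_DEVICE_GB)
    print(f"{name}: ~{gb:.0f} GB of weights, at least {n} x 80 GB devices")
```

Under these assumptions BF16 weights come to about 1,100 GB and nominal 4-bit weights to about 275 GB. Real deployments need headroom beyond these lower bounds.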

Honest caveats

All benchmarks are NVIDIA-reported (via their Nemo Evaluator SDK) and not independently verified. Per Artificial Analysis, it still trails the leading Chinese open models on the intelligence index. At 550B parameters it is datacenter-class; the smaller Nemotron 3 Super (120B) and Nemotron 3 Nano (30B) tiers are the locally practical choices for most readers.

Verdict

Run it if you need a fully-open, commercially-licensed frontier reasoner with documented training recipes and you have Blackwell / NVFP4 infrastructure. Most local users should start with Nemotron 3 Super or Nano instead — same family, same openness, hardware you can actually buy.



Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.


Get the model

Hugging Face

Original weights

huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

This is the source repository for the original weights; you must quantize them yourself.
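Because the full BF16 repository is very large, it can help to pull only the small metadata files first and inspect them before committing to the full download. The sketch below assumes the `huggingface_hub` package is installed and the repository is reachable. The repo ID is taken from this page; the file patterns, the local directory name, and the function name are illustrative choices.

```python
# Repo ID as listed on this page.
REPO_ID = "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16"

# Illustrative choice: configs and docs only, no weight shards.
METADATA_PATTERNS = ["*.json", "*.md"]


def fetch_metadata(local_dir: str = "nemotron-3-ultra-meta") -> str:
    """Download only files matching METADATA_PATTERNS; return the local path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=METADATA_PATTERNS,
        local_dir=local_dir,
    )


# Usage (requires network access):
#     path = fetch_metadata()
#     print(path)
```

Dropping the `allow_patterns` argument would fetch the entire repository, so check available disk space before doing that.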



Frequently asked

Can I use Nemotron 3 Ultra (550B-A55B) commercially?

Yes. Nemotron 3 Ultra (550B-A55B) ships under the OpenMDW-1.1 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Nemotron 3 Ultra (550B-A55B)?

Nemotron 3 Ultra (550B-A55B) supports a context window of up to 1,000,000 tokens (1M). The verdict above notes that the 1M figure applies to NVFP4 on Blackwell, with 262K in BF16.

Source: huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
