
Nemotron 3 Ultra is the flagship large language model in NVIDIA's Nemotron 3 family, announced on December 15, 2025, with general availability expected in the first half of 2026. It has roughly 500 billion total parameters, of which approximately 50 billion are active per token, and uses a Hybrid Mamba-Transformer Latent Mixture-of-Experts (MoE) architecture that combines Mamba-2 state-space layers with Transformer attention layers and Latent MoE routing, a design aimed at high efficiency and strong performance on complex, long-horizon tasks. With a 1 million token context window, Nemotron 3 Ultra is optimized for demanding enterprise AI applications such as deep research, strategic planning, and large-scale multi-agent coordination. It was trained in NVFP4 precision on NVIDIA Blackwell GPUs over a 3-trillion-token dataset, with reinforcement learning applied for agentic tasks. The model supports advanced reasoning and tool use, is multilingual, and has open weights released under the NVIDIA Open Model License.
Type: Text
Parameters: ~500B (total), ~50B (active)
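
To make the architecture description above concrete, the following is a minimal, illustrative sketch of a hybrid state-space/attention stack with routed mixture-of-experts feed-forward layers. It is not NVIDIA's implementation: a toy diagonal SSM stands in for Mamba-2, ordinary top-k routing stands in for the Latent MoE routing, and all module names, dimensions, layer counts, and the interleaving pattern are assumptions chosen for illustration.

```python
# Hedged sketch of a hybrid SSM/attention + MoE stack. Everything here
# (ToySSMBlock, TopKMoE, HybridBlock, the 2:1 interleaving) is a toy
# stand-in, not Nemotron 3 Ultra's actual architecture or code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySSMBlock(nn.Module):
    """Toy diagonal linear state-space layer (simplified stand-in for Mamba-2)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.decay_logit = nn.Parameter(torch.zeros(d_model))  # per-channel decay
        self.b = nn.Parameter(torch.ones(d_model))
        self.c = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, dim)
        a = torch.sigmoid(self.decay_logit)               # keep decay in (0, 1)
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):                        # sequential recurrence
            h = a * h + self.b * x[:, t]                  # h_t = a*h_{t-1} + b*x_t
            outs.append(self.c * h)                       # y_t = c*h_t
        return torch.stack(outs, dim=1)


class TopKMoE(nn.Module):
    """Routed feed-forward experts: only k of n experts run per token."""
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, dim)
        logits = self.router(x)                           # (batch, time, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)        # choose k experts/token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class HybridBlock(nn.Module):
    """One layer: SSM or attention token mixer, then a routed MoE feed-forward."""
    def __init__(self, d_model: int, use_attention: bool):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_attention = use_attention
        if use_attention:
            self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        else:
            self.ssm = ToySSMBlock(d_model)
        self.moe = TopKMoE(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        if self.use_attention:
            h, _ = self.attn(h, h, h, need_weights=False)
        else:
            h = self.ssm(h)
        x = x + h                                         # residual around the mixer
        return x + self.moe(self.norm2(x))                # residual around the MoE


# Assumed interleaving: mostly SSM layers with a periodic attention layer.
model = nn.Sequential(*[HybridBlock(64, use_attention=(i % 3 == 2)) for i in range(6)])
tokens = torch.randn(2, 16, 64)                           # (batch, seq, d_model)
print(model(tokens).shape)                                # torch.Size([2, 16, 64])
```

The efficiency property the sketch illustrates is that the router activates only k of the n experts for each token, so per-token compute scales with the active parameter count (here, the ~50B figure) rather than the total parameter count (~500B).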