Nemotron 3 Ultra

NVIDIA, Open Weights, Pending Human Review

Nemotron 3 Ultra is the flagship large language model in NVIDIA's Nemotron 3 family, announced on December 15, 2025, with general availability expected in the first half of 2026. It has roughly 500 billion total parameters, of which about 50 billion are active per token, and uses a Hybrid Mamba-Transformer Latent Mixture-of-Experts (MoE) architecture. This architecture combines Mamba-2 state-space layers with Transformer attention layers and Latent MoE routing, and is designed for efficiency and performance on complex, long-horizon tasks. With a 1 million token context window, Nemotron 3 Ultra targets demanding enterprise AI applications such as deep research, strategic planning, and large-scale multi-agent coordination. It was trained in NVFP4 precision on NVIDIA Blackwell GPUs using a 3 trillion token dataset, and incorporates reinforcement learning for agentic tasks. The model supports advanced reasoning, tool use, and multiple languages, with open weights under the NVIDIA Open Model License.
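The parameter counts above translate into rough weight-storage figures. The following is a minimal sketch, not a deployment guide: it uses only the approximate totals quoted in this card (about 500B total, about 50B active) and nominal bytes per parameter for common numeric formats. The function names and the format table are illustrative assumptions. The estimate ignores quantization scale factors, activations, and any attention or state-space cache, so real memory use will be higher.

```python
# Back-of-the-envelope weight-memory estimate.
# Parameter counts are the approximate figures quoted in this card;
# everything else (names, format table) is an illustrative assumption.

TOTAL_PARAMS = 500e9   # ~500B total parameters
ACTIVE_PARAMS = 50e9   # ~50B parameters active per token

# Nominal storage width per parameter. "4-bit" stands in for 4-bit formats
# in general and ignores per-block scale-factor overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters used per token in an MoE model."""
    return active / total


if __name__ == "__main__":
    print(f"Active share per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    for fmt, width in BYTES_PER_PARAM.items():
        total_gb = weight_memory_gb(TOTAL_PARAMS, width)
        print(f"{fmt:>5}: ~{total_gb:,.0f} GB for all weights")
```

Under these assumptions, all experts must be resident in memory even though only about a tenth of the parameters are used per token, which is why the full-model figure, not the active-parameter figure, drives the hardware requirement.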

Announced: 2025-12-15

Specifications

Parameters
~500B (Total), ~50B (Active)
Architecture
Hybrid Mamba-Transformer Latent Mixture-of-Experts (MoE)
License
NVIDIA Open Model License
Context Window
1,000,000 tokens
Training Data Cutoff
Unreleased
Type
text
Modalities
text

Advanced Specifications

Model Family
Nemotron 3
Finetuned From
Pre-trained from scratch
API Access
Available
Chat Interface
Available
Multilingual Support
Yes
Variants
Nemotron 3 Nano, Nemotron 3 Super, Nemotron 3 Ultra
Hardware Support
NVIDIA Blackwell, NVIDIA H100, NVIDIA B200

Capabilities & Limitations

Capabilities
reasoning, agentic AI, tool use, function calling, multilingual, code, math, strategic planning, deep research, multi-token prediction
Known Limitations
High memory requirements due to model size; optimized for NVIDIA hardware
Notable Use Cases
deep research, strategic planning, complex multi-agent coordination, enterprise workflow automation
Function Calling Support
Yes
Tool Use Support
Yes
