NVIDIA has released Nemotron 3 Ultra, a new open-weight AI model built for long-running agent workflows, complex reasoning tasks, and large-scale enterprise deployments. The model features a 550-billion-parameter mixture-of-experts architecture with 55 billion active parameters and supports context windows of up to one million tokens.
Available through Hugging Face and NVIDIA’s developer ecosystem, Nemotron 3 Ultra is the largest and most capable model in the Nemotron 3 family. NVIDIA said the model was designed to help AI agents complete multi-step tasks more efficiently while reducing inference costs associated with extended reasoning and orchestration workloads.
The release reflects NVIDIA’s continued expansion beyond AI infrastructure and into the model layer. In addition to publishing model weights, the company has released related datasets, training resources, and deployment tools, allowing developers to inspect, fine-tune, and deploy the model across multiple environments.
According to NVIDIA, Nemotron 3 Ultra uses a hybrid Mamba-Transformer mixture-of-experts architecture combined with LatentMoE routing, multi-token prediction, and inference-time reasoning controls. The design is intended to improve efficiency during long-running workflows where AI agents repeatedly plan, call tools, evaluate outputs, and coordinate with other systems.
While the model contains 550 billion parameters overall, only a subset is activated during inference. NVIDIA said the 55-billion active parameter design helps reduce compute requirements compared with similarly sized dense models while preserving reasoning performance.
A major focus of the release is support for agentic AI systems. NVIDIA said Nemotron 3 Ultra was trained and optimized for workflows involving orchestration, tool usage, sub-agent coordination, validation, and multi-step task completion rather than traditional single-turn chatbot interactions.
The model also introduces a one-million-token context window, a feature aimed at enterprise use cases involving large code repositories, legal documents, research archives, support histories, and other information-heavy workloads. NVIDIA said the extended context enables AI systems to retain access to substantially more information during long-running tasks.
To support deployment, Nemotron 3 Ultra is compatible with frameworks including vLLM, SGLang, Ollama, llama.cpp, and Hugging Face Transformers. The model is also available through NVIDIA NIM microservices and cloud platforms including Amazon SageMaker JumpStart, Google Cloud, Microsoft Foundry, and Oracle Cloud.
NVIDIA highlighted several architectural and training innovations behind the model, including Multi-Teacher On-Policy Distillation, a training approach that uses multiple specialized teacher models to improve reasoning across different domains. The company also released portions of the training pipeline, including supervised fine-tuning data, reinforcement learning tasks, and training recipes for developers looking to adapt the model to specific workloads.
Alongside Nemotron 3 Ultra, NVIDIA announced two additional models: Nemotron 3.5 Content Safety, a 4-billion-parameter guardrail model designed to identify unsafe or policy-violating content across text and images, and Nemotron 3.5 ASR, a multilingual automatic speech recognition model built for real-time voice-based AI agents.
The company also said future Nemotron releases will adopt the Linux Foundation’s OpenMDW-1.1 license, providing a unified framework covering model weights, architecture, software, and related artifacts.
Nemotron 3 Ultra is available now through NVIDIA’s developer platforms, partner ecosystem, and Hugging Face repositories, with deployment options spanning self-hosted infrastructure, managed cloud environments, and NVIDIA’s own AI services stack.
See more here:
This analysis is based on reporting from Nvidia and Startup Fortune.
Images courtesy of Nvidia.
This article was generated with AI assistance and reviewed for accuracy and quality.
About this article: This article was generated with AI assistance and reviewed by our editorial team to ensure it follows our editorial standards for accuracy and independence. We maintain strict fact-checking protocols and cite all sources.
Word count: 538Reading time: 0 minutes
Explore More AI Resources
Continue with high-value guides related to this topic.