NVIDIA’s New Nemotron 3 Ultra Targets Enterprise AI Agents and Long-Context Tasks

June 5, 2026

NVIDIA has released Nemotron 3 Ultra, a new open-weight AI model built for long-running agent workflows, complex reasoning tasks, and large-scale enterprise deployments. The model features a 550-billion-parameter mixture-of-experts architecture with 55 billion active parameters and supports context windows of up to one million tokens.

Available through Hugging Face and NVIDIA’s developer ecosystem, Nemotron 3 Ultra is the largest and most capable model in the Nemotron 3 family. NVIDIA said the model was designed to help AI agents complete multi-step tasks more efficiently while reducing inference costs associated with extended reasoning and orchestration workloads.

The release reflects NVIDIA’s continued expansion beyond AI infrastructure and into the model layer. In addition to publishing model weights, the company has released related datasets, training resources, and deployment tools, allowing developers to inspect, fine-tune, and deploy the model across multiple environments.

According to NVIDIA, Nemotron 3 Ultra uses a hybrid Mamba-Transformer mixture-of-experts architecture combined with LatentMoE routing, multi-token prediction, and inference-time reasoning controls. The design is intended to improve efficiency during long-running workflows where AI agents repeatedly plan, call tools, evaluate outputs, and coordinate with other systems.

While the model contains 550 billion parameters overall, only a subset is activated during inference. NVIDIA said the 55-billion active parameter design helps reduce compute requirements compared with similarly sized dense models while preserving reasoning performance.

A major focus of the release is support for agentic AI systems. NVIDIA said Nemotron 3 Ultra was trained and optimized for workflows involving orchestration, tool usage, sub-agent coordination, validation, and multi-step task completion rather than traditional single-turn chatbot interactions.

The model also introduces a one-million-token context window, a feature aimed at enterprise use cases involving large code repositories, legal documents, research archives, support histories, and other information-heavy workloads. NVIDIA said the extended context enables AI systems to retain access to substantially more information during long-running tasks.

To support deployment, Nemotron 3 Ultra is compatible with frameworks including vLLM, SGLang, Ollama, llama.cpp, and Hugging Face Transformers. The model is also available through NVIDIA NIM microservices and cloud platforms including Amazon SageMaker JumpStart, Google Cloud, Microsoft Foundry, and Oracle Cloud.

NVIDIA highlighted several architectural and training innovations behind the model, including Multi-Teacher On-Policy Distillation, a training approach that uses multiple specialized teacher models to improve reasoning across different domains. The company also released portions of the training pipeline, including supervised fine-tuning data, reinforcement learning tasks, and training recipes for developers looking to adapt the model to specific workloads.

Alongside Nemotron 3 Ultra, NVIDIA announced two additional models: Nemotron 3.5 Content Safety, a 4-billion-parameter guardrail model designed to identify unsafe or policy-violating content across text and images, and Nemotron 3.5 ASR, a multilingual automatic speech recognition model built for real-time voice-based AI agents.

The company also said future Nemotron releases will adopt the Linux Foundation’s OpenMDW-1.1 license, providing a unified framework covering model weights, architecture, software, and related artifacts.

Nemotron 3 Ultra is available now through NVIDIA’s developer platforms, partner ecosystem, and Hugging Face repositories, with deployment options spanning self-hosted infrastructure, managed cloud environments, and NVIDIA’s own AI services stack.

See more here:

This analysis is based on reporting from Nvidia and Startup Fortune.

Images courtesy of Nvidia.

This article was generated with AI assistance and reviewed for accuracy and quality.

Last updated: June 5, 2026

Report Error

About this article: This article was generated with AI assistance and reviewed by our editorial team to ensure it follows our editorial standards for accuracy and independence. We maintain strict fact-checking protocols and cite all sources.

Word count: 538Reading time: 0 minutes

Explore More AI Resources

Continue with high-value guides related to this topic.

Compare AI Models

See ChatGPT, Claude, and Gemini side-by-side in one place.

Best AI Newsletters

Find top AI newsletters and subscribe to ChatAI Daily.

AI FAQ

Quick answers about ChatAI, AI tools, and multi-model chat.

AI Tools

Use free AI tools for summarization, translation, and more.

📧 Stay Updated

Get the latest AI news delivered to your inbox every morning.

Continue Reading

Meta’s Iris AI Chip Enters Production in September, Challenging Nvidia’s Dominance

Meta Platforms plans to begin manufacturing its in-house artificial intelligence chip, code-named Iris, from September as it works toward expanding its computing capacity to 14 gigawatts next year,...

July 9, 2026•5 min read

NVIDIA and LangChain Say Nemotron 3 Ultra Rivals Closed AI Models at One-Tenth the Cost

NVIDIA announced that its open Nemotron 3 Ultra model, when paired with a tuned version of LangChain’s Deep Agents framework, achieved business-task performance comparable to the highest-scoring...

July 8, 2026•5 min read

DeepSeek Reportedly Developing Its Own AI Inference Chips

DeepSeek is developing its own data center inference chips, marking the Chinese AI startup’s planned expansion into semiconductor design as it looks to reduce its dependence on external hardware...

July 7, 2026•5 min read

Explore All Articles

NVIDIA’s New Nemotron 3 Ultra Targets Enterprise AI Agents and Long-Context Tasks

Explore More AI Resources

Compare AI Models

Best AI Newsletters

AI FAQ

AI Tools

AI Tools for this Article

📧 Stay Updated

Related Articles

Meta’s Iris AI Chip Enters Production in September, Challenging Nvidia’s Dominance

NVIDIA and LangChain Say Nemotron 3 Ultra Rivals Closed AI Models at One-Tenth the Cost

DeepSeek Reportedly Developing Its Own AI Inference Chips

Continue Reading

Meta’s Iris AI Chip Enters Production in September, Challenging Nvidia’s Dominance

NVIDIA and LangChain Say Nemotron 3 Ultra Rivals Closed AI Models at One-Tenth the Cost

DeepSeek Reportedly Developing Its Own AI Inference Chips

Stay Ahead of AI

Go Premium

Follow Our Community

ChatAI

Go Premium

ChatAI

Follow Our Community