MLPerf Training 6.0 introduced DeepSeek-V3 671B and GPT-OSS-20B workloads, reflecting the growing use of mixture-of-experts models. NVIDIA said its platform was the only one submitted across all seven benchmarks and produced the fastest training time in each category.
NVIDIA also reported that GB300 NVL72 delivered up to 1.6 times faster training than GB200 NVL72 at the same scale. The company attributed the improvement to Blackwell Ultra capabilities including higher compute density with NVFP4, expanded memory and a higher power ceiling that allows the GPU to sustain peak performance.
Scale was another focus of the benchmark results. On DeepSeek-V3 671B, NVIDIA said it scaled to 8,192 GPUs using GB200 NVL72 systems, marking the largest Blackwell-based submission in MLPerf Training to date. The company also submitted results using 5,120 GPUs on Llama 3.1 405B with GB200 NVL72 systems.
Partner submissions played a major role in the results. Microsoft Azure scaled Llama 3.1 405B training to 8,192 GPUs with GB200 NVL72 systems and reached the benchmark’s reference quality target in 7.07 minutes, which NVIDIA said was the fastest time to train for that test. CoreWeave reached the quality target for DeepSeek-V3 671B in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet.
NVIDIA framed the benchmark performance around both speed and reliability. The company said production AI training jobs can run for weeks or months across large GPU clusters, making uptime and recovery critical to overall throughput. Its platform includes manufacturing tests, chip-level monitoring, self-healing capabilities and network rerouting features intended to reduce interruptions.
For recovery, NVIDIA pointed to its Resiliency Extension, or NVRx, which is designed to detect faults and monitor cluster health. When an interruption occurs, it resumes the job from a recent checkpoint rather than restarting the entire training run.
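The general idea of resuming from a recent checkpoint can be illustrated with a minimal Python sketch. This is not NVRx or any NVIDIA API: the function names, the JSON checkpoint file, the checkpoint interval and the simulated fault are all assumptions made for illustration, and a counter stands in for real model state.

```python
import json
import os
import tempfile

CHECKPOINT_EVERY = 10  # hypothetical checkpoint interval, in training steps


def save_checkpoint(path, step, state):
    # Write to a temporary file and rename it, so a crash in the middle
    # of a write cannot leave a corrupted checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # With no checkpoint on disk, start from step 0 with fresh state.
    if not os.path.exists(path):
        return 0, 0.0
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]


def train(path, total_steps, fail_at=None):
    # Always begin from the most recent checkpoint, if there is one.
    step, state = load_checkpoint(path)
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated hardware fault")
        state += 1.0  # stand-in for one real optimizer step
        step += 1
        if step % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, step, state)
    return step, state


def run_with_recovery(path, total_steps, fail_at):
    # Restart the job after a fault; each restart resumes from the last
    # saved checkpoint instead of repeating the whole run.
    restarts = 0
    while True:
        try:
            step, state = train(path, total_steps, fail_at)
            return step, state, restarts
        except RuntimeError:
            restarts += 1
            fail_at = None  # the simulated fault fires only once
```

In this toy setup, a fault after 37 completed steps of a 50-step run costs only the 7 steps since the checkpoint at step 30, not all 37. Shorter checkpoint intervals reduce the lost work but add more checkpoint-writing overhead.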
NVIDIA said 19 ecosystem partners submitted results in this MLPerf round, among them Microsoft Azure, CoreWeave, Dell Technologies, Google Cloud, Cisco, Fujitsu, Hewlett Packard Enterprise, Lambda, Nebius and Supermicro. The company also highlighted customers using Blackwell infrastructure for demanding AI workloads, including Cohere, Midjourney, Thinking Machines Lab and Higgsfield.
The results reinforce NVIDIA’s pitch that faster training infrastructure can shorten model development cycles, reduce training costs and help AI companies move more quickly from experimentation to deployment.
This analysis is based on reporting from NVIDIA.
This article was generated with AI assistance and reviewed for accuracy and quality.