NVIDIA Dominates AI Training Benchmarks with Blackwell Ultra, Pioneering FP4 Precision
The landscape of artificial intelligence is rapidly evolving, demanding unprecedented computational power to train increasingly sophisticated models. In a landmark achievement, NVIDIA has swept all seven tests in MLPerf Training v5.1, the industry’s gold standard for evaluating AI training performance. This resounding victory underscores NVIDIA’s continued leadership in accelerating the development of large language models (LLMs), image generation, recommender systems, computer vision, and graph neural networks. The results demonstrate not only raw speed but also the breadth and maturity of NVIDIA’s hardware and software ecosystem.
Notably, NVIDIA was the sole participant to submit results across all seven MLPerf Training v5.1 benchmarks, a testament to the versatility of NVIDIA GPUs and the robust capabilities of its CUDA software stack. This comprehensive performance showcases NVIDIA’s commitment to providing a complete and reliable platform for AI innovation.
Blackwell Ultra: A New Era of AI Performance
Central to NVIDIA’s success is the debut of the GB300 NVL72 rack-scale system, powered by the groundbreaking NVIDIA Blackwell Ultra GPU architecture. Building on a record-setting performance in the most recent MLPerf Inference round, Blackwell Ultra delivers a substantial leap in AI training capabilities. Compared to the previous-generation Hopper architecture, the GB300 NVL72 achieves over 4x the pretraining performance for Llama 3.1 405B and nearly 5x faster fine-tuning for Llama 2 70B, all while utilizing the same number of GPUs.

This dramatic improvement stems from architectural enhancements within Blackwell Ultra, including new Tensor Cores offering 15 petaflops of NVFP4 AI compute, doubled attention-layer compute, and a massive 279GB of HBM3e memory. These advancements, coupled with innovative training methodologies, unlock the full potential of Blackwell Ultra’s computational power. The NVIDIA Quantum-X800 InfiniBand platform, the industry’s first end-to-end 800 Gb/s networking solution, further amplifies performance by doubling scale-out networking bandwidth.
NVFP4 Precision: A Paradigm Shift in LLM Training
A key innovation driving NVIDIA’s MLPerf Training v5.1 results is the adoption of NVFP4 precision – a first in the history of the benchmark. Lower precision calculations can significantly accelerate compute performance, but require careful consideration to maintain accuracy. NVIDIA’s teams have meticulously optimized every layer of the software stack to effectively leverage FP4 precision for LLM training.
The NVIDIA Blackwell GPU performs FP4 calculations, including the NVIDIA-designed NVFP4 format and other FP4 variants, at double the rate of FP8. Blackwell Ultra raises that ratio to 3x the FP8 rate, delivering substantially greater AI compute performance.
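To make the accuracy trade-off of 4-bit formats concrete, here is a minimal sketch of block-scaled FP4 quantization in plain Python. It is an illustration, not NVIDIA's implementation: the function names are hypothetical, the block size of 16 is an assumption, and the per-block scale is kept as an ordinary Python float rather than being stored in a low-precision format as real hardware would do. The only fixed fact it relies on is the set of magnitudes a 4-bit E2M1 float can represent.

```python
# Illustrative sketch only: block-scaled 4-bit quantization in plain Python.
# Not NVIDIA's implementation; all names below are hypothetical.

# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 values


def quantize_block(block):
    """Return (scale, codes), where each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest FP4 value (6.0).
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Nearest representable magnitude; ties resolve to the smaller one here.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from a scale and its FP4 codes."""
    return [scale * c for c in codes]


def quantize_tensor(values):
    """Split a flat list into blocks and quantize each one independently."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [quantize_block(b) for b in blocks]


def dequantize_tensor(quantized):
    """Concatenate the dequantized blocks back into a flat list."""
    out = []
    for scale, codes in quantized:
        out.extend(dequantize_block(scale, codes))
    return out


# Round-trip a small synthetic "weight" vector and measure the worst-case error.
weights = [0.01 * i for i in range(-16, 16)]
restored = dequantize_tensor(quantize_tensor(weights))
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max round-trip error: {max_error:.4f}")
```

Because each block carries its own scale, a single large value only degrades the resolution of its own block rather than the whole tensor, which is one reason block scaling is commonly paired with very low-precision formats.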
Scaling to New Heights with Blackwell
NVIDIA set a new time-to-train record for Llama 3.1 405B: 10 minutes using more than 5,000 Blackwell GPUs. This is a 2.7x improvement over the previous Blackwell-based result, achieved by scaling efficiently to more than twice the number of GPUs and by using NVFP4 precision. To illustrate the per-GPU performance gains, NVIDIA also demonstrated a time-to-train of 18.79 minutes using 2,560 Blackwell GPUs, a 45% improvement over the previous submission, which used 2,496 GPUs.
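The two Llama 3.1 405B figures quoted above can be compared on a GPU-minutes basis to get a rough sense of how well the workload scales. The sketch below is back-of-the-envelope arithmetic, not a reported MLPerf metric. It assumes "over 5,000" means exactly 5,000 GPUs and that "10 minutes" is exactly 10.0, so the result is only an approximation; the helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# Assumptions: "over 5,000 GPUs" is treated as exactly 5,000 and
# "10 minutes" as exactly 10.0, so this is an estimate, not a reported metric.


def gpu_minutes(minutes, gpus):
    """Total GPU time consumed by one training run."""
    return minutes * gpus


def scaling_efficiency(base_minutes, base_gpus, scaled_minutes, scaled_gpus):
    """GPU-minutes at the smaller scale divided by GPU-minutes at the larger scale.

    A value of 1.0 would mean perfectly linear scaling.
    """
    return gpu_minutes(base_minutes, base_gpus) / gpu_minutes(scaled_minutes, scaled_gpus)


base = gpu_minutes(18.79, 2560)
efficiency = scaling_efficiency(18.79, 2560, 10.0, 5000)
print(f"{base:.1f} GPU-minutes at 2,560 GPUs")
print(f"approximate scaling efficiency: {efficiency:.2f}")
```

Under these assumptions the larger run consumes only slightly more total GPU time than the smaller one. Since the actual GPU count was above 5,000, the true efficiency would be somewhat lower than this estimate.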

New Benchmarks, Continued Dominance
NVIDIA also established new performance records on the two newly introduced MLPerf Training v5.1 benchmarks: Llama 3.1 8B and FLUX.1. Llama 3.1 8B, a compact yet powerful LLM, replaced the older BERT-large model, adding a modern benchmark to the suite. NVIDIA achieved a training time of 5.2 minutes using up to 512 Blackwell Ultra GPUs. FLUX.1, a state-of-the-art image generation model, replaced Stable Diffusion v2, with NVIDIA being the only platform to submit results on this benchmark, achieving a record time of 12.5 minutes with 1,152 Blackwell GPUs.
NVIDIA continues to hold the top position on existing benchmarks for graph neural networks, object detection, and recommender systems.
A Thriving Ecosystem Fuels Innovation
NVIDIA’s success is amplified by a robust ecosystem of partners, with compelling submissions from 15 organizations including ASUSTeK, Dell Technologies, Giga Computing, Hewlett Packard Enterprise, Krai, Lambda, Lenovo, Nebius, Quanta Cloud Technology, Supermicro, University of Florida, Verda (formerly DataCrunch), and Wiwynn. This collaborative spirit drives continuous innovation and accelerates the adoption of AI technologies.
NVIDIA’s commitment to a one-year innovation cycle is delivering significant and rapid performance increases across the entire AI lifecycle – from pretraining to inference – paving the way for new levels of intelligence and accelerating AI adoption worldwide. Large language models are at the forefront of this revolution.
Learn more about NVIDIA’s performance data on the Data Center Deep Learning Product Performance Hub and Performance Explorer pages.
Frequently Asked Questions About NVIDIA’s MLPerf Training v5.1 Results
What is MLPerf Training and why is it important?
MLPerf Training is a widely recognized benchmark suite for measuring the performance of AI training systems. It provides a standardized and objective way to compare different hardware and software platforms, driving innovation and transparency in the AI industry.
How does NVIDIA’s Blackwell Ultra architecture improve AI training performance?
Blackwell Ultra incorporates several key architectural improvements, including new Tensor Cores with enhanced NVFP4 compute, doubled attention-layer compute, and increased HBM3e memory capacity. These advancements, combined with optimized training methods, deliver significant performance gains.
What is NVFP4 precision and how does it accelerate LLM training?
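NVFP4 is a 4-bit floating-point (FP4) format designed by NVIDIA. Blackwell GPUs perform FP4 calculations at double the rate of FP8, and Blackwell Ultra at 3x, so running supported operations in NVFP4 speeds up training. Lower precision can reduce accuracy, so NVIDIA optimized every layer of its software stack to keep LLM training accurate at FP4.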
What role does the NVIDIA Quantum-X800 InfiniBand platform play in these results?
The NVIDIA Quantum-X800 InfiniBand platform provides a high-bandwidth, low-latency networking solution that enables efficient scaling of AI training workloads across multiple systems, doubling scale-out networking bandwidth compared to previous generations.
How does NVIDIA’s ecosystem contribute to its success in AI training?
NVIDIA’s extensive ecosystem of partners, including hardware manufacturers and software developers, fosters collaboration and innovation, leading to optimized AI solutions and broader adoption of NVIDIA technologies.
The Future of AI Training: Trends and Challenges
The advancements showcased in MLPerf Training v5.1 represent a significant step forward in AI capabilities. However, the pursuit of ever-more-powerful AI models presents ongoing challenges. These include the increasing demand for energy-efficient hardware, the need for more sophisticated algorithms to manage model complexity, and the importance of addressing ethical considerations related to AI development and deployment. Sustainable AI practices are becoming increasingly critical.
Looking ahead, we can expect to see continued innovation in areas such as sparse activation, quantization, and distributed training techniques. The development of specialized AI accelerators, like NVIDIA’s Blackwell Ultra, will be crucial for unlocking the full potential of future AI models. Furthermore, the integration of AI with other emerging technologies, such as quantum computing, could lead to even more transformative breakthroughs.