Nvidia Blackwell To Power Next Phase of Generative AI

Updated on March 22, 2024

Nvidia has been at the forefront of the artificial intelligence revolution, providing the hardware that powers everything from autonomous vehicles to language models. Now, with its latest Blackwell architecture, the company aims to power a new era of generative AI.

Over the past few years, we’ve witnessed the meteoric rise of large language models like OpenAI’s GPT-3, GPT-3.5 and now GPT-4 that can generate remarkably human-like text. But these are just the first ripples of a tidal wave of generative AI models on the horizon.

Researchers are now developing multimodal models that can not only generate text but also synthesize images, audio, video and more. The possibilities are staggering – imagine an AI assistant that can craft marketing materials complete with designs and product visualizations on command.

However, training and running these complex generative models requires unprecedented compute power. This is where Nvidia’s new Blackwell architecture comes into play.

What Is Nvidia Blackwell?

At its core, Blackwell is an AI chip designed from the ground up to accelerate the training, fine-tuning and deployment of large language models and other generative AI workloads. It packs 208 billion transistors, manufactured on a custom TSMC 4NP (4nm-class) process.

But raw transistor count is just the beginning. The true innovation lies in Blackwell’s custom architectures tailored for the unique parallelism demands of these models:

Second-Gen Transformer Engine: This revamped engine introduces new low-precision numeric formats such as FP4 and FP6 that, according to Nvidia, double AI compute throughput while maintaining high accuracy. Coupled with optimizations for sparse models, it targets large performance gains for transformer-based architectures.

NVIDIA Generative AI Engine: As models grow larger and more complex, they require specialized acceleration for their many parallel components like embeddings, feed-forward layers, and attention heads. This custom engine optimizes all these elements for maximum throughput.

Scalable NVLink Interconnect: To combine multiple Blackwell GPUs into a single virtual accelerator, the fifth-generation NVLink fabric lets each GPU communicate at 1.8 TB/s. For the largest models, up to 576 GPUs can be linked, and a 72-GPU NVLink domain provides 130 TB/s of total bandwidth.
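To make the low-precision formats mentioned above more concrete, here is a minimal Python sketch of how rounding values onto a 4-bit floating-point grid might work. It assumes an E2M1 layout (1 sign, 2 exponent and 1 mantissa bit) and a single shared scale per block of values. Both are assumptions for illustration, as are the function names `quantize_block_fp4` and `weight_bytes`. The article does not describe how Blackwell encodes or scales FP4, so this is not Nvidia's implementation.

```python
# Illustrative sketch only. The E2M1 value grid and the per-block max-abs
# scaling are assumptions; the article does not specify Blackwell's FP4 details.

# Magnitudes representable with 1 sign, 2 exponent and 1 mantissa bit.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(values):
    """Round a block of floats to the nearest FP4 value using one shared scale.

    Returns the de-quantized values and the scale that was used.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values), 1.0
    # Map the largest magnitude in the block onto 6.0, the largest FP4 magnitude.
    scale = max_abs / FP4_MAGNITUDES[-1]
    quantized = []
    for v in values:
        target = abs(v) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - target))
        quantized.append((nearest if v >= 0 else -nearest) * scale)
    return quantized, scale


def weight_bytes(num_params, bits_per_param):
    """Bytes needed for the weights alone, ignoring scales and activations."""
    return num_params * bits_per_param // 8


if __name__ == "__main__":
    block = [6.0, 3.0, -1.5, 0.4]
    approx, scale = quantize_block_fp4(block)
    print("original :", block)
    print("fp4 approx:", approx, "scale:", scale)

    params = 10**12  # hypothetical 1-trillion-parameter model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_bytes(params, bits) / 1e12:.1f} TB")
```

The sketch shows the trade-off behind formats like FP4: each value snaps to one of only a few representable levels, but the weights take a quarter of the space they would at 16 bits, so less data has to move through memory for every operation.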

Blackwell Generative AI Performance

The result of these Blackwell innovations? Generative AI performance that blows past previous generations:

According to Nvidia, training a 1.8 trillion parameter GPT-style model (a size widely attributed to GPT-4) would require 8,000 H100 GPUs and 15 megawatts of power over 90 days. With Blackwell, the same task takes just 2,000 GPUs and 4 megawatts.
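The figures above can be turned into a rough energy comparison. The Python sketch below assumes the Blackwell run also lasts 90 days (the article implies this with "the same task" but does not state it) and that the quoted megawatt values are constant draws for the whole run. The helper name `energy_mwh` is illustrative.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
# Assumption: both runs last 90 days at a constant power draw.

TRAINING_DAYS = 90


def energy_mwh(power_mw, days):
    """Energy in megawatt-hours for a constant power draw over `days` days."""
    return power_mw * days * 24


hopper = {"gpus": 8000, "power_mw": 15}
blackwell = {"gpus": 2000, "power_mw": 4}

hopper_mwh = energy_mwh(hopper["power_mw"], TRAINING_DAYS)
blackwell_mwh = energy_mwh(blackwell["power_mw"], TRAINING_DAYS)

gpu_ratio = hopper["gpus"] / blackwell["gpus"]
energy_ratio = hopper_mwh / blackwell_mwh

# Power per GPU in kilowatts, computed from the whole-system figures.
hopper_kw_per_gpu = hopper["power_mw"] * 1000 / hopper["gpus"]
blackwell_kw_per_gpu = blackwell["power_mw"] * 1000 / blackwell["gpus"]

print(f"H100 run:      {hopper_mwh:,} MWh")
print(f"Blackwell run: {blackwell_mwh:,} MWh")
print(f"GPU count reduction: {gpu_ratio:.2f}x")
print(f"Energy reduction:    {energy_ratio:.2f}x")
print(f"Power per GPU: {hopper_kw_per_gpu} kW (H100) vs {blackwell_kw_per_gpu} kW (Blackwell)")
```

Under these assumptions, the saving comes from needing a quarter as many GPUs rather than from lower power per GPU: the quoted totals work out to slightly more power per Blackwell GPU than per H100.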

For inference, Blackwell delivers up to 30x higher performance for large language models versus the previous H100 chip, while reducing cost and energy usage by up to 25x.

A New Scale for the AI Datacenter

To take full advantage of Blackwell’s capabilities, Nvidia has redesigned its DGX AI systems. The DGX GB200 combines 36 Grace CPUs with 72 Blackwell GPUs into a single liquid-cooled rack with 1.4 exaflops of AI inference power.

Eight of these racks can be cabled together into an 11.5 exaflops DGX SuperPOD system with 240 TB of fast memory, enough to support models of up to 27 trillion parameters. Cloud providers like AWS, Google and Microsoft are already signing up to offer access.
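As a sanity check, the rack and SuperPOD numbers can be recomputed from the per-GPU figures quoted elsewhere in the article. The Python sketch below uses only those figures. It assumes simple linear scaling, and the variable names are illustrative.

```python
# Scale-up arithmetic from figures quoted in the article; linear scaling assumed.

GPUS_PER_RACK = 72        # Blackwell GPUs per rack
RACKS_PER_SUPERPOD = 8    # racks cabled into one SuperPOD
NVLINK_TB_PER_S = 1.8     # NVLink bandwidth per GPU
HBM_GB_PER_GPU = 192      # B200 memory from the comparison table
RACK_EXAFLOPS = 1.4       # AI inference performance per rack (rounded)

total_gpus = GPUS_PER_RACK * RACKS_PER_SUPERPOD
rack_nvlink_tb_s = GPUS_PER_RACK * NVLINK_TB_PER_S
superpod_exaflops = RACKS_PER_SUPERPOD * RACK_EXAFLOPS
superpod_hbm_tb = total_gpus * HBM_GB_PER_GPU / 1000

print(f"GPUs per SuperPOD:          {total_gpus}")
print(f"NVLink bandwidth per rack:  {rack_nvlink_tb_s:.1f} TB/s")
print(f"SuperPOD AI performance:    {superpod_exaflops:.1f} exaflops")
print(f"SuperPOD GPU memory (HBM):  {superpod_hbm_tb:.1f} TB")
```

Two results are worth noting. Eight racks at the rounded 1.4 exaflops figure give 11.2 rather than 11.5, which is consistent with the per-rack number having been rounded down. GPU memory alone totals about 110 TB, which suggests the 240 TB figure counts more than GPU HBM, such as CPU-attached memory. Both readings are inferences from the arithmetic, not statements from the source.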

Blackwell B200 Compared to H100

| Feature | Blackwell B200 | Previous Gen (H100) |
| --- | --- | --- |
| Transistor Count | Much Higher (208 Billion) | Lower (80 Billion) |
| Performance | Significantly Faster (Up to 4x Faster) | Slower |
| AI Performance | Significantly Faster (Up to 30x Faster) | Slower |
| Power Efficiency | More Efficient (Up to 25x More Efficient) | Less Efficient |
| Memory | Larger (192GB HBM3e) | Smaller |
| Memory Bandwidth | Higher (8 TB/s) | Lower |
| Focus | Designed for AI Workloads | More General Purpose |
| Scalability | Can be Scaled in Multi-GPU Systems (HGX B200) | Limited Scalability |

Nvidia Blackwell B200 Comparison with H100

With each new AI breakthrough, the demands on compute power grow exponentially. The latest large language models can reach into the trillions of parameters, straining the limits of what’s achievable with current hardware.

Nvidia’s Blackwell architecture rewrites these limits with purpose-built engines to accelerate the unique workloads of generative AI. From faster transformer math to scaling across thousands of GPUs, it promises to unlock a new wave of artificial intelligence capabilities.

This accelerated performance arrives just as generative AI is going multimodal, with models that can synthesize text, images, video and audio in a unified system. Blackwell’s computational power will prove indispensable in making this multimedia AI revolution a reality.

The AI future is taking shape before our eyes. And if Nvidia has its way, the Blackwell architecture will be the driving force powering this transformative leap.
