NVIDIA Blackwell: Born for Extreme-Scale AI Inference

NVIDIA Blackwell's scaling capabilities have set the stage for building the world's largest AI factories.

The NVIDIA Blackwell architecture is the reigning leader of the AI revolution.

Many think of Blackwell as a chip, but it's better understood as a platform powering large-scale AI infrastructure.

Growing model demand and complexity

Blackwell is the heart of an entire system architecture purpose-built to power AI factories that produce intelligence using the largest, most complex AI models.

Today’s frontier AI models have hundreds of billions of parameters and serve nearly a billion users every week. The next generation of models is expected to have well over a trillion parameters, trained on tens of trillions of tokens of text, image and video data.

Scaling out a data center, harnessing up to thousands of computers to share the work, is necessary to meet this demand. But far greater performance and energy efficiency come from scaling up first: building a bigger computer.
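
To see why, here is a toy model of a single workload step whose compute time is fixed but whose communication time depends on the fabric it crosses. All numbers are illustrative assumptions, not measurements; the 130 TB/s and 3.6 TB/s figures correspond to the in-rack and cross-rack bandwidths discussed later in this article:

```python
# Toy model of scale-up vs. scale-out (illustrative assumptions only):
# a step's time is its compute time plus the time spent moving the
# data it must exchange with peer GPUs over the connecting fabric.
def step_time(compute_s, bytes_moved, fabric_tb_per_s):
    return compute_s + bytes_moved / (fabric_tb_per_s * 1e12)

exchanged = 1e12  # assume 1 TB exchanged between GPUs per step
print(step_time(0.05, exchanged, 130.0))  # in-rack NVLink spine: ~0.058 s
print(step_time(0.05, exchanged, 3.6))    # cross-rack NICs:      ~0.33 s
```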

Blackwell redefines the limits of how far we can go.

Exponential growth of parameters in notable AI models over time.

Data source: Epoch (2025), with major processing by Our World in Data.

The most challenging form of computing today

AI factories are the machines of the next industrial revolution. Their work is AI inference, the most challenging form of computing known today, and their product is intelligence.

These factories require infrastructure that can adapt, scale and make the most of every bit of available compute.

What does that look like?

A symphony of compute, networking, storage, power and cooling, integrated from the silicon and system levels up through full racks, orchestrated by software that sees tens of thousands of Blackwell GPUs as one.

The new unit of the data center is the NVIDIA GB200 NVL72, a rack-scale system that acts as a single, massive GPU.

NVIDIA founder and CEO Jensen Huang shows the NVIDIA GB200 NVL72 system and the NVIDIA Grace Blackwell Superchip during his keynote at CES 2025.

Birth of a Superchip

At its heart, the NVIDIA Grace Blackwell Superchip unites two Blackwell GPUs with an NVIDIA Grace CPU.

Merging them into a unified compute module, a superchip, increases performance by an order of magnitude. Doing so requires the high-speed interconnect technology introduced with the NVIDIA Hopper architecture: NVIDIA NVLink-C2C, a chip-to-chip link.

This technology unlocks seamless communication between the CPU and GPUs, allowing them to share memory directly, which means lower latency and higher throughput for AI workloads.
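
What that shared-memory model looks like to a programmer can be sketched with CUDA managed memory via Numba. This is a generic CUDA illustration, not Grace Blackwell-specific code; on Grace Blackwell, NVLink-C2C makes the same kind of sharing hardware-coherent:

```python
# Minimal sketch: one allocation visible to both CPU and GPU, with no
# explicit copies. CUDA managed memory emulates here the shared-memory
# model that NVLink-C2C provides in hardware on Grace Blackwell.
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, a):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= a

x = cuda.managed_array(1 << 20, dtype=np.float32)
x[:] = 1.0                    # written by the CPU
scale.forall(x.size)(x, 2.0)  # updated in place by the GPU
cuda.synchronize()
print(x[0])                   # read back by the CPU: 2.0
```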

It takes a symphony of fabrication, cutting, assembly and inspection to build a superchip.

A new interconnect for the superchip era

Scaling this performance across multiple superchips without bottlenecks was impossible with previous networking technology. So NVIDIA created a new kind of interconnect to keep performance bottlenecks from emerging and enable AI at scale.

A backbone that eliminates bottlenecks

The NVIDIA NVLink Switch anchors the GB200 NVL72 spine, a network of more than 5,000 high-performance copper cables connecting 72 GPUs across 18 compute trays to move data at 130 TB/s.

That's fast enough to transfer the entire internet's peak traffic in less than a second.
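
The 130 TB/s figure follows from simple arithmetic, assuming the published 1.8 TB/s of NVLink bandwidth per Blackwell GPU:

```python
# Back-of-the-envelope check of the NVLink spine's aggregate bandwidth
# (assumes 1.8 TB/s of NVLink bandwidth per GPU, the published GB200 spec).
gpus = 72
per_gpu_tb_s = 1.8
print(f"{gpus * per_gpu_tb_s:.1f} TB/s")  # -> 129.6 TB/s, i.e. ~130 TB/s
```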

Two miles of copper wire are precisely cut, measured, assembled and tested to create the ultrafast NVIDIA NVLink spine.

The spine cartridge is inspected before installation.

Once powered, the spine can move an entire internet's worth of data in less than a second.

Building a giant GPU for inference

Integrating all of this advanced hardware and software, compute and networking lets GB200 NVL72 systems unlock new possibilities for large-scale AI.

Each rack weighs a ton and a half, with more than 600,000 parts, two miles of wire and millions of lines of code converging inside it.

It acts as one giant virtual GPU, making AI inference at factory scale possible, where every nanosecond and every watt counts.

GB200 NVL72 everywhere

NVIDIA then deconstructs the GB200 NVL72 so that partners and customers can configure and build their own NVL72 systems.

Each NVL72 system is a two-ton supercomputer with 1.2 million parts. NVL72 systems are manufactured in more than 150 factories worldwide with 200 technology partners.

From cloud providers to system builders, partners around the world produce NVIDIA Blackwell NVL72 systems.

Time to scale

Tens of thousands of Blackwell NVL72 systems converge to create AI factories.

Working together is not enough. They must work as one.

NVIDIA Spectrum-X Ethernet and Quantum-X800 InfiniBand switches make this unified effort possible at the data center level.

Each GPU in an NVL72 system is directly connected to the factory's data network and to every other GPU in the system. GB200 NVL72 systems offer 400 Gbps of Ethernet or InfiniBand connectivity using NVIDIA ConnectX-7 NICs.
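
Assuming that 400 Gbps ConnectX-7 link is per GPU (the published GB200 NVL72 configuration), the rack's aggregate scale-out bandwidth is quick arithmetic:

```python
# Aggregate scale-out bandwidth of one NVL72 rack (assumes one
# 400 Gbps ConnectX-7 link per GPU; illustrative arithmetic only).
gpus = 72
gbps_per_gpu = 400
total_gbps = gpus * gbps_per_gpu
print(total_gbps, "Gbps ==", total_gbps / 8 / 1000, "TB/s")  # 28800 Gbps == 3.6 TB/s
```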

NVIDIA Quantum-X800 InfiniBand, NVLink Switch and Spectrum-X Ethernet switches unify one or more NVL72 systems to function as one.

Opening the lines of communication

Scaling AI factories demands many tools, each serving one goal: unrestricted, parallel communication for every AI workload in the factory.

NVIDIA BlueField-3 DPUs do their part to boost AI performance by offloading and accelerating the non-AI tasks that keep the factory running: the symphony of networking, storage and security.

An NVIDIA GB200 NVL72 powers an AI factory at CoreWeave, an NVIDIA cloud partner.

The AI factory operating system

The data center is now the computer. NVIDIA Dynamo is its operating system.

Dynamo orchestrates and coordinates AI inference requests across a large fleet of GPUs, ensuring that AI factories run at the lowest possible cost to maximize productivity and revenue.

It can add, remove and shift GPUs across workloads in response to surges in customer usage, and route requests to the GPUs best suited for the job.
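
The core scheduling idea can be sketched as a toy least-loaded router. This is illustrative logic only, not the NVIDIA Dynamo API; the function and names are hypothetical:

```python
# Toy sketch of inference request routing (not the NVIDIA Dynamo API):
# each request goes to the least-loaded GPU, tracked with a min-heap.
import heapq

def route(request_costs, gpu_ids):
    heap = [(0.0, g) for g in gpu_ids]   # (current load, gpu id)
    heapq.heapify(heap)
    assignment = {}
    for req, cost in enumerate(request_costs):
        load, gpu = heapq.heappop(heap)  # least-loaded GPU wins the request
        assignment[req] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return assignment

print(route([3.0, 1.0, 2.0, 5.0], ["gpu0", "gpu1"]))
# -> {0: 'gpu0', 1: 'gpu1', 2: 'gpu1', 3: 'gpu0'}
```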

Colossus, xAI's AI supercomputer. Built in 122 days, it houses more than 200,000 NVIDIA GPUs, an example of full-stack architecture at scale.

Blackwell is more than a chip. It is the engine of AI factories.

The world's largest computing clusters are being built on the Blackwell and Blackwell Ultra architectures, with roughly 1,000 racks of NVIDIA GB300 systems produced each week.
