NVIDIA Blackwell: Born for Extreme-Scale AI Inference

NVIDIA Blackwell's scaling capabilities have set the stage for building the world's largest AI factories.

The NVIDIA Blackwell architecture is the reigning leader of the AI revolution.

Many think of Blackwell as a chip, but it's better understood as a platform powering large-scale AI infrastructure.

Growing model demand and complexity

Blackwell is the heart of an entire system architecture purpose-built to power AI factories that produce intelligence using the largest, most complex AI models.

Today’s frontier AI models have hundreds of billions of parameters and serve nearly a billion users every week. The next generation of models is expected to have well over a trillion parameters, trained on tens of trillions of tokens of text, image and video data.

Scaling out a data center, harnessing up to thousands of computers to share the work, is necessary to meet this demand. But far greater performance and energy efficiency come from scaling up first: building a bigger computer.
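
To see why, here is a toy model of a single workload step whose compute time is fixed but whose communication time depends on the fabric it crosses. All numbers are illustrative assumptions, not measurements; the 130 TB/s and 3.6 TB/s figures correspond to the in-rack and cross-rack bandwidths discussed later in this article:

```python
# Toy model of scale-up vs. scale-out (illustrative assumptions only):
# a step's time is its compute time plus the time spent moving the
# data it must exchange with peer GPUs over the connecting fabric.
def step_time(compute_s, bytes_moved, fabric_tb_per_s):
    return compute_s + bytes_moved / (fabric_tb_per_s * 1e12)

exchanged = 1e12  # assume 1 TB exchanged between GPUs per step
print(step_time(0.05, exchanged, 130.0))  # in-rack NVLink spine: ~0.058 s
print(step_time(0.05, exchanged, 3.6))    # cross-rack NICs:      ~0.33 s
```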

Blackwell redefines the limits of how far we can go.

Exponential growth of parameters in notable AI models over time.

Data source: Epoch (2025), with major processing by Our World in Data.

The most challenging form of computing today

AI factories are the machines of the next industrial revolution. Their work is AI inference, the most challenging form of computing known today, and their product is intelligence.

These factories require infrastructure that can adapt, scale and make the most of every bit of available compute.

What does that look like?

A symphony of compute, networking, storage, power and cooling, integrated from the silicon and system levels up through full racks, orchestrated by software that sees tens of thousands of Blackwell GPUs as one.

The new unit of the data center is the NVIDIA GB200 NVL72, a rack-scale system that acts as a single, massive GPU.

NVIDIA founder and CEO Jensen Huang shows the NVIDIA GB200 NVL72 system and the NVIDIA Grace Blackwell Superchip during his keynote at CES 2025.

Birth of a Superchip

At its heart, the NVIDIA Grace Blackwell Superchip unites two Blackwell GPUs with an NVIDIA Grace CPU.

Merging them into a unified compute module, a superchip, increases performance by an order of magnitude. Doing so requires the high-speed interconnect technology introduced with the NVIDIA Hopper architecture: NVIDIA NVLink-C2C, a chip-to-chip link.

This technology unlocks seamless communication between the CPU and GPUs, allowing them to share memory directly, which means lower latency and higher throughput for AI workloads.
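
What that shared-memory model looks like to a programmer can be sketched with CUDA managed memory via Numba. This is a generic CUDA illustration, not Grace Blackwell-specific code; on Grace Blackwell, NVLink-C2C makes the same kind of sharing hardware-coherent:

```python
# Minimal sketch: one allocation visible to both CPU and GPU, with no
# explicit copies. CUDA managed memory emulates here the shared-memory
# model that NVLink-C2C provides in hardware on Grace Blackwell.
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, a):
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= a

x = cuda.managed_array(1 << 20, dtype=np.float32)
x[:] = 1.0                    # written by the CPU
scale.forall(x.size)(x, 2.0)  # updated in place by the GPU
cuda.synchronize()
print(x[0])                   # read back by the CPU: 2.0
```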

It takes a symphony of fabrication, cutting, assembly and inspection to build a superchip.

A new interconnect for the superchip era

Scaling this performance across multiple superchips without bottlenecks was impossible with previous networking technology. So NVIDIA created a new kind of interconnect to keep performance bottlenecks from emerging and enable AI at scale.

A backbone that eliminates bottlenecks

The NVIDIA NVLink Switch anchors the GB200 NVL72 spine, a network of more than 5,000 high-performance copper cables connecting 72 GPUs across 18 compute trays to move data at 130 TB/s.

That's fast enough to transfer the entire internet's peak traffic in less than a second.
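
The 130 TB/s figure follows from simple arithmetic, assuming the published 1.8 TB/s of NVLink bandwidth per Blackwell GPU:

```python
# Back-of-the-envelope check of the NVLink spine's aggregate bandwidth
# (assumes 1.8 TB/s of NVLink bandwidth per GPU, the published GB200 spec).
gpus = 72
per_gpu_tb_s = 1.8
print(f"{gpus * per_gpu_tb_s:.1f} TB/s")  # -> 129.6 TB/s, i.e. ~130 TB/s
```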

Two miles of copper wire are precisely cut, measured, assembled and tested to create the ultrafast NVIDIA NVLink spine.

The spine cartridge is inspected before installation.

Once powered, the spine can move an entire internet's worth of data in less than a second.

Building a giant GPU for inference

Integrating all of this advanced hardware and software, compute and networking lets GB200 NVL72 systems unlock new possibilities for large-scale AI.

Each rack weighs a ton and a half, with more than 600,000 parts, two miles of wire and millions of lines of code converging inside it.

It acts as one giant virtual GPU, making AI inference at factory scale possible, where every nanosecond and every watt counts.

GB200 NVL72 everywhere

NVIDIA then deconstructs the GB200 NVL72 so that partners and customers can configure and build their own NVL72 systems.

Each NVL72 system is a two-ton supercomputer with 1.2 million parts. NVL72 systems are manufactured in more than 150 factories worldwide with 200 technology partners.

From cloud providers to system builders, partners around the world produce NVIDIA Blackwell NVL72 systems.

Time to scale

Tens of thousands of Blackwell NVL72 systems converge to create AI factories.

Working together is not enough. They must work as one.

NVIDIA Spectrum-X Ethernet and Quantum-X800 InfiniBand switches make this unified effort possible at the data center level.

Each GPU in an NVL72 system is directly connected to the factory's data network and to every other GPU in the system. GB200 NVL72 systems offer 400 Gbps of Ethernet or InfiniBand connectivity using NVIDIA ConnectX-7 NICs.
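
Assuming that 400 Gbps ConnectX-7 link is per GPU (the published GB200 NVL72 configuration), the rack's aggregate scale-out bandwidth is quick arithmetic:

```python
# Aggregate scale-out bandwidth of one NVL72 rack (assumes one
# 400 Gbps ConnectX-7 link per GPU; illustrative arithmetic only).
gpus = 72
gbps_per_gpu = 400
total_gbps = gpus * gbps_per_gpu
print(total_gbps, "Gbps ==", total_gbps / 8 / 1000, "TB/s")  # 28800 Gbps == 3.6 TB/s
```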

NVIDIA Quantum-X800 InfiniBand, NVLink Switch and Spectrum-X Ethernet switches unify one or more NVL72 systems to function as one.

Opening the lines of communication

Scaling AI factories demands many tools, each serving one goal: unrestricted, parallel communication for every AI workload in the factory.

NVIDIA BlueField-3 DPUs do their part to boost AI performance by offloading and accelerating the non-AI tasks that keep the factory running: the symphony of networking, storage and security.

An NVIDIA GB200 NVL72 powers an AI factory at CoreWeave, an NVIDIA cloud partner.

The AI factory operating system

The data center is now the computer. NVIDIA Dynamo is its operating system.

Dynamo orchestrates and coordinates AI inference requests across a large fleet of GPUs, ensuring that AI factories run at the lowest possible cost to maximize productivity and revenue.

It can add, remove and shift GPUs across workloads in response to surges in customer usage, and route requests to the GPUs best suited for the job.
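
The core scheduling idea can be sketched as a toy least-loaded router. This is illustrative logic only, not the NVIDIA Dynamo API; the function and names are hypothetical:

```python
# Toy sketch of inference request routing (not the NVIDIA Dynamo API):
# each request goes to the least-loaded GPU, tracked with a min-heap.
import heapq

def route(request_costs, gpu_ids):
    heap = [(0.0, g) for g in gpu_ids]   # (current load, gpu id)
    heapq.heapify(heap)
    assignment = {}
    for req, cost in enumerate(request_costs):
        load, gpu = heapq.heappop(heap)  # least-loaded GPU wins the request
        assignment[req] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return assignment

print(route([3.0, 1.0, 2.0, 5.0], ["gpu0", "gpu1"]))
# -> {0: 'gpu0', 1: 'gpu1', 2: 'gpu1', 3: 'gpu0'}
```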

Colossus, xAI's AI supercomputer. Built in 122 days, it houses more than 200,000 NVIDIA GPUs, an example of full-stack architecture at scale.

Blackwell is more than a chip. It is the engine of AI factories.

The world's largest computing clusters are being built on the Blackwell and Blackwell Ultra architectures, with roughly 1,000 racks of NVIDIA GB300 systems produced each week.
