NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference

Originally published at: https://developer.nvidia.com/blog/nvidia-gb200-nvl72-delivers-trillion-parameter-llm-training-and-real-time-inference/

What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of an increased capacity for:

- Natural language processing tasks like translation, question answering, abstraction, and fluency.
- Holding longer-term context and conversational ability.
- Multimodal applications combining language, vision, and speech.
- Creative applications like…

Can you clarify whether the bandwidth is 900 GB/s or 1.8 TB/s (see the quote below)? Also, is liquid immersion cooling required for the 200 Gbps SerDes and the 1.8 TB/s NVLink Gen 5 to function? Thanks.
“The heart of the GB200 NVL72 is the NVIDIA GB200 Grace Blackwell Superchip. It connects two high-performance NVIDIA Blackwell Tensor Core GPUs and the NVIDIA Grace CPU with the NVLink-Chip-to-Chip (C2C) interface that delivers 900 GB/s of bidirectional bandwidth.”

I believe the GPU-to-GPU NVLink bandwidth is 1.8 TB/s, but since the Grace CPU is carried over from GH200, the CPU-GPU NVLink bandwidth is still 900 GB/s.

B100 and B200 seem to have 180 GB of GPU memory per this article: NVIDIA HGX AI Supercomputing Platform,
but GB200, with two GPUs, seems to have 394 GB (192 GB per GPU) of GPU memory. Wondering if there is a typo somewhere.
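
For what it's worth, a quick sanity check of the arithmetic (taking the 192 GB-per-GPU figure at face value) suggests the two-GPU total should be 384 GB, so 394 GB does look like a typo:

```python
# Sanity check, assuming 192 GB of HBM3e per Blackwell GPU as cited above.
per_gpu_gb = 192
gpus_per_superchip = 2  # one GB200 Superchip pairs two Blackwell GPUs with one Grace CPU

total_gb = per_gpu_gb * gpus_per_superchip
print(f"GB200 Superchip GPU memory: {total_gb} GB")  # 384 GB, not 394 GB
```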

Hi Kiranpalli,

Thanks for the clarification. Are the NVLink 5th gen 1.8 TB/s copper cables the same as those of NVLink gen 4 (0.9 TB/s)? Thanks.

Best,
Peter

pldchange, NVLink-C2C is a 900 GB/s bidirectional interconnect between the CPU and GPU, used to connect Grace with Hopper or Blackwell GPU(s). In the case of GB200, it is shared between the two Blackwell GPUs and Grace. 5th generation NVLink is a 1.8 TB/s interconnect between GPUs. Liquid cooling is used to dissipate heat from the Superchip; it is not required for the SerDes to function. The 5th generation NVLink cables are spec’d for the faster 1.8 TB/s data rate.
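
For readers tallying the two figures, here is a minimal arithmetic sketch of where they come from. The link counts and per-link rates are assumptions drawn from NVIDIA's public NVLink specs, not from this article:

```python
# Minimal sketch of the bandwidth arithmetic (assumed figures from public
# NVIDIA specs: 18 NVLink links per GPU, 50 GB/s per link bidirectional on
# 4th gen NVLink, 100 GB/s per link on 5th gen).
LINKS_PER_GPU = 18

nvlink4_gpu_gpu = LINKS_PER_GPU * 50    # Hopper:    900 GB/s bidirectional
nvlink5_gpu_gpu = LINKS_PER_GPU * 100   # Blackwell: 1800 GB/s = 1.8 TB/s
nvlink_c2c = 900                        # Grace <-> GPU NVLink-C2C, bidirectional

print(f"NVLink 4 GPU-GPU:   {nvlink4_gpu_gpu} GB/s")
print(f"NVLink 5 GPU-GPU:   {nvlink5_gpu_gpu / 1000:.1f} TB/s")
print(f"NVLink-C2C CPU-GPU: {nvlink_c2c} GB/s")
```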

Regarding “PCIe Gen 6 support for high-speed networking”: can you clarify whether this is Gen 5 or Gen 6?

Congratulations! That’s indeed an impressive improvement.
May I know more about the inference environment, such as the parallelism mode used for inference, the batch size, and so on?
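
While waiting for the benchmark details, here is a back-of-envelope sizing sketch. Everything in it is an assumption for illustration (a 1.8T-parameter model, FP4 weights at 0.5 bytes per parameter, weights sharded evenly across the 72 GPUs of one NVL72 rack); it is not the published inference configuration:

```python
# Illustrative sizing only; the real parallelism mode and batch size are not
# given in this thread. Assumes a 1.8T-parameter MoE model with FP4 weights
# sharded evenly (e.g., tensor/expert parallel) across one NVL72 rack.
params = 1.8e12
bytes_per_param = 0.5      # FP4
gpus = 72
hbm_per_gpu_gb = 192

weights_gb = params * bytes_per_param / 1e9
per_gpu_gb = weights_gb / gpus

print(f"Total weight footprint: {weights_gb:.0f} GB")   # ~900 GB
print(f"Per-GPU share of weights: {per_gpu_gb:.1f} GB of {hbm_per_gpu_gb} GB HBM")
# KV cache and activations are not counted here; they grow with batch size
# and context length and consume much of the remaining HBM.
```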

PCIe Gen 6 is supported.