NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference

Originally published at: https://developer.nvidia.com/blog/nvidia-gb200-nvl72-delivers-trillion-parameter-llm-training-and-real-time-inference/

What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of an increased capacity for:

- Natural language processing tasks like translation, question answering, abstraction, and fluency.
- Holding longer-term context and conversational ability.
- Multimodal applications combining language, vision, and speech.
- Creative applications like…

Can you clarify whether the bandwidth is 900 GB/s or 1.8 TB/s (see the quote below)? Also, is liquid immersion cooling required for the 200 Gbps SerDes and the 1.8 TB/s NVLink Gen 5 to function? Thanks.
“The heart of the GB200 NVL72 is the NVIDIA GB200 Grace Blackwell Superchip. It connects two high-performance NVIDIA Blackwell Tensor Core GPUs and the NVIDIA Grace CPU with the NVLink-Chip-to-Chip (C2C) interface that delivers 900 GB/s of bidirectional bandwidth.”

I believe the GPU-to-GPU NVLink bandwidth is 1.8 TB/s, but since the Grace CPU is carried over from GH200, the CPU-GPU NVLink bandwidth is still 900 GB/s.

B100 and B200 seem to have 180 GB of GPU memory per this article: NVIDIA HGX AI Supercomputing Platform,
but GB200, with two GPUs, seems to have 394 GB (192 GB per GPU) of GPU memory. Wondering if there is a typo somewhere.
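
For what it's worth, a quick sanity check of the arithmetic (taking the 192 GB-per-GPU figure at face value) suggests the two-GPU total should be 384 GB, so 394 GB does look like a typo:

```python
# Sanity check, assuming 192 GB of HBM3e per Blackwell GPU as cited above.
per_gpu_gb = 192
gpus_per_superchip = 2  # one GB200 Superchip pairs two Blackwell GPUs with one Grace CPU

total_gb = per_gpu_gb * gpus_per_superchip
print(f"GB200 Superchip GPU memory: {total_gb} GB")  # 384 GB, not 394 GB
```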

Hi Kiranpalli,

Thanks for the clarification. Are the NVLink 5th gen 1.8 TB/s copper cables the same as those of NVLink gen 4 (0.9 TB/s)? Thanks.

Best,
Peter

pldchange, NVLink-C2C is a 900 GB/s bidirectional interconnect between the CPU and GPU, used to connect Grace with Hopper or Blackwell GPU(s). In the case of GB200, it is shared between the two Blackwell GPUs and Grace. 5th generation NVLink is a 1.8 TB/s interconnect between GPUs. Liquid cooling is used to dissipate heat from the Superchip; it is not required for the SerDes to function. The 5th generation NVLink cables are spec’d for the faster 1.8 TB/s data rate.
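
For readers tallying the two figures, here is a minimal arithmetic sketch of where they come from. The link counts and per-link rates are assumptions drawn from NVIDIA's public NVLink specs, not from this article:

```python
# Minimal sketch of the bandwidth arithmetic (assumed figures from public
# NVIDIA specs: 18 NVLink links per GPU, 50 GB/s per link bidirectional on
# 4th gen NVLink, 100 GB/s per link on 5th gen).
LINKS_PER_GPU = 18

nvlink4_gpu_gpu = LINKS_PER_GPU * 50    # Hopper:    900 GB/s bidirectional
nvlink5_gpu_gpu = LINKS_PER_GPU * 100   # Blackwell: 1800 GB/s = 1.8 TB/s
nvlink_c2c = 900                        # Grace <-> GPU NVLink-C2C, bidirectional

print(f"NVLink 4 GPU-GPU:   {nvlink4_gpu_gpu} GB/s")
print(f"NVLink 5 GPU-GPU:   {nvlink5_gpu_gpu / 1000:.1f} TB/s")
print(f"NVLink-C2C CPU-GPU: {nvlink_c2c} GB/s")
```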

Regarding “PCIe Gen 6 support for high-speed networking”: can you clarify whether this is Gen 5 or Gen 6?

Congratulations! That’s indeed an impressive improvement.
May I know more about the inference environment, such as the parallelism mode used for inference, the batch size, and so on?
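
While waiting for the benchmark details, here is a back-of-envelope sizing sketch. Everything in it is an assumption for illustration (a 1.8T-parameter model, FP4 weights at 0.5 bytes per parameter, weights sharded evenly across the 72 GPUs of one NVL72 rack); it is not the published inference configuration:

```python
# Illustrative sizing only; the real parallelism mode and batch size are not
# given in this thread. Assumes a 1.8T-parameter MoE model with FP4 weights
# sharded evenly (e.g., tensor/expert parallel) across one NVL72 rack.
params = 1.8e12
bytes_per_param = 0.5      # FP4
gpus = 72
hbm_per_gpu_gb = 192

weights_gb = params * bytes_per_param / 1e9
per_gpu_gb = weights_gb / gpus

print(f"Total weight footprint: {weights_gb:.0f} GB")   # ~900 GB
print(f"Per-GPU share of weights: {per_gpu_gb:.1f} GB of {hbm_per_gpu_gb} GB HBM")
# KV cache and activations are not counted here; they grow with batch size
# and context length and consume much of the remaining HBM.
```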

PCIe Gen 6 is supported.