Question about GH200 memory size

Hi
I have a question about the latest GH200 platform. The specification for the system with 256 superchips lists the GPU memory as 144TB, but on the superchip itself the Hopper GPU has 96GB of memory.
That gives 96GB*256=24TB, which is far less than 144TB. What am I missing here?

Hi @mahmood.nt ,

With the DGX GH200, the GPUs have access to host (aka CPU) memory, not just the memory of other GPUs. The NVIDIA DGX GH200 Technical White Paper has more info on this, but it effectively lets all GPUs in the system access all memory, regardless of location, with similar semantics and speed.

Hence, 144TB when you add GPU + Host memory together.
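
To make the arithmetic explicit, here is a small sketch of the sum. The 96GB of GPU memory per superchip comes from the question above; the 480GB of CPU (LPDDR5X) memory per superchip is not stated in this thread and is an assumption taken from NVIDIA's published GH200 figures, so check it against the white paper. The names below are purely illustrative.

```python
# Per-superchip memory, in GB.
GPU_HBM_GB = 96      # Hopper GPU memory, as quoted in the question
CPU_LPDDR_GB = 480   # Grace CPU memory: assumption, not stated in this thread
SUPERCHIPS = 256     # size of the DGX GH200 system being discussed


def total_memory_tb(per_chip_gb, chips=SUPERCHIPS):
    """Aggregate memory across all superchips, in TB (1 TB = 1024 GB)."""
    return per_chip_gb * chips / 1024


gpu_only_tb = total_memory_tb(GPU_HBM_GB)
combined_tb = total_memory_tb(GPU_HBM_GB + CPU_LPDDR_GB)

print(f"GPU memory only:   {gpu_only_tb:.0f} TB")
print(f"GPU + host memory: {combined_tb:.0f} TB")
```

Under that assumption, the GPU-only total matches the 24TB from the question and the combined total matches the 144TB in the specification.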

ScottE

Thanks for the reply. I also have one more question. I didn’t see any power (wattage) figure for the GH200 system. I know it depends on workload utilization, but I couldn’t find a number for it. I looked up the TDP of the individual components and multiplied/added the values, and arrived at 328 kW. Is that roughly correct?

The DGX GH200 is made up of smaller building blocks holding 8 GH200 sleds, 3 switch trays, and some additional connectivity elements. Each of those is ~15kW, and multiple of them are connected to create a larger (e.g., 256-node) cluster.
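
As a rough back-of-the-envelope sketch, the building-block figures can be scaled up to a 256-node cluster. This assumes one GH200 superchip per sled and reads the ~15kW as the draw of one whole building block (8 sleds plus switch trays); both are interpretations rather than stated facts, and the result leaves out anything outside the building blocks, such as cooling or storage. The function name is hypothetical.

```python
import math

SLEDS_PER_BLOCK = 8    # GH200 sleds per building block, from the reply above
KW_PER_BLOCK = 15.0    # approximate draw per building block (assumed reading)


def estimate_cluster_power(superchips,
                           sleds_per_block=SLEDS_PER_BLOCK,
                           kw_per_block=KW_PER_BLOCK):
    """Return (building blocks needed, estimated total kW).

    Assumes one superchip per sled; partial blocks are rounded up.
    """
    blocks = math.ceil(superchips / sleds_per_block)
    return blocks, blocks * kw_per_block


blocks, total_kw = estimate_cluster_power(256)
print(f"{blocks} building blocks, roughly {total_kw:.0f} kW")
```

With these inputs, 256 superchips need 32 building blocks, for an estimate of roughly 480 kW. Treat it as an order-of-magnitude figure only.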

If you’re looking for “what is the power requirement of a single GH200 system”, you’ll want to talk to your favorite OEM about their particular systems, as NVIDIA is only doing DGX GH200 (aka larger, NVLink-connected clusters) - OEMs will build their own systems, which may have different power requirements than what I put above.

ScottE