Running AI Workloads on Rack-Scale Supercomputers: From Hardware to Topology-Aware Scheduling

Originally published at: Running AI Workloads on Rack-Scale Supercomputers: From Hardware to Topology-Aware Scheduling | NVIDIA Technical Blog

The NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 systems, featuring NVIDIA Blackwell architecture, are rack-scale supercomputers. They’re designed with 18 tightly coupled compute trays, massive GPU fabrics, and high-bandwidth networking packaged as a unit. For AI architects and HPC platform operators, the challenge isn’t just racking and stacking hardware; it’s turning infrastructure into safe, performant, and…