Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

Originally published at: Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

NVLink 6 allows 72 GPUs to behave as one coherent accelerator inside the rack.

…the execution engine that transforms this coherent, rack-scale foundation into sustained training and inference performance.

I understand from the blog post that the CPU–GPU link is coherent. However, I’m unclear whether coherence is also maintained across GPUs connected over NVLink on the Rubin platform. Could you please clarify this?

Hi Rajesh, great question! We are updating the wording in the Inside the NVIDIA Rubin Platform tech blog to replace the "coherent rack" phrasing with more precise language, to avoid implying CPU-style hardware coherency while preserving the intended scale-up messaging.

To answer your question: this is not hardware cache coherency in the traditional CPU–GPU or CPU–CPU sense. NVLink does not turn a rack of GPUs into one big cache-coherent NUMA machine with MESI-style snooping across GPU caches. Instead, NVLink enables GPU-to-GPU peer access (loads, stores, and atomics to peer memory) and high-bandwidth collectives, with synchronization semantics managed in software (CUDA, NCCL, NVSHMEM). Hope this helps.
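To make the distinction concrete, here is a minimal CUDA sketch of what software-managed peer access looks like: one GPU issues plain stores into memory that lives on a peer GPU, and the program inserts the fences and sync points itself rather than relying on hardware cache coherence. This assumes devices 0 and 1 are NVLink peers; error checking is trimmed for brevity.

```cuda
// Sketch: direct peer stores between two GPUs over NVLink.
// Visibility and ordering are the program's responsibility
// (fences + sync), not the caches'.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void write_to_peer(int *peer_buf, int value) {
    // A plain store to memory physically resident on the peer GPU.
    peer_buf[threadIdx.x] = value;
    // Make the stores visible system-wide: explicit, software-managed.
    __threadfence_system();
}

int main() {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) { printf("devices 0 and 1 are not peers\n"); return 0; }

    // Allocate on device 1, then let device 0 map it for peer access.
    cudaSetDevice(1);
    int *buf;
    cudaMalloc(&buf, 32 * sizeof(int));

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // device 0 may now load/store buf
    write_to_peer<<<1, 32>>>(buf, 42);
    cudaDeviceSynchronize();            // explicit synchronization point

    cudaSetDevice(1);
    int host[32];
    cudaMemcpy(host, buf, sizeof(host), cudaMemcpyDeviceToHost);
    printf("buf[0] = %d\n", host[0]);
    cudaFree(buf);
    return 0;
}
```

At rack scale, libraries like NCCL and NVSHMEM build their collectives and one-sided operations on top of exactly this kind of peer access plus explicit synchronization.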

Thanks,

-Kyle
