Thread block clustering in Blackwell GPUs

I would like to know whether the newer RTX 50xx GPUs will have hardware for faster synchronization across thread blocks within the same SM cluster (the feature introduced with Hopper GPUs). I assume the answer is yes, since Blackwell supersedes Hopper, but we haven’t had any RTX GPUs with that capability yet (to the best of my knowledge).
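For context, the kind of cross-block synchronization I mean looks roughly like this (a minimal sketch with a placeholder kernel; the cluster APIs are the ones documented for compute capability 9.0, compiled with e.g. -arch=sm_90):

```cpp
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

// Placeholder kernel: a compile-time cluster of 2 thread blocks (1D).
__global__ void __cluster_dims__(2, 1, 1) cluster_kernel(int *data)
{
    cg::cluster_group cluster = cg::this_cluster();

    // Each block writes into its own slot in global memory...
    if (threadIdx.x == 0)
        data[cluster.block_rank()] = (int)cluster.block_rank();

    // ...then all blocks in the cluster synchronize with each other,
    // which is the cross-block hardware sync I am asking about.
    cluster.sync();
}

int main()
{
    int *d = nullptr;
    cudaMalloc(&d, 2 * sizeof(int));
    // The grid size must be a multiple of the cluster size (2 here).
    cluster_kernel<<<2, 128>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```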

Thanks

Compute capabilities beyond 9.0 are not documented yet. I would expect a future CUDA update that covers the new GPUs.

A good hint about Nvidia’s recent plans is the difference between sm_90 and sm_90a: the sm_90a features are a Hopper one-off, or at least meant only for datacenter GPUs, while sm_90 covers the general features carried forward to newer generations.

Also see

Cluster size of 8 is forward compatible starting compute capability 9.0

and

The maximum portable cluster size supported is 8; however, NVIDIA Hopper H100 GPU allows for a nonportable cluster size of 16 by opting in.

But it could also be that the RTX 5000 series gets a compute capability below 9.0, or that the sections about forward compatibility are changed to apply to 10.0 only and RTX 5000 ends up as 10.5.
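The opt-in mentioned in the second quote is done per kernel. A minimal sketch (assuming an H100 and a placeholder kernel; the attribute and launch-config APIs are the documented CUDA 12 runtime ones):

```cpp
#include <cuda_runtime.h>

__global__ void my_kernel() { /* placeholder kernel */ }

int main()
{
    // Opt in to a non-portable cluster size (> 8) for this kernel (H100 only).
    cudaFuncSetAttribute(my_kernel,
                         cudaFuncAttributeNonPortableClusterSizeAllowed, 1);

    cudaLaunchAttribute attr{};
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = 16;        // non-portable: only valid after the opt-in
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;

    cudaLaunchConfig_t cfg{};
    cfg.gridDim  = dim3(16, 1, 1);     // grid must be a multiple of the cluster size
    cfg.blockDim = dim3(128, 1, 1);
    cfg.attrs    = &attr;
    cfg.numAttrs = 1;
    cudaLaunchKernelEx(&cfg, my_kernel);

    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```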

Thanks, this is a good way to think about it!

Seems like they’ve allocated 12.8 for the RTX 50XX:
The Ultimate GeForce GPU Comparison 50 Series Specs

If I am not wrong, the CUDA version and the SM architecture can be unrelated.

That would be the largest jump since Maxwell (3.7 → 5.0) for this technical version number. And even if true, would Nvidia have called it Blackwell as well? Perhaps they mixed it up with the required CUDA Toolkit version? Other third-party pages specify 10.1.

Here is a bit of discussion about the consumer Blackwells also wondering about 12.8: Reddit - Dive into anything

It’s the CUDA toolkit (CTK) version. Remember 11.8 was the first to provide some support for cc8.9/cc9.0. I’m guessing that 12.8 will be the first CTK version to provide some support for the “new” GPUs.

Yes, at some level they are unrelated. For example, compute capability (cc) 5.0 has no particular connection to CTK version 5.0. However, a given CTK version provides formal support for a range of cc/architectures: there is an “oldest” architecture/cc it supports and a “newest” architecture/cc it supports. So it would be fair to say that the current CUDA 12.6 provides no “formal” support for a GPU cc higher than 9.0 (or 9.0a, if you prefer).
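You can see the two numbers side by side at runtime; a small sketch (standard runtime API, device 0 assumed):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Compute capability of the GPU itself (e.g. 8.9, 9.0)...
    printf("GPU compute capability: %d.%d\n", prop.major, prop.minor);

    // ...versus the CUDA toolkit this program was built with
    // (CUDART_VERSION is e.g. 12060 for CTK 12.6).
    printf("Built against CUDA toolkit: %d\n", CUDART_VERSION);
    return 0;
}
```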

Nvidia might want to correct the graphic I linked to above then, as all the previous generations listed show the CC value, not the toolkit release that supports them.


For anyone seeing this later (after Jan 25th): the SM version for the Blackwell cards is 10.0 or higher, and they are going to support thread block clusters.

It is 10.0 for the datacenter Blackwell cards (e.g. B100, B200) and 12.0 for the consumer Blackwell cards (e.g. RTX 5090). There also seems to be a compute capability 10.1 card planned; it could be the Jetson embedded cards, the announced Digits AI Workstation, or some datacenter refresh.

The consumer Blackwells support clusters of 8 blocks.

Where does it say the RTX 50xx cards are compute capability 12.0? I only see 12.8 on the spec sheet, which, as discussed, seems to be the CTK version.

For example, the RTX Blackwell whitepaper mentions that each SM in the GB202 GPU has 2 FP64 cores.
In the programming guide, the description of cc 12.0 also mentions 2 FP64 cores.

For example, here you can see which compute capability versions are supported with which toolkit:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

10.0 and 10.1 are new with toolkit 12.7, and 12.0 is new with toolkit 12.8.

Then as @striker159 said, look at the programming guide

And you see that only 12.0 matches. In fairness, the programming guide does not describe 10.1. However, reading the PTX manual shows that 12.0 is the much more likely compute capability. E.g. 10.1 has a special high-performance tensor core with its own memory space; you will find such a thing only on datacenter GPUs with much higher tensor core throughput.

I don’t see any toolkit 12.7 release here.

The 12.6.3 release doesn’t seem to mention cc higher than 9.x

Although I wouldn’t call it official documentation, it does state here that the current crop of RTX 50 series cards belongs to cc 12.0.

Blackwell B200 is defined as 10.0 here.

Understood! Makes sense now. Thanks for such sharp observations!

To summarize the RTX 50xx GPUs:

| GPU | GB Code | SM count | Core count | Compute Capability / SM Arch |
| --- | --- | --- | --- | --- |
| RTX 5090 | GB202 | 192 | 24,576 | 12.0 |
| RTX 5080 | GB203 | 84 | 10,752 | 10.0 (maybe?) |
| RTX 5070 Ti / 5070 | GB205 | 50 | 6,400 | 10.0 (maybe?) |

Also to reiterate, all of them will support thread block cluster based synchronization.
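Once the hardware is available, this can also be checked at runtime rather than from spec tables; a small sketch (assuming the CUDA 12 cluster occupancy API and a placeholder kernel):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummy_kernel() { /* placeholder kernel */ }

int main()
{
    // 1) Does the device support cluster launches at all?
    int clusterLaunch = 0;
    cudaDeviceGetAttribute(&clusterLaunch, cudaDevAttrClusterLaunch, 0);
    printf("Cluster launch supported: %d\n", clusterLaunch);

    // 2) Largest cluster size this kernel could be launched with on this device.
    cudaLaunchConfig_t cfg{};
    cfg.gridDim  = dim3(8, 1, 1);
    cfg.blockDim = dim3(128, 1, 1);
    int maxClusterSize = 0;
    cudaOccupancyMaxPotentialClusterSize(&maxClusterSize, dummy_kernel, &cfg);
    printf("Max potential cluster size: %d\n", maxClusterSize);
    return 0;
}
```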

The table in the PTX manual lists

| PTX ISA version | CUDA release, driver | Supported targets |
| --- | --- | --- |
| PTX ISA 8.6 | CUDA 12.7, driver r565 | sm_{10,11,12,13}, sm_20, sm_{30,32,35,37}, sm_{50,52,53}, sm_{60,61,62}, sm_{70,72,75}, sm_{80,86,87,89}, sm_{90,90a}, sm_{100,100a}, sm_{101,101a} |
| PTX ISA 8.7 | CUDA 12.8, driver r570 | sm_{10,11,12,13}, sm_20, sm_{30,32,35,37}, sm_{50,52,53}, sm_{60,61,62}, sm_{70,72,75}, sm_{80,86,87,89}, sm_{90,90a}, sm_{100,100a}, sm_{101,101a}, sm_{120,120a} |

It probably was only for internal testing, and perhaps for datacenter customers to use their Blackwell systems early on? Some people have reported (also on this forum) that the nvidia-smi tool showed 12.7 with the r565 driver a few weeks ago.

RTX 5080, 5070 Ti, and 5070 are most likely 12.0, too. 10.0 is in some ways more powerful than 12.0 and is meant for the datacenter cards only.

Yes, according to the documentation. Great news! Looking forward to RTX 50x0 thread block cluster benchmarks.
(Small difference: on consumer GPUs, Nvidia seems to have left out the thread block cluster multicast feature for copying from global memory to the shared memory of multiple SMs within one thread block cluster.)
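Even without that multicast path, the base cluster feature still lets one block access another block’s shared memory directly (distributed shared memory); a rough sketch of that, with a placeholder kernel:

```cpp
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Placeholder kernel: two blocks per cluster exchange a value through
// distributed shared memory, without using the multicast copy feature.
__global__ void __cluster_dims__(2, 1, 1) dsmem_kernel(int *out)
{
    __shared__ int val;
    cg::cluster_group cluster = cg::this_cluster();

    if (threadIdx.x == 0)
        val = (int)cluster.block_rank();
    cluster.sync();                        // make 'val' visible cluster-wide

    // Map the neighbor block's shared memory into this block's address space.
    unsigned int neighbor = cluster.block_rank() ^ 1;
    int *remote = cluster.map_shared_rank(&val, neighbor);
    if (threadIdx.x == 0)
        out[cluster.block_rank()] = *remote;

    cluster.sync();                        // keep remote shared memory alive until all reads finish
}
```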


The whitepaper doesn’t show 2 FP64 cores on GB203 and GB205, so I assumed they would be CC 10.0. It does seem weird for NVIDIA to have different CCs for same-generation cards, though.