How does the Thread Block Cluster of the Nvidia H100 work concurrently?

Yeah! I noticed the newly released CUDA version too! Reading about it and installing it now.
Thank you so much! XDDD

Hi Robert,

When I try to write cluster code using CUDA 11.8, I get: error: namespace "cooperative_groups" has no member "cluster_group".
After checking carefully, I noticed that the CUDA documentation, section C.4.1.2. Cluster Group, says "The APIs are available on all hardware with Compute Capability 9.0+."
So does this mean that I cannot write cluster code since I don't have an H100?
Or is there something wrong with my installation of CUDA 11.8?

Thank you very much!

BTW, my CUDA version:

bash-4.2$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Yes, cluster code requires compute capability (cc) 9.0. If you compile for a cc9.0 target (for example with -arch=sm_90), I think that error will go away, but you could only run such code on an H100.
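
For reference, here is a minimal sketch of what that could look like, assuming CUDA 11.8 or newer and a build along the lines of nvcc -arch=sm_90 cluster_demo.cu. The file name, the kernel name cluster_demo, and the block counts are placeholders I made up for illustration. It should compile for a cc9.0 target on any machine, but the launch is expected to fail on anything other than a cc9.0 device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical demo kernel: compile-time cluster of 2 thread blocks along x.
// Each block records its rank within its own cluster.
__global__ void __cluster_dims__(2, 1, 1) cluster_demo(unsigned int *block_rank)
{
    cg::cluster_group cluster = cg::this_cluster();
    if (threadIdx.x == 0) {
        block_rank[blockIdx.x] = cluster.block_rank();
    }
    // Barrier across all threads of all blocks in this cluster.
    cluster.sync();
}

int main()
{
    const int num_blocks = 4;  // grid size must be a multiple of the cluster size (2)
    unsigned int *d_rank = nullptr;
    cudaMalloc(&d_rank, num_blocks * sizeof(unsigned int));

    cluster_demo<<<num_blocks, 32>>>(d_rank);

    // On a device below cc9.0 an error is expected here rather than a result.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        printf("cluster launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_rank);
        return 1;
    }

    unsigned int h_rank[num_blocks];
    cudaMemcpy(h_rank, d_rank, sizeof(h_rank), cudaMemcpyDeviceToHost);
    for (int i = 0; i < num_blocks; ++i) {
        printf("block %d has cluster rank %u\n", i, h_rank[i]);
    }

    cudaFree(d_rank);
    return 0;
}
```

The cluster size is fixed at compile time with __cluster_dims__ here, so the ordinary <<<...>>> launch syntax can be used.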

Thank you so much for your reply!

So, will the newly released GeForce RTX 4090 be able to support this feature? XD
Thank you!!

My expectation is that the RTX 4090 (and other variants of the Ada Lovelace family) will be compute capability 8.9 devices, and as such are not capable of cluster programming, which requires compute capability 9.0. See here.
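
If you want to confirm this on a particular machine, a small sketch like the following reports the compute capability at runtime. It assumes the GPU of interest is device 0, and the "major >= 9" test is simply my shorthand for the cc9.0 requirement discussed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Thread block clusters need compute capability 9.0 or higher.
    if (prop.major >= 9) {
        printf("cluster launches should be possible on this device\n");
    } else {
        printf("cluster launches are not supported on this device\n");
    }
    return 0;
}
```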

I see, thank you!!
