How does the Thread Block Cluster of the Nvidia H100 work concurrently?

wangw42 · October 5, 2022, 6:39pm

yeah! I noticed this newly released CUDA too! Reading and installing it now.
Thank you so much! XDDD

wangw42 · October 12, 2022, 7:35pm

Hi Robert,

When I try to write cluster code with CUDA 11.8, the compiler reports: error: namespace "cooperative_groups" has no member "cluster_group".
After checking carefully, I noticed that section C.4.1.2 (Cluster Group) of the CUDA documentation says "The APIs are available on all hardware with Compute Capability 9.0+."
So does this mean that I cannot write cluster code since I don't have an H100?
Or is there something wrong with my installation of CUDA 11.8?

Thank you very much!

BTW, my cuda version:

bash-4.2$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Robert_Crovella · October 12, 2022, 7:38pm

Yes, cluster codes require cc9.0. If you compile for a cc9.0 target, I think that error will go away, but you could only run such a code on H100.
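As a rough illustration of compiling for a cc9.0 target, here is a minimal sketch of a cluster kernel. The file name, the kernel name cluster_demo, and the launch sizes are made up for the example, and the build line assumes nvcc from CUDA 11.8 or later. It should compile without an H100, but the launch is expected to fail on any device below cc9.0.

```cuda
// Assumed build command: nvcc -arch=sm_90 cluster_demo.cu -o cluster_demo
// Without a cc9.0 target, cg::cluster_group is not declared for device code.
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Compile-time cluster shape: 2 thread blocks per cluster along x.
// cluster_demo is a hypothetical kernel name used only for this sketch.
__global__ void __cluster_dims__(2, 1, 1) cluster_demo()
{
    cg::cluster_group cluster = cg::this_cluster();

    if (threadIdx.x == 0) {
        // block_rank() is this block's index within its cluster.
        printf("block %u has rank %u in its cluster\n",
               blockIdx.x, cluster.block_rank());
    }

    // Barrier across every thread of every block in the cluster.
    cluster.sync();
}

int main()
{
    // The grid size must be a multiple of the cluster size (4 blocks, 2 per cluster).
    cluster_demo<<<4, 32>>>();

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        // Expected on GPUs below compute capability 9.0.
        fprintf(stderr, "cluster launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Blocks in the same cluster are scheduled together, which is what makes the cluster-wide barrier possible.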

wangw42 · October 12, 2022, 7:45pm

Thank you so much for your reply!

wangw42 · October 12, 2022, 8:20pm

So, will the newly released GeForce RTX 4090 support this feature? XD
Thank you!!

Robert_Crovella · October 12, 2022, 9:31pm

My expectation is that RTX 4090 (and other variants of the Ada Lovelace family) will be compute capability 8.9 devices, and as such are not capable of cluster programming, which requires compute capability 9.0. See here.
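To check what a particular GPU reports, a small host-side sketch like the following queries the compute capability through the CUDA runtime. Device index 0 and the variable name clusters_ok are assumptions for the example. The major >= 9 test simply mirrors the cc9.0 requirement discussed above.

```cuda
// Assumed build command: nvcc query_cc.cu -o query_cc
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int device = 0;  // assumption: inspect the first visible GPU
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Thread block clusters need compute capability 9.0 or higher.
    bool clusters_ok = (prop.major >= 9);
    printf("thread block clusters: %s\n",
           clusters_ok ? "supported" : "not supported");
    return 0;
}
```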

wangw42 · October 12, 2022, 10:10pm

I see, thank you!!
