How does the Thread Block Cluster of the Nvidia H100 work concurrently?

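My expectation is that the RTX 4090 (and other variants of the Ada Lovelace family) will be compute capability 8.9 devices and, as such, will not support thread block cluster programming, which requires compute capability 9.0 (not "CUDA 9.0"; it is a hardware capability, not a toolkit version). See here.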
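As a rough illustration of why 8.9 does not qualify, here is a minimal sketch that treats a compute capability as a `(major, minor)` pair and checks it against 9.0. The device list is hard-coded for illustration only. On a real machine you could read the pair from the runtime instead, for example with PyTorch's `torch.cuda.get_device_capability()`, assuming PyTorch is installed.

```python
# Thread block clusters require compute capability 9.0 or higher (Hopper and later).
CLUSTER_MIN_CC = (9, 0)

# Illustrative, hard-coded compute capabilities; query the real device in practice.
KNOWN_DEVICES = {
    "A100 (Ampere)": (8, 0),
    "RTX 4090 (Ada Lovelace)": (8, 9),
    "H100 (Hopper)": (9, 0),
}


def supports_thread_block_clusters(cc: tuple[int, int]) -> bool:
    """Return True if a device with compute capability `cc` can launch clusters."""
    # Tuples compare by major version first, then minor, so (8, 9) < (9, 0).
    return cc >= CLUSTER_MIN_CC


if __name__ == "__main__":
    for name, cc in KNOWN_DEVICES.items():
        status = "supported" if supports_thread_block_clusters(cc) else "not supported"
        print(f"{name}: compute capability {cc[0]}.{cc[1]}, clusters {status}")
```

The comparison is on the major version, so 8.9 is not "almost" 9.0 in any useful sense: cluster support is a Hopper-generation hardware feature, and no Ada Lovelace part has it.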