I am writing code for WASP and want to synchronize multiple blocks within a cluster. In the CUTLASS code, directly calling cluster.sync() or cute::cluster_sync() outside of the producer/consumer code block causes the code to hang. I suspect this is because the underlying barrier.cluster.wait() by default expects every thread in the block to arrive, not just the warps that call it. However, I’m calling it from within the consumer, which forces it to wait for producer threads that can never reach it.
Therefore, my question is: how can I implement cluster.sync() in WASP? Additionally, I’ve noticed that mbarrier supports cluster arrive but not cluster wait.
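
One direction that might work, given that mbarrier allows a remote (cluster) arrive together with a local wait, is a consumer-only cluster barrier: every consumer thread arrives on the mbarrier of every block in the cluster, then waits on its own block's mbarrier. The following is an untested sketch under several assumptions: it relies on cutlass::arch::ClusterBarrier from cutlass/arch/barrier.h, the helper names init_consumer_cluster_barrier and consumer_cluster_sync are hypothetical, the barrier sits at the same shared-memory offset in every block, and the sketch only covers the execution rendezvous, not any extra fencing that cross-block data visibility may require.

```cuda
#include <cstdint>
#include <cutlass/arch/barrier.h>       // cutlass::arch::ClusterBarrier, fence_barrier_init
#include <cute/arch/cluster_sm90.hpp>   // cute::cluster_sync

// Hypothetical helper: call once at kernel start, BEFORE the producer/consumer
// split, so that all threads of all blocks can still take part in cluster_sync().
__device__ inline void init_consumer_cluster_barrier(
    cutlass::arch::ClusterBarrier& bar,   // lives in this block's shared memory
    uint32_t num_consumer_threads,        // consumer threads per block
    uint32_t cluster_size)                // number of blocks in the cluster
{
  if (threadIdx.x == 0) {
    // Each phase completes after every consumer thread of every block has arrived.
    bar.init(num_consumer_threads * cluster_size);
  }
  cutlass::arch::fence_barrier_init();    // make the init visible cluster-wide
  cute::cluster_sync();                   // no block may arrive before all inits are done
}

// Hypothetical helper: called only by consumer threads.
// `phase` is a per-thread parity bit that starts at 0.
__device__ inline void consumer_cluster_sync(
    cutlass::arch::ClusterBarrier& bar,
    uint32_t& phase,
    uint32_t cluster_size)
{
  // Remote arrive: signal the barrier at the same offset in every block
  // of the cluster, including this one.
  for (uint32_t cta = 0; cta < cluster_size; ++cta) {
    bar.arrive(cta);
  }
  // Local wait: block until the current phase of this block's barrier completes.
  bar.wait(phase);
  phase ^= 1;                             // the next call waits on the opposite parity
}
```

Producer warps never touch this barrier, so they cannot stall it. Having every consumer thread arrive at every block is the simplest correct choice but costs cluster_size arrives per thread; synchronizing the consumer warps locally first and letting a few threads perform the remote arrives would presumably be cheaper, at the cost of a smaller arrival count and an extra named barrier.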