How to Implement cluster.sync() Across Multiple Blocks in WASP?

I am writing a WASP kernel and want to synchronize multiple blocks within a cluster. In the CUTLASS code, calling cluster.sync() or cute::cluster_sync() directly, outside the producer/consumer pipeline code, causes the kernel to hang. I suspect this is because the underlying barrier.cluster.arrive/barrier.cluster.wait pair waits for all non-exited threads of every block in the cluster, not just the consumer threads. Since I call it from within the consumer branch, the consumer threads end up waiting for producer threads that never reach the barrier.

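So my question is: how can I implement cluster.sync() in WASP so that only the consumer threads take part? I have also noticed that mbarrier supports a cluster-wide arrive (mbarrier.arrive on a .shared::cluster address that lives in another block) but not a cluster-wide wait: mbarrier.try_wait only works on a barrier in the calling block's own shared memory.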
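A minimal sketch of one possible workaround, built from exactly those two primitives (remote arrive, local wait). Each block keeps its own mbarrier with an expected count equal to the cluster size. One elected consumer thread per block arrives on every block's barrier, and then all consumer threads wait on their local one. Everything named here is hypothetical and not taken from CUTLASS: warp 0 as the producer, warps 1 to 4 as consumers, named barrier ID 1 being free, a 2-block cluster, and an sm_90 GPU with CUDA 12. The barrier is initialized and published with fence.mbarrier_init plus one full cluster.sync() before the producer/consumer split, while every thread is still present.

```cuda
#include <cooperative_groups.h>
#include <cassert>
#include <cstdint>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical warp-specialized layout: warp 0 = producer, warps 1..4 = consumers.
constexpr int kProducerThreads = 32;
constexpr int kConsumerThreads = 128;
constexpr uint32_t kConsumerBarId = 1;  // named barrier ID, assumed unused elsewhere

__device__ inline uint32_t smem_addr(const void* p) {
  return static_cast<uint32_t>(__cvta_generic_to_shared(p));
}

// Block-local barrier over the consumer warps only; the producer warp does not take part.
__device__ inline void consumer_bar_sync() {
  asm volatile("bar.sync %0, %1;" :: "r"(kConsumerBarId), "r"(kConsumerThreads) : "memory");
}

__device__ inline void mbar_init(uint64_t* bar, uint32_t count) {
  asm volatile("mbarrier.init.shared::cta.b64 [%0], %1;" :: "r"(smem_addr(bar)), "r"(count) : "memory");
}

// Arrive on the mbarrier at the same shared-memory offset in block `rank` of the cluster.
__device__ inline void mbar_arrive_remote(uint64_t* bar, uint32_t rank) {
  asm volatile(
      "{\n\t.reg .b32 remote;\n\t"
      "mapa.shared::cluster.u32 remote, %0, %1;\n\t"
      "mbarrier.arrive.release.cluster.shared::cluster.b64 _, [remote];\n\t}"
      :: "r"(smem_addr(bar)), "r"(rank) : "memory");
}

// Spin until the phase with the given parity of the *local* mbarrier has completed.
__device__ inline void mbar_wait(uint64_t* bar, uint32_t parity) {
  uint32_t done = 0;
  while (!done) {
    asm volatile(
        "{\n\t.reg .pred p;\n\t"
        "mbarrier.try_wait.parity.acquire.cluster.shared::cta.b64 p, [%1], %2;\n\t"
        "selp.u32 %0, 1, 0, p;\n\t}"
        : "=r"(done) : "r"(smem_addr(bar)), "r"(parity) : "memory");
  }
}

// Consumer-only stand-in for cluster.sync(): every consumer thread of every block calls it.
__device__ void consumer_cluster_sync(uint64_t* bar, uint32_t& phase, int consumer_tid) {
  cg::cluster_group cluster = cg::this_cluster();
  consumer_bar_sync();  // all consumer threads of this block have reached this point
  if (consumer_tid == 0) {
    for (uint32_t r = 0; r < cluster.num_blocks(); ++r)
      mbar_arrive_remote(bar, r);  // one arrival per block, on every block's barrier
  }
  mbar_wait(bar, phase);  // completes once every block in the cluster has arrived
  phase ^= 1;
}

__global__ void __cluster_dims__(2, 1, 1) ws_kernel(int* out) {
  __shared__ uint64_t consumer_bar;
  cg::cluster_group cluster = cg::this_cluster();
  if (threadIdx.x == 0) {
    mbar_init(&consumer_bar, cluster.num_blocks());  // one arrival per block per phase
    asm volatile("fence.mbarrier_init.release.cluster;" ::: "memory");
  }
  cluster.sync();  // safe here: all producer and consumer threads are still present

  if (threadIdx.x < kProducerThreads) return;  // producer warp: real code runs its pipeline here

  int consumer_tid = threadIdx.x - kProducerThreads;
  uint32_t phase = 0;
  for (int step = 0; step < 3; ++step) {
    // ... consumer work for this step ...
    consumer_cluster_sync(&consumer_bar, phase, consumer_tid);
  }
  if (consumer_tid == 0) out[cluster.block_rank()] = 1;
}

// Host helper: launches one 2-block cluster and checks that both blocks got past the barriers.
bool run_demo() {
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;
  cudaMemset(d_out, 0, 2 * sizeof(int));
  ws_kernel<<<2, kProducerThreads + kConsumerThreads>>>(d_out);
  cudaError_t err = cudaDeviceSynchronize();
  int h_out[2] = {0, 0};
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  printf("%s, out = {%d, %d}\n", cudaGetErrorString(err), h_out[0], h_out[1]);
  return err == cudaSuccess && h_out[0] == 1 && h_out[1] == 1;
}
```

Call run_demo() from host code built with something like nvcc -arch=sm_90. Having one thread per block do the remote arrivals keeps the traffic at cluster-size arrivals per block per phase. The named barrier in front guarantees that the whole consumer group of that block has reached the sync point before its arrival is published. The release.cluster/acquire.cluster qualifiers are meant to make consumer writes issued before the barrier visible across the cluster after it. Treat that ordering argument as an assumption to check against the PTX memory model rather than a guarantee.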