Synchronizing warps between SMs

1) Is there a way to synchronize warps across multiprocessors? I would like warp 0 of each of the 16 multiprocessors to wait at a barrier, like __syncthreads(). Is this actually possible? I know it is possible to synchronize all the threads across all the SMs, which is essentially the whole grid; can this somehow be made to work for specific warps?
2) Also, what is the best way to synchronize all the threads in the grid? Is it cooperative_groups, consecutive kernel launches, or some other mechanism?
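A minimal sketch of what question 1 describes, assuming a device that supports cooperative launch. Cooperative groups' `grid.sync()` involves every thread in the grid, so one option for a subset such as "warp 0 of each block" is a hand-rolled counter-and-generation barrier in global memory. Everything here (`g_arrived`, `g_generation`, `warp0_grid_barrier`, `demo`, `run_warp0_barrier_demo`) is hypothetical and illustrative, not a CUDA API. Spinning on a global flag only terminates if all participating blocks are resident at the same time, which is why the kernel is launched with `cudaLaunchCooperativeKernel` (one block per SM here) instead of `<<<...>>>`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical barrier state (illustrative names), shared by all blocks.
__device__ unsigned int g_arrived = 0;              // warp-0s that have reached the barrier
__device__ volatile unsigned int g_generation = 0;  // incremented once per completed barrier

// Barrier among warp 0 of every block; all 32 lanes of warp 0 in every block must call it.
// Only safe when all blocks are co-resident, e.g. under a cooperative launch.
__device__ void warp0_grid_barrier(unsigned int num_blocks) {
    __syncwarp();                           // every lane of this warp 0 has arrived
    if (threadIdx.x == 0) {
        unsigned int gen = g_generation;    // generation this barrier will end
        __threadfence();                    // publish this block's earlier global writes
        if (atomicAdd(&g_arrived, 1u) == num_blocks - 1) {
            atomicExch(&g_arrived, 0u);     // last arriver resets the counter
            __threadfence();
            g_generation = gen + 1;         // and releases the waiting warp-0s
        } else {
            while (g_generation == gen) { } // spin until the last arriver bumps it
        }
        __threadfence();
    }
    __syncwarp();                           // let the other lanes of warp 0 continue
}

__global__ void demo(int* slots, int* seen) {
    unsigned int nb = gridDim.x;
    if (threadIdx.x < 32) {                 // warp 0 only; warps 1.. never wait
        if (threadIdx.x == 0) slots[blockIdx.x] = blockIdx.x + 1;
        warp0_grid_barrier(nb);
        if (threadIdx.x == 0) {
            volatile int* vslots = slots;   // volatile: read a fresh value, not a cached one
            seen[blockIdx.x] = vslots[(blockIdx.x + 1) % nb];  // neighbour block's value
        }
    }
}

// Returns true if every block saw its neighbour's write after the barrier.
bool run_warp0_barrier_demo() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.cooperativeLaunch) return false;
    int nb = prop.multiProcessorCount;      // one block per SM, as in the 16-SM example
    int *slots = nullptr, *seen = nullptr;
    cudaMalloc(&slots, nb * sizeof(int));
    cudaMalloc(&seen, nb * sizeof(int));
    void* args[] = { &slots, &seen };
    cudaError_t err = cudaLaunchCooperativeKernel((void*)demo, dim3(nb), dim3(128), args, 0, 0);
    bool ok = (err == cudaSuccess) && (cudaDeviceSynchronize() == cudaSuccess);
    int* host = new int[nb];
    if (ok) cudaMemcpy(host, seen, nb * sizeof(int), cudaMemcpyDeviceToHost);
    for (int b = 0; ok && b < nb; ++b) ok = (host[b] == (b + 1) % nb + 1);
    delete[] host;
    cudaFree(slots);
    cudaFree(seen);
    return ok;
}

int main() {
    printf("warp-0 barrier demo: %s\n", run_warp0_barrier_demo() ? "ok" : "failed");
    return 0;
}
```

Only lane 0 of each warp 0 touches the shared counter, so the atomic traffic is one operation per block per barrier; the `__syncwarp()` calls fan the result out to the rest of the warp. Warps 1 and up never enter the barrier, which is the behaviour question 1 asks for, but this has not been tuned or stress-tested and should be treated as a starting point.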

I don't quite understand the question. Could you check the samples in


and see whether either sample can be run to reproduce the issue you are facing? That would help us understand the question better.

It sounds like you are looking for cudaLaunchCooperativeKernel …
Cooperative Groups: Flexible CUDA Thread Programming | NVIDIA Technical Blog
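A minimal sketch of the cooperative-groups route suggested above, assuming a GPU that reports cooperative-launch support; `two_phase` and `run_grid_sync_demo` are illustrative names. The grid is sized with `cudaOccupancyMaxActiveBlocksPerMultiprocessor` so every block can be resident at once; otherwise `cudaLaunchCooperativeKernel` rejects the launch. Older toolkits required relocatable device code (`nvcc -rdc=true`) for `grid.sync()`; passing it on newer ones should be harmless.

```cuda
#include <cooperative_groups.h>
#include <cstdio>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Two phases separated by a grid-wide barrier: phase 2 reads data written by other blocks.
__global__ void two_phase(int* data, int* out, int n) {
    cg::grid_group grid = cg::this_grid();
    unsigned long long N = (unsigned long long)n;
    for (unsigned long long i = grid.thread_rank(); i < N; i += grid.size())
        data[i] = (int)i;                    // phase 1: fill
    grid.sync();                             // every thread of every block waits here
    for (unsigned long long i = grid.thread_rank(); i < N; i += grid.size())
        out[i] = data[N - 1 - i];            // phase 2: read the mirrored element
}

// Returns true if phase 2 observed all of phase 1's writes.
bool run_grid_sync_demo(int n) {
    int dev = 0, supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    if (!supported) return false;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    const int threads = 256;
    int blocksPerSm = 0;                     // largest grid that is still fully co-resident
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, two_phase, threads, 0);
    int blocks = blocksPerSm * prop.multiProcessorCount;
    if (blocks == 0) return false;
    int *data = nullptr, *out = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    void* args[] = { &data, &out, &n };
    bool ok = cudaLaunchCooperativeKernel((void*)two_phase, dim3(blocks), dim3(threads),
                                          args, 0, 0) == cudaSuccess
              && cudaDeviceSynchronize() == cudaSuccess;
    int* host = new int[n];
    if (ok) cudaMemcpy(host, out, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == n - 1 - i);
    delete[] host;
    cudaFree(data);
    cudaFree(out);
    return ok;
}

int main() {
    printf("grid.sync demo: %s\n", run_grid_sync_demo(1 << 20) ? "ok" : "failed");
    return 0;
}
```

The grid-stride loops let a co-resident grid, which may be smaller than the problem, cover all `n` elements. The other option from question 2, splitting the work into two consecutive kernel launches on the same stream, gives the same grid-wide ordering, because the second kernel does not start until the first has finished. It needs no special launch or occupancy sizing, but it costs one launch per barrier and loses register and shared-memory state between phases.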

This will help, thank you.
