Is it possible to synchronize the entire CUDA grid using CUDA Fortran? The documentation is very misleading about this. On the one hand, there is the new grid_global attribute, which should allow doing just that. On the other hand, I’ve found the following statement that clearly contradicts such functionality: “There is currently limited functionality for cooperative groups of size less than or equal to a thread block”.
I’m confused. Syncing threads within a block was already possible before cc70. If one cannot go beyond that, what are grid_global kernels good for anyway?
Could someone clarify this issue for me?