Hello
When a kernel is launched with cudaLaunchCooperativeKernel, should this_grid().sync() cause accesses to the same global memory address from different blocks in the grid to be consolidated / shared (so that fewer global memory accesses are issued)?
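No. grid.sync() is only a grid-wide barrier: every thread in the grid waits until all threads have arrived, and memory writes made before the barrier are visible to all blocks after it. It does not merge or share memory requests between blocks. Coalescing happens per warp, so each block still issues its own loads for a common address; repeated reads of the same address are typically served from the L2 cache rather than from DRAM.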
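A minimal sketch of the distinction, not a definitive implementation. The kernel name broadcast_then_sync, the buffers common, a and b, and the sizes are all made up for illustration. Each block stages the common value in shared memory itself (one global load per block), and grid.sync() is used only to order the two phases. Depending on the toolkit version, grid sync may require compiling with -rdc=true.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: every block needs the same value *common.
__global__ void broadcast_then_sync(const float* common, float* a, float* b, int n)
{
    cg::grid_group grid = cg::this_grid();
    const unsigned long long total = static_cast<unsigned long long>(n);

    // One thread per block loads the common address; the rest of the block
    // reads the staged copy. This reduction is done by hand, not by grid.sync().
    __shared__ float staged;
    if (threadIdx.x == 0) staged = *common;
    __syncthreads();

    // Phase 1: grid-stride loop writing a[].
    for (unsigned long long i = grid.thread_rank(); i < total; i += grid.size())
        a[i] = staged * static_cast<float>(i);

    // Grid-wide barrier: all phase 1 writes are visible to every block afterwards.
    grid.sync();

    // Phase 2: read elements that were (mostly) written by other blocks.
    for (unsigned long long i = grid.thread_rank(); i < total; i += grid.size())
        b[i] = a[total - 1 - i];
}

int main()
{
    const int n = 1 << 16, threads = 256;
    int dev = 0, coop = 0, sms = 0, perSm = 0;

    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) { printf("cooperative launch not supported on this device\n"); return 0; }

    // A cooperative grid must be fully co-resident, so size it from occupancy.
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&perSm, broadcast_then_sync, threads, 0);

    float *common = nullptr, *a = nullptr, *b = nullptr;
    cudaMallocManaged(&common, sizeof(float));
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    *common = 2.0f;

    int n_arg = n;
    void* args[] = { &common, &a, &b, &n_arg };
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)broadcast_then_sync, dim3(sms * perSm), dim3(threads), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) { printf("CUDA error: %s\n", cudaGetErrorString(err)); return 1; }

    // b[0] should equal a[n-1], which was written before the barrier.
    printf("b[0] == a[n-1]: %s\n", (b[0] == a[n - 1]) ? "yes" : "no");

    cudaFree(common); cudaFree(a); cudaFree(b);
    return 0;
}
```

If the aim is to cut down on redundant global reads of a shared value, staging it per block as above (or placing it in __constant__ memory) is the usual approach; the barrier itself does not change how many loads are issued.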