I would like to achieve synchronisation among the active thread blocks scheduled on a GPU.
Is this possible with the current cooperative groups and grid synchronization features?
My requirement is that the currently scheduled thread blocks cooperatively load a memory segment into shared memory, compute on it, and then synchronize until both steps (load and compute) are complete…
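
To make the requirement concrete, here is a minimal sketch of the pattern I have in mind, assuming the device supports cooperative launch (as far as I understand, compute capability 6.0 or newer). The kernel name `stageAndCompute`, the block size `kThreads`, and the toy reduction are hypothetical placeholders for illustration only; the grid is sized so that all blocks are resident at once, which is what a grid-wide barrier appears to require.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

constexpr int kThreads = 256;  // hypothetical block size, must match the launch

// Hypothetical kernel: each block stages tiles in shared memory, computes a
// partial sum, then every resident block waits at a grid-wide barrier.
__global__ void stageAndCompute(const float* in, float* partial, float* out, int n)
{
    __shared__ float tile[kThreads];
    cg::grid_group grid = cg::this_grid();
    cg::thread_block block = cg::this_thread_block();

    float sum = 0.0f;  // only thread 0 of each block accumulates into this
    // Tile loop strided by the grid, so the grid can stay small enough to be co-resident.
    for (int start = blockIdx.x * blockDim.x; start < n; start += gridDim.x * blockDim.x) {
        int i = start + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // cooperative load
        block.sync();                                 // tile fully loaded
        if (threadIdx.x == 0)
            for (int j = 0; j < blockDim.x; ++j) sum += tile[j];  // toy compute step
        block.sync();                                 // done before tile is overwritten
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = sum;

    grid.sync();  // all blocks have finished both load and compute

    // After the barrier, one thread combines the per-block results.
    if (grid.thread_rank() == 0) {
        float total = 0.0f;
        for (unsigned b = 0; b < gridDim.x; ++b) total += partial[b];
        *out = total;
    }
}

int main()
{
    int dev = 0, coop = 0, numSms = 0, blocksPerSm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) { printf("cooperative launch not supported\n"); return 0; }

    // Size the grid to the number of blocks that can be resident simultaneously.
    cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, stageAndCompute, kThreads, 0);
    int numBlocks = numSms * blocksPerSm;

    int n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    float *dIn = nullptr, *dPartial = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, numBlocks * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // grid.sync() is only valid when the kernel is launched cooperatively.
    void* args[] = { &dIn, &dPartial, &dOut, &n };
    cudaError_t err = cudaLaunchCooperativeKernel((void*)stageAndCompute,
                                                  dim3(numBlocks), dim3(kThreads),
                                                  args, 0, 0);
    if (err != cudaSuccess) { printf("launch failed: %s\n", cudaGetErrorString(err)); return 1; }
    cudaDeviceSynchronize();

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f, n = %d\n", result, n);

    cudaFree(dIn); cudaFree(dPartial); cudaFree(dOut);
    return result == static_cast<float>(n) ? 0 : 1;
}
```

I would expect to build this with something like `nvcc -arch=sm_70 -rdc=true sketch.cu` (the file name and architecture are assumptions; I believe older toolkits need `-rdc=true` for grid synchronization). What I am unsure about is whether sizing the grid from the occupancy query is the right way to target only the blocks that are actually scheduled.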