Synchronize just the first N threads of a block?


Is it possible to synchronize just the first N threads of a block?

For example, if I just need to copy 5 values into a scratchpad, it would be nice if
only the first 5 threads executed the copy and all the others were skipped.

__shared__ float ScratchPad[5];


// some invented function call


Have a look at “cooperative groups”:

You can partition threads into tiles or groups/sub-groups and then synchronize these.
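
As a rough illustration of the idea, here is a minimal sketch that synchronizes only the first N threads using a coalesced group. It assumes a 1D block with at least N threads and N no larger than a warp (32), so that all participating threads sit in the same warp. The kernel name `copyFirstN`, the host helper `runDemo`, and the input values are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int N = 5;  // number of threads taking part (assumed <= 32)

// Hypothetical kernel: only the first N threads copy and synchronize.
__global__ void copyFirstN(const float* in, float* out)
{
    __shared__ float ScratchPad[N];

    if (threadIdx.x < N) {
        // Group of the threads of this warp that are active in this branch,
        // i.e. threads 0..N-1 under the assumptions stated above.
        cg::coalesced_group active = cg::coalesced_threads();

        ScratchPad[threadIdx.x] = in[threadIdx.x];

        // Barrier for the group members only; the rest of the block
        // does not take part in it.
        active.sync();

        // Each member can now safely read a value written by another member.
        out[threadIdx.x] = ScratchPad[(threadIdx.x + 1) % N];
    }
}

// Hypothetical host helper: runs the kernel and checks the rotated copy.
bool runDemo()
{
    float hIn[N]  = {1.f, 2.f, 3.f, 4.f, 5.f};
    float hOut[N] = {0.f, 0.f, 0.f, 0.f, 0.f};
    float* dIn  = nullptr;
    float* dOut = nullptr;

    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    copyFirstN<<<1, 64>>>(dIn, dOut);  // 64 threads, only 5 do the copy
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);

    bool ok = (err == cudaSuccess);
    for (int i = 0; i < N; ++i) {
        ok = ok && (hOut[i] == hIn[(i + 1) % N]);
    }
    return ok;
}

int main()
{
    printf("%s\n", runDemo() ? "ok" : "failed");
    return 0;
}
```

Note that this barrier only covers the N participating threads. If the remaining threads of the block read ScratchPad later, a `__syncthreads()` placed outside the if-statement is still needed before that read.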

Regarding the specific problem of copying 5 elements to shared memory, you could just use an if-statement to ensure only the first 5 threads perform a copy operation.

__shared__ float ScratchPad[5];

if(threadIdx.x < 5){