Synchronize just the first N threads of a block?

Hi,

Is it possible to synchronize just the first N threads of a block?

For example, if I just need to copy 5 values into a scratchpad, it would be nice if
only the first 5 threads executed the copy and took part in the synchronization, while all others were left out.

__shared__ float ScratchPad[5];

ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];

// some invented function call
__syncthreads(5);

...

Have a look at “cooperative groups”: https://devblogs.nvidia.com/cooperative-groups/

You can partition the threads of a block into tiles or groups/sub-groups and then synchronize just those.
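To make that a bit more concrete, here is a minimal sketch of tile-level synchronization. It assumes a one-dimensional block and a tile size of 8, since statically sized tiles need a power-of-two size and 5 is not one. The names `tileCopy` and `runTileCopy` are made up for illustration, and the host helper skips most error checking.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: only the first 8-thread tile stages and uses the 5 values.
__global__ void tileCopy(const float* src, float* dst)
{
    __shared__ float scratch[5];

    cg::thread_block block = cg::this_thread_block();
    // Split the block into tiles of 8 threads each.
    cg::thread_block_tile<8> tile = cg::tiled_partition<8>(block);

    if (block.thread_rank() < 8) {            // threads 0..7 form the first tile
        if (tile.thread_rank() < 5) {
            scratch[tile.thread_rank()] = src[tile.thread_rank()];
        }
        tile.sync();                          // waits for these 8 threads only

        if (tile.thread_rank() < 5) {
            // Read a value that a different thread of the tile wrote.
            dst[tile.thread_rank()] = scratch[4 - tile.thread_rank()];
        }
    }
}

// Hypothetical host helper: copies {1,2,3,4,5} in and the reversed values out.
cudaError_t runTileCopy(float (&result)[5])
{
    const float input[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    float* src = nullptr;
    float* dst = nullptr;
    cudaMalloc(&src, sizeof(input));
    cudaMalloc(&dst, sizeof(input));
    cudaMemcpy(src, input, sizeof(input), cudaMemcpyHostToDevice);

    tileCopy<<<1, 64>>>(src, dst);            // one block of 64 threads

    // The blocking copy also reports errors from the kernel launch.
    cudaError_t status =
        cudaMemcpy(result, dst, sizeof(input), cudaMemcpyDeviceToHost);
    cudaFree(src);
    cudaFree(dst);
    return status;
}
```

The other 56 threads of the block never reach `tile.sync()` and are not held up by it, which is about as close as this sketch gets to the `__syncthreads(5)` from the question.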

Regarding the specific problem of copying 5 elements to shared memory, you could just use an if-statement to ensure only the first 5 threads perform a copy operation.

__shared__ float ScratchPad[5];

if(threadIdx.x < 5){
   ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];
}
__syncthreads();
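
Note that `__syncthreads()` sits outside the if-statement on purpose: every thread of the block has to reach the barrier, not just the 5 that copy. A complete version of the snippet might look like the sketch below. The kernel name `stageAndSum`, the host helper `runStageAndSum`, and the block size of 64 are assumptions for illustration only, and the host code skips most error checking.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: stage 5 values, then let every thread of the block use them.
__global__ void stageAndSum(const float* GlobalMemSrc, float* out)
{
    __shared__ float ScratchPad[5];

    if (threadIdx.x < 5) {
        ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];
    }
    // Outside the if: all threads of the block must reach the barrier.
    __syncthreads();

    // After the barrier every thread can safely read all 5 staged values.
    float sum = 0.0f;
    for (int i = 0; i < 5; ++i) {
        sum += ScratchPad[i];
    }
    out[threadIdx.x] = sum;
}

// Hypothetical host helper: launches one block of 64 threads on {1,2,3,4,5}.
cudaError_t runStageAndSum(float (&result)[64])
{
    const float input[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    float* src = nullptr;
    float* out = nullptr;
    cudaMalloc(&src, sizeof(input));
    cudaMalloc(&out, sizeof(result));
    cudaMemcpy(src, input, sizeof(input), cudaMemcpyHostToDevice);

    stageAndSum<<<1, 64>>>(src, out);

    // The blocking copy also reports errors from the kernel launch.
    cudaError_t status =
        cudaMemcpy(result, out, sizeof(result), cudaMemcpyDeviceToHost);
    cudaFree(src);
    cudaFree(out);
    return status;
}
```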