Hi,
Is it possible to synchronize just the first N threads of a block?
For example, if I only need to copy 5 values into a scratchpad, it would be nice if
only the first 5 threads executed the copy and all other threads skipped it.
__shared__ float ScratchPad[5];
ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];
// some invented function call: wait only for the first 5 threads
__syncthreads(5);
...
Have a look at “cooperative groups”: https://devblogs.nvidia.com/cooperative-groups/
You can partition threads into tiles or groups/sub-groups and then synchronize only those.
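A minimal sketch of the tile approach for the 5-value case, assuming CUDA 9 or later and a 1D block. Static tile sizes are powers of two, so this version rounds N = 5 up to an 8-thread tile: threads 0 to 7 join the barrier via tile.sync(), threads 0 to 4 do the copy, and the rest of the block never waits. The kernel tileCopyKernel, the host helper run_tile_demo and the constants N_COPY and TILE_SIZE are hypothetical names chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr unsigned N_COPY    = 5;  // number of values to stage (from the question)
constexpr unsigned TILE_SIZE = 8;  // smallest power-of-two tile covering N_COPY

// Hypothetical kernel: tile 0 (threads 0..7) stages N_COPY values in shared
// memory, synchronizes only within that tile, then each of the first N_COPY
// threads reads a value that a different thread wrote.
__global__ void tileCopyKernel(const float* src, float* out)
{
    __shared__ float ScratchPad[N_COPY];

    cg::thread_block block = cg::this_thread_block();
    // Collective call: every thread of the block is partitioned into 8-thread tiles.
    auto tile = cg::tiled_partition<TILE_SIZE>(block);

    if (block.thread_rank() < TILE_SIZE) {            // only tile 0 continues
        unsigned r = tile.thread_rank();
        if (r < N_COPY)
            ScratchPad[r] = src[r];                    // threads 0..4 copy
        tile.sync();                                   // barrier for the 8 threads of tile 0 only
        if (r < N_COPY)
            out[r] = ScratchPad[(r + 1) % N_COPY];     // read a neighbour's write
    }
}

// Hypothetical host helper: runs the kernel once and copies N_COPY results
// into h_out. Returns 0 on success, -1 on any CUDA error.
int run_tile_demo(float* h_out)
{
    float h_src[N_COPY];
    for (unsigned i = 0; i < N_COPY; ++i)
        h_src[i] = 10.0f * i;                          // 0, 10, 20, 30, 40

    float *d_src = nullptr, *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_src, sizeof h_src);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, sizeof h_src);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_src, h_src, sizeof h_src, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        tileCopyKernel<<<1, 64>>>(d_src, d_out);       // 64 threads, only 8 take part
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof h_src, cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

After the barrier each of the first 5 threads reads a value written by another thread, which is what makes the sync necessary. With the 0, 10, 20, 30, 40 input, run_tile_demo should fill h_out with 10, 20, 30, 40, 0.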
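Regarding the specific problem of copying 5 elements to shared memory, you could simply use an if-statement so that only the first 5 threads perform the copy. Keep __syncthreads() outside the if-statement: every thread in the block must reach it, otherwise execution is likely to hang or produce unintended side effects.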
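__shared__ float ScratchPad[5];
// only threads 0..4 copy one value each; all other threads skip the copy
if(threadIdx.x < 5){
    ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];
}
// barrier outside the if: every thread of the block must reach it
__syncthreads();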
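A self-contained sketch of the if-statement approach. The kernel scratchpadKernel, the host helper run_scratchpad_demo and the constants N_COPY and N_OUT are hypothetical, added only for illustration. The first 5 threads fill the scratchpad, the block-wide __syncthreads() sits outside the branch, and afterwards every thread of the block can safely read any scratchpad entry.

```cuda
#include <cuda_runtime.h>

constexpr unsigned N_COPY = 5;   // values staged in shared memory
constexpr int      N_OUT  = 32;  // one output per thread (single block)

// Hypothetical kernel: the first N_COPY threads fill the scratchpad, then
// every thread of the block reads from it after the block-wide barrier.
__global__ void scratchpadKernel(const float* GlobalMemSrc, float* out, int n)
{
    __shared__ float ScratchPad[N_COPY];

    if (threadIdx.x < N_COPY)                       // only threads 0..4 copy
        ScratchPad[threadIdx.x] = GlobalMemSrc[threadIdx.x];

    __syncthreads();                                // reached by every thread of the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ScratchPad[threadIdx.x % N_COPY];  // all threads may now read
}

// Hypothetical host helper: h_out must hold N_OUT floats.
// Returns 0 on success, -1 on any CUDA error.
int run_scratchpad_demo(float* h_out)
{
    float h_src[N_COPY] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    float *d_src = nullptr, *d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_src, sizeof h_src);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, N_OUT * sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_src, h_src, sizeof h_src, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scratchpadKernel<<<1, N_OUT>>>(d_src, d_out, N_OUT);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, N_OUT * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

With the 1.0 to 5.0 input, thread i should end up with the value at index i % 5 (for example, out[7] is 3.0f). The block-wide barrier is the simplest correct choice here; a narrower tile barrier only pays off when just a few threads in one warp ever read the scratchpad.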