__shfl_sync question

I loaded a value into the local variable x (held in a register) in each of threads 1 to 20. Now I need to do calculations that require all of those values.
Then I execute __syncwarp() to make sure everything is loaded.

The calculations are simple but sequential, so they need to be done by a single lane in the warp.

The most logical way to access the data seems to be __shfl_sync.

As the documentation states, I need to pass a mask, the variable I want to read, and the lane of the thread I am interested in.
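A minimal sketch of those three arguments, assuming a single warp of 32 threads and a FULL_MASK macro defined as 0xffffffff (CUDA itself does not provide that name; it is a common user-side definition). Every lane passes the same mask and the same variable, but srcLane is evaluated per calling lane, so here each lane reads x from its right-hand neighbour. The kernel name, the helper run_rotate and the stored values are illustrative only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu  // user-defined helper: all 32 lanes participate

__global__ void rotate_kernel(int *out)
{
    int lane = threadIdx.x % warpSize;
    int x = 100 + lane;               // illustrative per-lane value
    int src = (lane + 1) % warpSize;  // srcLane is chosen by each calling lane

    // Same mask and same variable in every lane; only srcLane differs.
    out[lane] = __shfl_sync(FULL_MASK, x, src);
}

// Launches one warp and copies the 32 results back; returns 0 on success.
int run_rotate(int host_out[32])
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 32 * sizeof(int)) != cudaSuccess) return 1;
    rotate_kernel<<<1, 32>>>(d_out);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int out[32];
    if (run_rotate(out) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int lane = 0; lane < 32; ++lane)
        printf("lane %2d got %d\n", lane, out[lane]);
    return 0;
}
```

Every lane executes the shuffle here: a source lane has to take part in the call for its value to be transferred, even when it does not need a result itself.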

Here I have two questions.
First, the mask is needed for synchronization, but at this step I do not need any synchronization. I just need to read the value, and I already made sure it is there by calling __syncwarp(). So can I safely set the mask to 0?

Second, I experimented with __shfl_sync. For example, when I want the data from lane 10, I call __shfl_sync(FULL_MASK, x, 10) on every lane as an experiment and then print the values. I get the correct value only in lanes 10 and 20 and in no other lane, although I am sure I am calling it from the same warp.

What is my mistake?

It is a category of undefined behavior if the mask passed to a shuffle sync operation does not match the threads that are required to participate for correctness. The threads required to participate are those at either the source lane or the destination lane of any transfer that must happen for correctness.

A mask of zero is not very likely to be useful for a shuffle sync operation if any actual results from that operation are expected to be used.
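For the use case in the question, where one lane needs values held by lanes 1 to 20, one common pattern (a sketch, not something stated in the thread) is to loop over the source lanes and have all 32 lanes execute every shuffle with the full mask, while only lane 0 uses the results. That way the source and the destination of each transfer are both in the mask. The kernel name, the helper run_gather, the FULL_MASK/FIRST_SRC/LAST_SRC macros, the inputs (lane*lane) and the running sum standing in for the real sequential calculation are all hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu  // user-defined helper: all 32 lanes participate
#define FIRST_SRC 1            // lanes 1..20 hold the inputs, as in the question
#define LAST_SRC 20

__global__ void gather_kernel(int *result)
{
    int lane = threadIdx.x % warpSize;
    // Illustrative inputs: lane*lane in lanes 1..20, 0 elsewhere.
    int x = (lane >= FIRST_SRC && lane <= LAST_SRC) ? lane * lane : 0;

    int acc = 0;  // only lane 0's copy is used
    for (int src = FIRST_SRC; src <= LAST_SRC; ++src) {
        // All lanes execute every shuffle with the full mask, so both the
        // source lane and the destination lane (0) of each transfer take part.
        int v = __shfl_sync(FULL_MASK, x, src);
        if (lane == 0)
            acc += v;  // stand-in for the real sequential calculation
    }
    if (lane == 0)
        *result = acc;
}

// Launches one warp and copies lane 0's result back; returns 0 on success.
int run_gather(int *host_result)
{
    int *d_result = nullptr;
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) return 1;
    gather_kernel<<<1, 32>>>(d_result);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(host_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int result = 0;
    if (run_gather(&result) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("lane 0 computed %d\n", result);  // 1^2 + ... + 20^2 = 2870
    return 0;
}
```

The loop bounds are identical in every lane, so the warp stays converged through all 20 shuffles. Each call moves one value per lane, so gathering 20 values into one lane costs 20 shuffles.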

For your remaining question I suggest providing a short, complete test case.
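A short, complete test case of the kind requested might look like the following sketch. It assumes one warp, loads a hypothetical value 1000 + lane into x only in lanes 1 to 20 as described in the question, and then has all 32 lanes call __shfl_sync(FULL_MASK, x, 10). With this program every lane should report lane 10's value, 1010. If the real code prints something else, the difference between it and this sketch is where to look; for instance, if the shuffle sits inside a branch that not every lane reaches, a full mask no longer matches the participating threads. The names broadcast_kernel, run_broadcast and FULL_MASK are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu  // user-defined helper: all 32 lanes participate

__global__ void broadcast_kernel(int *out)
{
    int lane = threadIdx.x % warpSize;

    // Only lanes 1..20 load a value, as in the question.
    int x = 0;
    if (lane >= 1 && lane <= 20)
        x = 1000 + lane;

    __syncwarp();  // mirrors the question; harmless here

    // Every lane executes the shuffle and asks for lane 10's x.
    out[lane] = __shfl_sync(FULL_MASK, x, 10);
}

// Launches one warp and copies the 32 results back; returns 0 on success.
int run_broadcast(int host_out[32])
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 32 * sizeof(int)) != cudaSuccess) return 1;
    broadcast_kernel<<<1, 32>>>(d_out);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int out[32];
    if (run_broadcast(out) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int lane = 0; lane < 32; ++lane)
        printf("lane %2d: %d\n", lane, out[lane]);
    return 0;
}
```

Printing from the host after the copy, rather than with device-side printf, keeps the output in lane order.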
