I loaded a value into the local variable x (held in a register) in each of threads 1 to 20 of a warp. Now I need to do a calculation that needs all of those values.
Then I execute __syncwarp() to make sure everything is loaded.
The calculation is simple but sequential, so it has to be done by a single lane of the warp.
The most logical way to access the data seems to be __shfl_sync.
As the documentation states, I need to pass a mask, the variable I want to read, and the lane of the thread I am interested in.
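The access pattern described above (lanes 1 to 20 each hold x, and one lane consumes all of the values in order) might be sketched roughly as follows. This is a minimal sketch, not the actual code from this post: the kernel and helper names are hypothetical, x = lane is a made-up input, and a running sum stands in for the real sequential calculation. FULL_MASK is assumed to be 0xffffffff. The loop is written so that every lane named in the mask executes each __shfl_sync call, while only lane 0 uses the returned values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed definition: bitmask naming all 32 lanes of the warp.
#define FULL_MASK 0xffffffffu

// Hypothetical kernel: lanes 1..20 hold x; lane 0 combines those values in order.
// The running sum is only a stand-in for the real sequential calculation.
__global__ void sequentialOnLane0(int *result)
{
    int lane = threadIdx.x % warpSize;

    // Made-up input: lanes 1..20 load x = lane, the other lanes hold 0 (never read).
    int x = (lane >= 1 && lane <= 20) ? lane : 0;

    __syncwarp(FULL_MASK);  // mirrors the __syncwarp() step described above

    int acc = 0;
    for (int src = 1; src <= 20; ++src) {
        // Every lane named in FULL_MASK executes the same shuffle...
        int v = __shfl_sync(FULL_MASK, x, src);
        // ...but only lane 0 uses the value it receives.
        if (lane == 0) acc += v;
    }
    if (lane == 0) *result = acc;
}

// Hypothetical helper: launches one warp and copies lane 0's result to the host.
bool runSequential(int *hostResult)
{
    int *devResult = nullptr;
    if (cudaMalloc(&devResult, sizeof(int)) != cudaSuccess) return false;

    sequentialOnLane0<<<1, 32>>>(devResult);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(devResult); return false; }

    cudaError_t err = cudaMemcpy(hostResult, devResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devResult);
    return err == cudaSuccess;
}

int main()
{
    int result = 0;
    if (!runSequential(&result)) { std::printf("CUDA error\n"); return 1; }
    std::printf("lane 0 computed %d\n", result);  // sum of 1..20 with this made-up input
    return 0;
}
```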
Here I have two questions.
First, the mask is needed for synchronization, but at this step I do not need any synchronization; I just need to read a value that I already made sure is there by calling __syncwarp(). So can I safely set the mask to 0?
Second, I experimented with __shfl_sync. When I want, for example, the data from lane 10, I call __shfl_sync(FULL_MASK, x, 10) on every lane as an experiment and then print the values. I get correct values only in lanes 10 and 20 and in no other lane, although I am sure I am calling it from the same warp.
What is my mistake?
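For reference, a minimal reproducer of the experiment described above could look like the sketch below. It is an assumption-laden stand-in, not the original code: the kernel and helper names are hypothetical, lanes 1 to 20 load the made-up value x = lane, the remaining lanes keep the sentinel -1, and FULL_MASK is assumed to be 0xffffffff. All 32 lanes of a single warp reach the __shfl_sync call, and each lane's result is copied back to the host and printed there instead of with in-kernel printf.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed definition: bitmask naming all 32 lanes of the warp.
#define FULL_MASK 0xffffffffu

// Hypothetical reproducer: lanes 1..20 hold x, then every lane reads lane 10's x.
__global__ void broadcastFromLane10(int *out)
{
    int lane = threadIdx.x % warpSize;

    // Made-up input: lanes 1..20 load x = lane, other lanes keep the sentinel -1.
    int x = (lane >= 1 && lane <= 20) ? lane : -1;

    __syncwarp(FULL_MASK);  // mirrors the __syncwarp() step from the question

    // All 32 lanes execute this call, matching the FULL_MASK passed to it.
    int fromLane10 = __shfl_sync(FULL_MASK, x, 10);

    out[lane] = fromLane10;
}

// Hypothetical helper: launches one warp and copies every lane's result back.
bool runBroadcast(int hostOut[32])
{
    int *devOut = nullptr;
    if (cudaMalloc(&devOut, 32 * sizeof(int)) != cudaSuccess) return false;

    broadcastFromLane10<<<1, 32>>>(devOut);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(devOut); return false; }

    cudaError_t err = cudaMemcpy(hostOut, devOut, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devOut);
    return err == cudaSuccess;
}

int main()
{
    int out[32] = {0};
    if (!runBroadcast(out)) { std::printf("CUDA error\n"); return 1; }
    for (int lane = 0; lane < 32; ++lane)
        std::printf("lane %2d got %d\n", lane, out[lane]);
    return 0;
}
```

With this made-up input, every lane should end up holding lane 10's value; comparing its output with the real kernel may help narrow down where the two differ.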