Hello all, Now I’m learning the warp level data exchange primitives. I have some question on the mask parameter.
I’m now conducting the intra-warp data exchange with following code, which matches my expected output well:
if ((1 << lane_id) & (uint32_t) 0x5a5a5a5a) {
*(uint32_t*)(&data[0]) = __shfl_xor_sync(0xffffffff, *(uint32_t*)(&data[0]), 5);
}
While I want to use the mask parameter to do the same, as:
*(uint32_t*)(&data[0]) = __shfl_xor_sync(0x5a5a5a5a, *(uint32_t*)(&data[0]), 5);
The result indicated that ALL threads in the warp participate the data exchange, while I only want the threads with threadIdx.x % 8 == 1,3,4,6 to participate the data exchange (So I set the mask to be 0x5a5a5a5a (0x5a=0b 0101 1010) which exactly indicate the 1, 3, 4 and 6 thread). How should I use the mask parameter to achieve my expected effect?