CUDA Warp primitive behaviour question

794906124 · April 12, 2024, 5:46am

Hello all, Now I’m learning the warp level data exchange primitives. I have some question on the mask parameter.

I’m now conducting the intra-warp data exchange with following code, which matches my expected output well:

if ((1 << lane_id) & (uint32_t) 0x5a5a5a5a) {
        *(uint32_t*)(&data[0]) = __shfl_xor_sync(0xffffffff, *(uint32_t*)(&data[0]), 5);
    }

While I want to use the mask parameter to do the same, as:

*(uint32_t*)(&data[0]) = __shfl_xor_sync(0x5a5a5a5a, *(uint32_t*)(&data[0]), 5);

The result indicated that ALL threads in the warp participate the data exchange, while I only want the threads with threadIdx.x % 8 == 1,3,4,6 to participate the data exchange (So I set the mask to be 0x5a5a5a5a (0x5a=0b 0101 1010) which exactly indicate the 1, 3, 4 and 6 thread). How should I use the mask parameter to achieve my expected effect?

Robert_Crovella · April 12, 2024, 3:46pm

The mask doesn’t exclude threads. The mask guarantees convergence to at least the level specified in the mask. But the mask does not exclude threads “not selected” in the mask, nor does it prevent those “not selected” threads from participating in the op.

If you want only certain threads to participate, use a boolean if condition to select those threads, before running the primitive. Make sure your mask is consistent with your conditional selection of threads.

Depending on results not supported by the mask is undefined behavior.
Make sure that both source and necessary destination lanes are included in the mask as well as appropriately selected by your conditional code.

794906124 · April 13, 2024, 2:29am

Thanks for your reply, guess it’s good to use my current solution.

Topic		Replies	Views
What does mask mean in warp shuffle functions (__shfl_sync) CUDA Programming and Performance	2	5540	November 22, 2018
How do mask work in warp primitive? CUDA Programming and Performance	5	332	May 31, 2023
Question regarding the mask in __shfl_sync operations CUDA Programming and Performance	2	1322	January 16, 2020
Warp shuffle over a mask CUDA Programming and Performance	0	492	August 20, 2020
confuse about warp-level mask CUDA Programming and Performance	12	3233	November 1, 2018
Requesting clarification - CUDA WARP level primitives and THREAD divergence CUDA Programming and Performance hw , cuda	3	1159	February 14, 2024
Shfl_sync question CUDA Programming and Performance	1	584	October 12, 2021
[q] mask in __shfl_sync() CUDA Programming and Performance cuda , kernel	5	133	September 13, 2024
Using CUDA Warp-Level Primitives Technical Blog	20	2146	April 15, 2024
Weird behavior about __activemask() CUDA Programming and Performance	4	625	July 18, 2022

CUDA Warp primitive behaviour question

Related topics