Warp shuffle over a mask


Is there a way to do a shuffle over a subset of a warp?

For example, I have an if statement that splits the threads of a warp, and I want to do a shuffle inside the branch, as in the following pseudocode:

// kernel code
if (condition depending on threadId) {
    mask = __ballot(condition);
    int value = ...; // some value
    value = __shfl...(mask, value); // some aggregation, e.g. minimum
}

I thought the mask parameter of the shuffle guaranteed that only the threads in the mask take part in the aggregation, but it seems that it only synchronizes those threads at the point of the call. What I want is to find the minimum value among the threads in the mask. The naive approach doesn't work:

for (int offset = 1; offset < 32; offset *=2)
     value = min(value,__shfl_xor_sync(mask, value, offset));

With this loop the threads appear to end up with different values, and the values of the threads outside the branch happen to be zero. Is there a way to shuffle across only the threads in the mask? (I think it is possible to calculate the laneId of the next thread in the mask and the offset to it, but this approach seems hard to generalize to multiple iterations.)
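
To make the idea in parentheses concrete, here is a minimal sketch of what I have in mind. It is not a confirmed solution. The xor loop presumably fails because lane ^ offset can name a lane outside the mask, and a shuffle that reads from a non-participating lane returns an undefined value. The sketch instead gives every participating lane a rank (its position among the set bits of the mask) and shuffles in rank space, wrapping around at the number of participants. All names (nth_set_lane, masked_warp_min, demo_kernel, run_demo) are made up for illustration. It assumes a one-dimensional block, a mask built with __ballot_sync before the branch while the warp is still converged, and an idempotent operation such as min, since the wrap-around lets some values be combined more than once.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical helper: lane index of the n-th (0-based) set bit of mask.
// Requires n < __popc(mask).
__device__ int nth_set_lane(unsigned mask, int n) {
    for (int i = 0; i < n; ++i)
        mask &= mask - 1;        // clear the lowest set bit
    return __ffs(mask) - 1;      // lowest remaining set bit
}

// Hypothetical helper: minimum of value over the lanes named in mask.
// Must be called by exactly the lanes in mask, all passing the same mask.
__device__ int masked_warp_min(unsigned mask, int value) {
    const int lane  = threadIdx.x % warpSize;              // assumes a 1-D block
    const int count = __popc(mask);                        // participating lanes
    const int rank  = __popc(mask & ((1u << lane) - 1));   // my position in mask
    // After each step a lane holds the min over a window of 2 * offset
    // consecutive ranks (cyclically), so the window covers everyone at the end.
    for (int offset = 1; offset < count; offset *= 2) {
        const int src = nth_set_lane(mask, (rank + offset) % count);
        value = min(value, __shfl_sync(mask, value, src));
    }
    return value;
}

__global__ void demo_kernel(const int *in, int *out) {
    const int tid = threadIdx.x;
    const bool condition = (tid % 3 == 0);   // arbitrary example predicate
    // Build the mask before branching, while all 32 lanes are converged.
    const unsigned mask = __ballot_sync(0xffffffffu, condition);
    int result = INT_MAX;                    // marker for non-participants
    if (condition)
        result = masked_warp_min(mask, in[tid]);
    out[tid] = result;
}

// Hypothetical host check: runs a single warp and returns true if every
// participating lane received the minimum computed on the host.
bool run_demo() {
    int h_in[32], h_out[32];
    int expected = INT_MAX;
    for (int i = 0; i < 32; ++i) {
        h_in[i] = (i * 37 + 11) % 101;       // arbitrary test data
        if (i % 3 == 0 && h_in[i] < expected) expected = h_in[i];
    }
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    demo_kernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    bool ok = true;
    for (int i = 0; i < 32; ++i)
        ok = ok && (h_out[i] == (i % 3 == 0 ? expected : INT_MAX));
    return ok;
}
```

Calling run_demo() from main after compiling with nvcc should be enough to try it. Two alternatives I am aware of: keep the whole warp in the shuffle with a full mask and let the lanes outside the branch contribute INT_MAX, or, on devices of compute capability 8.0 and higher, use __reduce_min_sync(mask, value). I would still like to know whether there is a more direct way.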