I wasn’t very clear when I said you don’t really need temp_val. What I meant was you could do something like:
for (int offset = 16; offset > 0; offset >>= 1)
    val = min(val, __shfl_down_sync(0xFFFFFFFF, val, offset));
But it’s not necessarily “better” than what you have.
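In case it helps, here is a minimal self-contained sketch of that loop in context. It assumes a single full warp of 32 active threads and int data. The names warp_min, warp_min_kernel and warp_min_host are made up for illustration and are not from your code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: min-reduce `val` across a full 32-lane warp
// without a separate temp_val.
__device__ int warp_min(int val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val = min(val, __shfl_down_sync(0xFFFFFFFF, val, offset));
    return val;  // only lane 0 is guaranteed to hold the warp-wide minimum
}

// Assumes a launch with exactly one block of 32 threads.
__global__ void warp_min_kernel(const int* in, int* out) {
    int val = warp_min(in[threadIdx.x]);
    if (threadIdx.x == 0) *out = val;  // lane 0 writes the result
}

// Hypothetical host wrapper: h_in must point to exactly 32 ints.
int warp_min_host(const int* h_in) {
    int *d_in = nullptr, *d_out = nullptr, h_out = 0;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, 32 * sizeof(int), cudaMemcpyHostToDevice);
    warp_min_kernel<<<1, 32>>>(d_in, d_out);
    // This blocking copy also surfaces errors from the kernel launch.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Note that with __shfl_down_sync only lane 0 ends up with the minimum over all 32 lanes; the other lanes hold partial results. If every lane needs the result, you could swap in __shfl_xor_sync with the same offsets, or broadcast from lane 0 with __shfl_sync afterwards.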