Is there an efficient way to deal with if/else in a kernel?

According to the CUDA programming guide, if threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path.
Is there an efficient way to deal with this case?



If you can structure your algorithm so that threads in the same warp are likely to take the same branch, that will help. (Often that isn’t possible, but I thought I would mention it.)
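As a rough sketch of what "same branch per warp" can look like: the branch condition below depends on the warp index rather than on the individual thread, so all threads of a warp agree on it. The helper names (pathPerLane, pathPerWarp, branchPerWarp) are made up for illustration, and the two assignments are just stand-ins for real work. With bodies this short the compiler may well predicate either version anyway; the pattern matters more when the branch bodies are long.

```cuda
#include <cuda_runtime.h>

// Hypothetical path selectors (names are illustrative); each returns 0 or 1.

// Per-lane choice: neighbouring threads alternate, so every warp contains
// threads on both paths and has to run both.
__host__ __device__ inline int pathPerLane(int tid) {
    return tid % 2;
}

// Per-warp choice: all threads with the same tid / warpSz get the same value,
// so each warp follows a single path.
__host__ __device__ inline int pathPerWarp(int tid, int warpSz) {
    return (tid / warpSz) % 2;
}

// Assumes a 1D thread block, so threadIdx.x / warpSize is the warp index.
__global__ void branchPerWarp(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (pathPerWarp(threadIdx.x, warpSize) == 0) {
        out[i] = 2.0f * i;   // stand-in for the work on path A
    } else {
        out[i] = i + 1.0f;   // stand-in for the work on path B
    }
}
```

A launch would look something like branchPerWarp<<<(n + 255) / 256, 256>>>(d_out, n), where d_out is assumed to be a device buffer of n floats. Note that this changes which elements get which treatment, so it only applies when you are free to reassign work to threads.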

In addition, there may be no warp divergence for the conditional assignment:

var = (cond)? val1 : val2;

I believe the above is just syntactic sugar for the corresponding if/else, at least in CUDA.

However, both a short if/else and the equivalent ?: expression will likely generate code with predicated execution, which doesn’t hurt as much as true branching (jumping). Remember that most CUDA programs are memory bound, so you have GPU cycles to spare. Unless you’re doing conditional memory accesses or have a truly large amount of code inside the conditional blocks, the loss of speed will likely be completely hidden behind global memory operations. So for small divergences, don’t worry unless you actually measure a performance drop.
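To make that concrete, here is a minimal sketch with the same small computation written both ways. The function and kernel names (pickIf, pickTernary, scaleByThreshold) are hypothetical. I would expect both forms to end up as predicated or select-style code rather than a jump, but that is up to the compiler; you can check by inspecting the output of nvcc -ptx or cuobjdump -sass for your target.

```cuda
#include <cuda_runtime.h>

// Short if/else: a likely candidate for predication or a select.
__host__ __device__ inline float pickIf(float x, float t) {
    float r;
    if (x > t) {
        r = x * 2.0f;
    } else {
        r = x * 0.5f;
    }
    return r;
}

// The same logic as a conditional expression.
__host__ __device__ inline float pickTernary(float x, float t) {
    return (x > t) ? x * 2.0f : x * 0.5f;
}

// Each thread handles one element; the only other branch is the bounds check.
__global__ void scaleByThreshold(const float *in, float *out, float t, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = pickTernary(in[i], t);
    }
}
```

Both helpers return the same value for every input, so which one you write is a readability choice; profile before restructuring anything.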

Additionally, there are non-branching versions of certain algorithms and functions. For example, clamp-to-bounds can be written without a branch using the built-in min/max functions (unless device min/max are branchy?), and there is a built-in branchless abs(). It’s usually smart to use them, but if you ever find yourself designing branchless routines with ugly bit-hacks, step back and profile, because small divergence usually isn’t as bad as it seems.
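A minimal sketch of the clamp idea, using fminf, fmaxf and fabsf from the CUDA math library. The names clampf and clampAbs are made up for this example. As far as I know the min/max calls typically map to single instructions on the device rather than branches, but treat that as an assumption and check the generated code if it matters.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Clamp x into [lo, hi] with no if/else in the source.
// Assumes lo <= hi.
__host__ __device__ inline float clampf(float x, float lo, float hi) {
    return fminf(fmaxf(x, lo), hi);
}

// Takes the absolute value of each element, then clamps it to [lo, hi].
__global__ void clampAbs(const float *in, float *out, float lo, float hi, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = clampf(fabsf(in[i]), lo, hi);
    }
}
```

For example, clampf(-1.0f, 0.0f, 1.0f) gives 0.0f and clampf(2.0f, 0.0f, 1.0f) gives 1.0f, while a value already inside the range is returned unchanged.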

There is a CUDA assembly instruction that implements var = (cond) ? val1 : val2, but it is not always applicable or efficient.