Is there an efficient way to deal with if/else in a kernel?

enjoygpu · June 13, 2009, 8:15pm

Hi,
From the CUDA Programming Guide: if threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path.
I want to know whether there is an efficient way to handle this case.

Thanks.

Yixun

seibert · June 13, 2009, 9:36pm

If you can structure your algorithm so that threads in the same warp are likely to take the same branch, that will help. (Often that isn't possible, but I thought I would mention it.)
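A minimal sketch of that idea, assuming a hypothetical job where even-indexed elements need one operation and odd-indexed elements another (the kernel names, helper and both operations are made up for illustration). The first kernel branches on i % 2, so every warp runs both paths; the second remaps threads so the first n/2 threads handle the even elements and the rest handle the odd ones, which keeps each warp on a single path as long as n/2 is a multiple of the warp size (32).

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Divergent: adjacent threads of the same warp alternate between path A and
// path B, so each warp executes both paths one after the other.
__global__ void evenOddDivergent(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        out[i] = in[i] * 2.0f;    // path A (even elements)
    else
        out[i] = in[i] + 100.0f;  // path B (odd elements)
}

// Warp-uniform: threads [0, n/2) take path A on the even elements and threads
// [n/2, n) take path B on the odd elements. Assumes n is even; if n/2 is also
// a multiple of 32, no warp straddles the two halves.
__global__ void evenOddUniform(const float* in, float* out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int half = n / 2;
    if (t >= n) return;
    if (t < half) {
        int i = 2 * t;               // even element
        out[i] = in[i] * 2.0f;       // path A
    } else {
        int i = 2 * (t - half) + 1;  // odd element
        out[i] = in[i] + 100.0f;     // path B
    }
}

// Host helper: runs one kernel on the input 0, 1, ..., n-1 and returns the result.
std::vector<float> runEvenOdd(bool uniform, int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    int threads = 128, blocks = (n + threads - 1) / threads;
    if (uniform)
        evenOddUniform<<<blocks, threads>>>(dIn, dOut, n);
    else
        evenOddDivergent<<<blocks, threads>>>(dIn, dOut, n);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaError_t err = cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dIn);
    cudaFree(dOut);
    return h;
}
```

Both kernels produce the same output; only the mapping of threads to branches changes, so whether the remapping pays off is something to measure for the real workload.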

cvnguyen · June 13, 2009, 10:51pm

In addition, there may be no warp divergence for the conditional assignment:

var = (cond)? val1 : val2;
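A small sketch of that pattern inside a kernel, assuming a made-up thresholding task (thresholdKernel and runThreshold are hypothetical names). Each thread picks one of two cheap values with ?:, which the compiler is likely, though not guaranteed, to turn into a select or predicated instructions rather than a jump; inspecting the generated PTX or SASS is the way to confirm it.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread chooses between two cheap values with ?: instead of an if/else block.
__global__ void thresholdKernel(const float* in, float* out, float cutoff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] > cutoff) ? in[i] : 0.0f;
}

// Host helper using managed memory: first n floats are input, next n are output.
std::vector<float> runThreshold(const std::vector<float>& h, float cutoff)
{
    int n = static_cast<int>(h.size());
    float* buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, 2 * n * sizeof(float));
    assert(err == cudaSuccess);
    for (int i = 0; i < n; ++i) buf[i] = h[i];
    thresholdKernel<<<(n + 127) / 128, 128>>>(buf, buf + n, cutoff, n);
    err = cudaDeviceSynchronize();  // kernel must finish before the host reads buf
    assert(err == cudaSuccess);
    std::vector<float> out(buf + n, buf + 2 * n);
    cudaFree(buf);
    return out;
}
```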

_Big_Mac · June 14, 2009, 2:55pm

I believe the above is just syntactic sugar for the corresponding if/else, at least in CUDA.

However, both a short if/else and such a ?: expression will likely compile to predicated execution, which doesn't hurt as much as true branching (jumping). Remember that most CUDA programs are memory bound, so you have GPU cycles to spare. Unless you're doing conditional memory accesses or truly have a large amount of code inside the conditional blocks, the lost cycles will likely be completely hidden behind global memory operations. So, for small divergences, don't worry unless you actually measure a performance drop.

Additionally, there are branchless versions of certain algorithms and functions. For example, clamping to bounds can be done without branches using the built-in min/max functions (unless device min/max are themselves branchy?), and there is a built-in branchless abs(). It's usually smart to use them, but if you ever find yourself designing branchless routines with ugly bit hacks, step back and profile, because small divergence usually isn't as bad as it seems.
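A minimal sketch of the clamp and abs cases, using the CUDA math functions fminf, fmaxf and fabsf in place of hand-written if/else chains (the kernel and helper names are hypothetical). Whether these end up as branch-free instructions on a given GPU can be checked by looking at the SASS.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Clamp and absolute value without writing
// if (x < lo) x = lo; else if (x > hi) x = hi; by hand.
__global__ void clampAbsKernel(const float* in, float* clamped, float* absolute,
                               float lo, float hi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        clamped[i]  = fminf(fmaxf(in[i], lo), hi);  // clamp into [lo, hi]
        absolute[i] = fabsf(in[i]);                  // |x|
    }
}

// Host helper using managed memory: buf holds input, clamped and absolute values.
void runClampAbs(const std::vector<float>& h, float lo, float hi,
                 std::vector<float>& clamped, std::vector<float>& absolute)
{
    int n = static_cast<int>(h.size());
    float* buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, 3 * n * sizeof(float));
    assert(err == cudaSuccess);
    for (int i = 0; i < n; ++i) buf[i] = h[i];
    clampAbsKernel<<<(n + 127) / 128, 128>>>(buf, buf + n, buf + 2 * n, lo, hi, n);
    err = cudaDeviceSynchronize();  // kernel must finish before the host reads buf
    assert(err == cudaSuccess);
    clamped.assign(buf + n, buf + 2 * n);
    absolute.assign(buf + 2 * n, buf + 3 * n);
    cudaFree(buf);
}
```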

cvnguyen · June 14, 2009, 9:50pm

There is a CUDA assembly instruction for var = (cond) ? val1 : val2, but it is not always applicable or efficient.
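The post doesn't name the instruction; PTX's selp (select one of two values based on a predicate) is presumably the one meant. A hedged sketch of calling it directly through inline PTX, with hypothetical helper and kernel names. In ordinary code this is rarely needed, since the compiler already decides when to emit a select for ?:.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Returns c ? a : b using the PTX selp instruction via inline assembly.
// setp first turns the integer flag into a predicate register, which selp requires.
__device__ float selectPtx(bool c, float a, float b)
{
    float r;
    asm("{\n\t"
        ".reg .pred p;\n\t"
        "setp.ne.u32 p, %3, 0;\n\t"
        "selp.f32 %0, %1, %2, p;\n\t"
        "}"
        : "=f"(r)
        : "f"(a), "f"(b), "r"(static_cast<unsigned int>(c)));
    return r;
}

// Keeps even-indexed elements and replaces odd-indexed ones with -1.
__global__ void selectKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = selectPtx(i % 2 == 0, in[i], -1.0f);
}

// Host helper using managed memory: first n floats are input, next n are output.
std::vector<float> runSelect(const std::vector<float>& h)
{
    int n = static_cast<int>(h.size());
    float* buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, 2 * n * sizeof(float));
    assert(err == cudaSuccess);
    for (int i = 0; i < n; ++i) buf[i] = h[i];
    selectKernel<<<(n + 127) / 128, 128>>>(buf, buf + n, n);
    err = cudaDeviceSynchronize();  // kernel must finish before the host reads buf
    assert(err == cudaSuccess);
    std::vector<float> out(buf + n, buf + 2 * n);
    cudaFree(buf);
    return out;
}
```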
