Ternary operators and branching

andr · April 30, 2009, 7:33pm

Heya,

I’m looking for some pointers to help me understand exactly how branching works, and how to avoid it so that the GPU doesn’t end up executing threads sequentially.

I need to solve two quandaries right now:

1. I have a statement using a ternary operator: (x > 0.5) ? 1 : ((x < -0.5) ? -1 : 0). Would that branch? How can I rewrite it so that it doesn’t branch?
2. If I have a conditional branch that executes on every 1000th iteration of a loop inside a kernel, that would branch, but would the GPU be smart enough to merge the different threads back together after the branch has finished? I’m talking about something like this:

for (int i = 0; i < 1000000; i++) {
	if (i % 1000 == 0) {
		counter++;
	}
	// rest of the loop. Ideally threads would merge here even if the condition matched for one of the threads.
}
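
For the first question, one branch-free way to write the expression relies on the fact that a C comparison evaluates to the int 0 or 1, so subtracting two comparisons gives 1, -1 or 0 with no conditional at all. This is a sketch assuming x is a float; the function names are hypothetical, and the f suffix on the literals is a choice that keeps the comparisons in single precision (an unsuffixed 0.5 is a double). Note that compilers often turn a short ternary like this into select or predicated instructions rather than a real jump anyway, so the rewrite is not guaranteed to be faster.

```c
/* Original form from the question: nested ternary. */
static int threshold_ternary(float x)
{
    return (x > 0.5f) ? 1 : ((x < -0.5f) ? -1 : 0);
}

/* Branch-free form: each comparison yields 0 or 1, so the difference is
 *    1  when x >  0.5
 *   -1  when x < -0.5
 *    0  otherwise (including x == +/-0.5 and NaN, where both are false).
 * The same expression can be used inside a __device__ function. */
static int threshold_branchless(float x)
{
    return (x > 0.5f) - (x < -0.5f);
}
```

Both functions return the same value for every float input, so the branch-free version can be dropped in wherever the ternary was used.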

tmurray · April 30, 2009, 7:57pm

To answer number two (and possibly number one as well): sure. Branching is done via predication, so when you have a divergent branch you are still effectively executing the entire warp; you’re just masking out some of the threads so they have no effect (e.g., they don’t write to registers, don’t load, don’t store, and don’t set any error conditions). So when you branch, the predication mask is set and some threads are not executed; when the branch ends, the predication mask is cleared. Voila, your warp is back to executing normally.

Very short divergent branches usually aren’t a huge deal.
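
The masking idea described in this reply can be mimicked on the CPU with a toy model of one warp running an if/else. This is a simplified sketch, not a description of how any particular GPU schedules instructions: run_warp_if_else, WARP_SIZE and the per-side costs are hypothetical names and parameters chosen for illustration. Each side of the branch is issued for the whole warp, a 32-bit mask decides which lanes may commit a result, and a side is skipped only when no lane needs it.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Toy model of one warp executing
 *     if (cond) out = then_val; else out = else_val;
 * then_cost and else_cost are hypothetical instruction counts for the two
 * sides; the return value is the number of issue steps the warp spends. */
static int run_warp_if_else(const int cond[WARP_SIZE], int out[WARP_SIZE],
                            int then_val, int else_val,
                            int then_cost, int else_cost)
{
    /* Build the predication mask: bit k is set if lane k takes the branch. */
    uint32_t then_mask = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        if (cond[lane])
            then_mask |= UINT32_C(1) << lane;
    uint32_t else_mask = ~then_mask;   /* lanes that take the else side */

    int steps = 0;

    /* "then" side: issued for the warp if at least one lane needs it. */
    if (then_mask != 0) {
        for (int lane = 0; lane < WARP_SIZE; lane++)
            if (then_mask & (UINT32_C(1) << lane))
                out[lane] = then_val;  /* masked-off lanes write nothing */
        steps += then_cost;
    }

    /* "else" side: same idea with the inverted mask. */
    if (else_mask != 0) {
        for (int lane = 0; lane < WARP_SIZE; lane++)
            if (else_mask & (UINT32_C(1) << lane))
                out[lane] = else_val;
        steps += else_cost;
    }

    /* Both sides are done: the mask is cleared and all 32 lanes continue
     * together again (the warp has reconverged). */
    return steps;
}
```

With then_cost = 5 and else_cost = 7, a warp whose lanes all agree spends 5 or 7 steps, while a divergent warp spends 12. When one side is only an instruction or two, as with counter++ in the question, the extra cost of divergence is correspondingly small.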

jaredkeithwhite · May 3, 2009, 7:59am

I have a follow up question to the above. I’m a little unclear on the following: What if all of the threads in the warp evaluate the conditional to be true, or all of them evaluate false? Would this cause branching?

Paul_Russell · May 3, 2009, 9:31am

ITYM: “Would this cause divergence?”

If all threads evaluate true (or false) then the branch is not a problem - there is no divergence. Divergence occurs when some threads evaluate the condition as true and some as false, in which case both branches must be executed (using predication to mask operations in the relevant threads), and hence you get a performance hit.
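
This distinction can be checked with a small CPU sketch: a branch diverges only when the lanes of one warp disagree on the condition. It also bears on the loop in the original question: i is the loop counter and the bounds are the same constants in every thread, so all threads of a warp hold the same i and i % 1000 == 0 is true for all of them or for none, which means that branch does not diverge at all. The helper names below are hypothetical; on the device, the same question can be asked with the warp vote intrinsics __all_sync and __any_sync (CUDA 9 and later).

```c
#include <stdbool.h>

#define WARP_SIZE 32

/* True if some lanes of the warp take the branch and some do not. */
static bool warp_diverges(const bool pred[WARP_SIZE])
{
    int taken = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++)
        taken += pred[lane] ? 1 : 0;
    return taken != 0 && taken != WARP_SIZE;
}

/* Condition from the question: it depends only on the loop counter i,
 * which every lane of the warp holds with the same value. */
static bool loop_condition_diverges(int i)
{
    bool pred[WARP_SIZE];
    for (int lane = 0; lane < WARP_SIZE; lane++)
        pred[lane] = (i % 1000 == 0);
    return warp_diverges(pred);
}

/* Hypothetical thread-dependent condition, similar to
 * (threadIdx.x % 2 == 0) in a kernel: even and odd lanes disagree. */
static bool lane_parity_diverges(void)
{
    bool pred[WARP_SIZE];
    for (int lane = 0; lane < WARP_SIZE; lane++)
        pred[lane] = (lane % 2 == 0);
    return warp_diverges(pred);
}
```

loop_condition_diverges returns false for every i, while lane_parity_diverges returns true: only conditions that depend on per-thread data or thread indices can split a warp.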
