Question about divergent branching

bluebit · May 20, 2009, 8:02am

Hey guys

Let's say I have an if-else in my kernel:

if (A)
{
    x = 5;
}
else
{
    x = 15;
}

Basically, how bad is the performance hit from this branch? Do the threads of a warp reconverge and execute in parallel again once the if-else block is finished? If not, do they run serially until the end of the kernel?

I have realised that when a branch only assigns values (as in the example above), it can be removed altogether with some bit logic:

A1 = set_all_bits(A); // copy the truth value of A (0 or 1) into every bit of A1
x = (5 & A1) | (15 & ~A1); // bitwise NOT (~), not logical NOT (!), so the mask is inverted bit by bit
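
For reference, the mask trick can be sketched in plain C as below. This is only an illustration: set_all_bits is not a CUDA built-in, so the version here is a hypothetical helper, and select_u32 is a name made up for the example. Note that the complement has to be the bitwise one (~), because the logical NOT (!) of an all-ones mask is simply 0.

```c
#include <stdint.h>

/* Hypothetical helper (not a CUDA built-in): expand a truth value into a
   mask that is all zeros (false) or all ones (true). */
static uint32_t set_all_bits(int cond)
{
    /* (cond != 0) is 0 or 1; subtracting it from 0 in unsigned arithmetic
       yields 0x00000000 or 0xFFFFFFFF. */
    return (uint32_t)0 - (uint32_t)(cond != 0);
}

/* Branch-free select: returns if_true when cond is nonzero, else if_false. */
static uint32_t select_u32(int cond, uint32_t if_true, uint32_t if_false)
{
    uint32_t mask = set_all_bits(cond);
    /* Exactly one of the two terms survives the masking. */
    return (if_true & mask) | (if_false & ~mask);
}
```

Calling select_u32(A, 5, 15) gives the same x as the if-else above for any A.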

Thoughts?

Tobi_W · May 20, 2009, 8:51am

What you are doing manually is done by the compiler too. I think you won’t get a performance boost with your ‘optimized code’.

Secondly, all threads within a warp always execute the same instruction. If threads within a warp diverge, the threads that are not on the path currently being executed are simply disabled. Therefore, after your if-statement, all threads within the warp execute in parallel again.
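
To make the "disabled threads" idea concrete, here is a toy host-side model in plain C. It is not GPU code and does not reflect how the hardware is actually implemented; the function name run_warp_if_else and the warp size of 32 are assumptions made for the sketch. It runs the if-else from the question for one warp: each side of the branch is issued as a separate pass with only the matching lanes enabled, and after the second pass all lanes are active again.

```c
#include <stdint.h>

#define WARP_SIZE 32  /* assumed warp size for this sketch */

/* Toy model of: if (cond) x = 5; else x = 15; for a single warp.
   Returns the number of serialized passes: 1 if all lanes agree,
   2 if the warp diverges. */
static int run_warp_if_else(const int cond[WARP_SIZE], int x[WARP_SIZE])
{
    uint32_t taken = 0;  /* bit i set means lane i takes the "if" side */
    for (int lane = 0; lane < WARP_SIZE; ++lane)
        if (cond[lane])
            taken |= (uint32_t)1 << lane;
    uint32_t not_taken = ~taken;  /* lanes that take the "else" side */

    int passes = 0;
    if (taken != 0) {  /* pass for the "if" side; other lanes are disabled */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            if ((taken >> lane) & 1u)
                x[lane] = 5;
        ++passes;
    }
    if (not_taken != 0) {  /* pass for the "else" side */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            if ((not_taken >> lane) & 1u)
                x[lane] = 15;
        ++passes;
    }
    /* Reconvergence point: every lane is enabled again from here on. */
    return passes;
}
```

In this model the cost of divergence is the extra pass: a warp whose lanes all agree issues one side only, while a mixed warp issues both sides one after the other.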

Jamie_K · May 20, 2009, 12:55pm

For very short branches, it is not worth it to “optimize” in this way. You will probably not make it faster and you could make it slower.

The compiler will re-merge the divergent branches as soon as it can. And for some things, like your example, there is no branch at all: the compiler uses the “selp” (select) instruction instead.
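
As a small sketch of what that means for the source code: the two functions below (names invented for the example) compute the same value, and both are the kind of simple conditional assignment a compiler can turn into a select rather than a jump. Whether a given nvcc version actually emits selp for your kernel depends on the compiler and its flags, so it is worth checking the generated PTX (for example with nvcc -ptx) rather than assuming it.

```c
/* Conditional expression: the most direct way to write a select. */
static int pick_value(int a)
{
    return a ? 5 : 15;
}

/* The same assignment written as the if-else from the original question. */
static int pick_value_if(int a)
{
    int x;
    if (a) {
        x = 5;
    } else {
        x = 15;
    }
    return x;
}
```

Since both forms are equivalent, there is little reason to obscure the code with hand-written masks.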

gatoatigrado · May 21, 2009, 9:20pm

That specific code might not branch. If you have an atomic op, it will likely branch (not that removing the conditional will necessarily help). Other short “if” statements might get compiled into guards, i.e. predicated instructions (see the PTX guide 4.3.2).
