Question about divergent branching

Hey guys

Let's say I have an if-else in my kernel:

if (A)
{
    x = 5;
}
else
{
    x = 15;
}

Basically, how bad is the performance hit from the branching above? Do the threads in a warp reconverge and execute in parallel again once the if-else block is finished? If not, do they run serially until the end of execution?

I have realised that when a branch only modifies memory (as in the example above), one can remove the branching altogether with some clever bit logic:

A1 = set_all_bits(A); // set every bit of A1 to the value of A (true or false, i.e. 1 or 0)
x = (5 & A1) | (15 & ~A1); // bitwise complement (~), not logical not (!)

Thoughts?
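
For anyone who wants to check the bit logic before trying it in a kernel, here is a minimal host-side sketch in plain C++. The names select_branch and select_mask are made up for illustration, and set_all_bits is only one assumed way of writing the helper mentioned above (it is not a CUDA built-in).

```cpp
#include <cstdint>

// Reference version: the original if/else.
inline std::uint32_t select_branch(bool A) {
    std::uint32_t x;
    if (A) {
        x = 5;
    } else {
        x = 15;
    }
    return x;
}

// Hypothetical helper standing in for set_all_bits from the post:
// returns 0xFFFFFFFF when A is true and 0x00000000 when A is false.
inline std::uint32_t set_all_bits(bool A) {
    return 0u - static_cast<std::uint32_t>(A);
}

// Branch-free version: the mask keeps 5 when A is true and 15 otherwise.
// The bitwise complement (~) flips every bit of the mask; a logical
// not (!) would reduce the mask to 0 or 1 and give the wrong result.
inline std::uint32_t select_mask(bool A) {
    const std::uint32_t A1 = set_all_bits(A);
    return (5u & A1) | (15u & ~A1);
}
```

Both functions should return the same value for either input, so the mask version can be swapped in and timed against the plain if/else.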

What you are doing manually is done by the compiler too. I think you won’t get a performance boost with your ‘optimized code’.

Secondly, all threads within a warp always execute the same instruction. If threads within a warp diverge, the threads that do not take the current path are simply disabled while the others run it. Therefore, after your if-statement, all threads within the warp execute in parallel again.
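
The masking behaviour described above can be sketched with a toy host-side model. This is only an illustration of the idea, not how the hardware is actually implemented; run_warp and kWarpSize are names invented for the sketch, and the warp width of 32 is the value used by current NVIDIA GPUs.

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;  // lanes per warp (assumed: NVIDIA warp width)

// Toy model of one warp executing "if (A) x = 5; else x = 15;".
// Bit i of pred is lane i's value of A. Each side of the branch is
// issued once for the whole warp, with the lanes that did not take
// that side masked off. Returns how many passes were issued.
inline int run_warp(std::uint32_t pred, std::array<int, kWarpSize>& x) {
    int passes = 0;
    const std::uint32_t then_mask = pred;   // lanes where A is true
    const std::uint32_t else_mask = ~pred;  // lanes where A is false

    if (then_mask != 0) {  // skipped entirely if no lane takes this side
        for (int lane = 0; lane < kWarpSize; ++lane) {
            if ((then_mask >> lane) & 1u) x[lane] = 5;
        }
        ++passes;
    }
    if (else_mask != 0) {
        for (int lane = 0; lane < kWarpSize; ++lane) {
            if ((else_mask >> lane) & 1u) x[lane] = 15;
        }
        ++passes;
    }
    // Reconvergence point: from here on every lane is active again.
    return passes;
}
```

In this model a warp whose lanes all agree on A pays for one pass, while a divergent warp pays for both, which is the extra cost being asked about.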

For very short branches, it is not worth it to “optimize” in this way. You will probably not make it faster and you could make it slower.

The compiler will re-merge the divergent branches as soon as it can. And for some things, like your example, there is not even any branch at all. It uses the “selp” instruction.
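
As a small sketch of the select form: the same assignment can be written as a single conditional expression, which is the kind of code a compiler can lower to a select rather than a branch. The function name pick and the HD macro are made up here; the macro is only there so the snippet builds both as plain C++ and under nvcc. Whether selp actually shows up depends on the compiler version and flags, so it is worth checking the output of nvcc -ptx rather than assuming.

```cpp
// nvcc defines __CUDACC__, so the qualifiers appear only in CUDA builds.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: the original if/else as one conditional expression.
HD inline int pick(bool A) {
    return A ? 5 : 15;
}
```

Inside a kernel this would be used as x = pick(A);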

That specific code might not branch. If you have an atomic op, it will likely branch (not that removing the conditional will necessarily help). Other short “if” statements might get compiled into guards, i.e. predicated instructions (see the PTX guide 4.3.2).