Let's say I have an if-else in my kernel:
if (A)
    x = 5;
else
    x = 15;
Basically, how bad is the performance hit from this branching when threads in the same warp take different paths? Do the threads of a warp reconverge and execute in parallel again once the if-else block is finished? If not, do they run serially until the end of the kernel?
I have realised that when a branch only assigns a value (as in the example above), the branch can be removed altogether with some bit logic:
A1 = set_all_bits(A); // set every bit of A1 to the value of A (which is either true or false, i.e. 1 or 0)
x = (5 & A1) | (15 & ~A1); // bitwise NOT (~), not logical NOT (!), so the mask is fully inverted
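To make the mask trick concrete, here is a minimal sketch in plain C++ that should also work as device code if the functions are marked `__device__`. The `set_all_bits` body is only my guess at what such a helper might look like, and `select_branchless` is a name invented for this example. Note that the inverted mask needs bitwise NOT (`~`): logical NOT (`!`) of an all-zero mask gives 1 rather than all ones, which would produce 1 instead of 15.

```cpp
#include <cstdint>

// Hypothetical helper: expands a boolean into a full-width mask.
// true  -> 0xFFFFFFFF (all bits set)
// false -> 0x00000000 (no bits set)
inline std::uint32_t set_all_bits(bool a) {
    // Unsigned 0 - 1 wraps around to all ones; 0 - 0 stays zero.
    return 0u - static_cast<std::uint32_t>(a);
}

// Hypothetical helper: picks if_true when a is set, if_false otherwise,
// without using a branch in the source code.
inline std::uint32_t select_branchless(bool a,
                                       std::uint32_t if_true,
                                       std::uint32_t if_false) {
    const std::uint32_t mask = set_all_bits(a);
    // Exactly one of the two AND terms survives, so OR just merges them.
    return (if_true & mask) | (if_false & ~mask);
}
```

With this, `x = select_branchless(A, 5, 15);` would replace the if-else. I am not sure it buys anything for a case this small, though: as far as I understand, the compiler often turns a simple assignment like `x = A ? 5 : 15;` into a select or predicated instruction by itself, so it is probably worth comparing the generated code before hand-writing the mask version.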