The hardware has instructions that let some simple conditional statements execute without branching, and the compiler uses them wherever it can. Things like
if(a>b)
c=d;
or
d = (a>0) ? b : c;
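typically end up compiled into branchless (predicated or select) code. The best way to know whether a branch remains is to inspect the generated PTX or, better, the final machine code (SASS), since ptxas can still turn a short branch in the PTX into predicated instructions.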
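A minimal sketch of the two patterns above as CUDA device code, assuming float data; the function and kernel names are hypothetical. For bodies this short one would usually expect a compare (SETP) followed by a select or predicated move rather than a branch, but the exact instructions depend on the compiler version and target architecture.

```cuda
#include <cuda_runtime.h>

// Pattern 1: conditional assignment with no else.
// The body is a single move, so it is a good candidate for predication.
__host__ __device__ inline float assign_if_greater(float a, float b, float c, float d)
{
    if (a > b)
        c = d;
    return c;
}

// Pattern 2: ternary select; typically a compare plus a select instruction.
__host__ __device__ inline float select_positive(float a, float b, float c)
{
    return (a > 0.0f) ? b : c;
}

// Hypothetical kernel applying both patterns element-wise.
__global__ void conditional_kernel(const float* a, const float* b,
                                   const float* c, const float* d,
                                   float* out1, float* out2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out1[i] = assign_if_greater(a[i], b[i], c[i], d[i]);
        out2[i] = select_positive(a[i], b[i], c[i]);
    }
}
```

To check what the compiler produced, `nvcc -ptx file.cu` writes the PTX to file.ptx, and `nvcc -c file.cu` followed by `cuobjdump -sass file.o` shows the final machine code; look for BRA instructions versus predicated (@P...) or select instructions inside these functions.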
Many thanks for the fast reply. I suspected as much, but couldn't find anything about this in the docs. By the way, does that hold for devices of every compute capability?
I think the big killer is when threads in the same warp take different branches: the multiprocessor (MP) then has to execute both paths, with each thread masked off for the path it did not take. So if a branch contains a lot of code (after inlining, etc.), it slows the whole warp down.
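A sketch of the difference, assuming a warp size of 32 (true for NVIDIA GPUs to date and exposed in device code as `warpSize`) and one-dimensional blocks; all function names are hypothetical, and the loops merely stand in for long branch bodies. In `divergent_kernel` neighbouring lanes of the same warp disagree, so each warp runs both loops with lanes masked; in `uniform_kernel` the condition depends only on the warp index, so each warp runs just one loop.

```cuda
#include <cuda_runtime.h>

constexpr unsigned kWarpSize = 32;  // assumption: equals warpSize on current GPUs

// Differs between neighbouring lanes: even/odd threads of one warp split.
__host__ __device__ inline bool divergent_condition(unsigned tid)
{
    return (tid % 2u) != 0u;
}

// Constant within each warp: depends only on the warp index in the block.
__host__ __device__ inline bool warp_uniform_condition(unsigned tid)
{
    return ((tid / kWarpSize) % 2u) != 0u;
}

// Stand-ins for long branch bodies (e.g. after inlining).
__device__ float heavy_work_a(float x)
{
    for (int k = 0; k < 64; ++k) x = x * 1.0001f + 0.5f;
    return x;
}

__device__ float heavy_work_b(float x)
{
    for (int k = 0; k < 64; ++k) x = x * 0.9999f - 0.5f;
    return x;
}

// Lanes within a warp disagree: the warp pays for both loops.
__global__ void divergent_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (divergent_condition(threadIdx.x))
        data[i] = heavy_work_a(data[i]);
    else
        data[i] = heavy_work_b(data[i]);
}

// All lanes of a warp agree: each warp executes only one loop.
__global__ void uniform_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (warp_uniform_condition(threadIdx.x))
        data[i] = heavy_work_a(data[i]);
    else
        data[i] = heavy_work_b(data[i]);
}
```

Both kernels do the same amount of work per thread; timing them on the same input is a simple way to see the cost of divergence. Note that a partially filled last warp (when n is not a multiple of 32) still diverges at the `i >= n` check.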
I'm not aware of any such option, nor do I think there should be one. Predicated execution is the better choice when the if/else bodies are short.
Out[ threadIdx.x ] = t * ( t > 0 );
is actually more expensive for the hardware to execute, though I suppose the compiler would be smart enough to convert it back to a SETP and two predicated ST (store) instructions.
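For comparison, a sketch of three ways to write the clamp-to-zero store above, assuming `t` is a float; the function names are hypothetical. They are not strictly interchangeable: for negative `t` the multiply yields -0.0f (which compares equal to 0.0f), and for a NaN input it yields NaN because NaN * 0 is NaN, whereas the select and `fmaxf` forms return 0.0f. Which instructions each form becomes depends on the compiler, so the SASS remains the place to check.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Form 1: explicit select; a compare plus a select or predicated move.
__host__ __device__ inline float relu_select(float t)
{
    return (t > 0.0f) ? t : 0.0f;
}

// Form 2: multiply by the comparison result, as in the post.
// The bool is converted to float and multiplied in; gives -0.0f for
// negative t and NaN for a NaN input.
__host__ __device__ inline float relu_multiply(float t)
{
    return t * (t > 0.0f);
}

// Form 3: floating-point max; fmaxf returns the other argument if one is NaN.
__host__ __device__ inline float relu_fmax(float t)
{
    return fmaxf(t, 0.0f);
}

// Hypothetical kernel; swap in relu_multiply or relu_fmax to compare the SASS.
__global__ void relu_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = relu_select(in[i]);
}
```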