Logic operations and branching: do logic operations in a kernel lead to branching?

Consider the following kernel code:

if ( t > 0 )
   Out[ threadIdx.x ] = t;
else
   Out[ threadIdx.x ] = 0;

That code obviously leads to branching inside a warp, but if we modify it to the following:

Out[ threadIdx.x ] = t * ( t > 0 );

will branching still occur, or will the code be executed serially? So the question is: do logic operations lead to branching?

Hardware has instructions that allow you to write some simple conditional statements without branching. The compiler will use them whenever possible. Things like



d = (a>0) ? b : c;

typically end up compiled into branchless code. The best way to know if branching occurs is to inspect the output PTX or assembly code.
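To make the comparison concrete, here is a minimal sketch of the three forms discussed in this thread, written so they can be compiled and inspected. The input array In, the kernel names and the host helper runClamp are assumptions for illustration; the original post does not say where t comes from. Compiling with nvcc -ptx and looking for bra (a real branch), selp (a select) or instructions guarded by a predicate such as @%p1 should show what the compiler chose. PTX is only an intermediate form, so the final machine code (cuobjdump -sass on the compiled binary) may still differ.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Form 1: explicit if/else, as in the original question.
__global__ void clampIfElse(const int* In, int* Out)
{
    int t = In[threadIdx.x];   // assumption: t is loaded per thread
    if (t > 0)
        Out[threadIdx.x] = t;
    else
        Out[threadIdx.x] = 0;
}

// Form 2: ternary select, as suggested in the reply.
__global__ void clampSelect(const int* In, int* Out)
{
    int t = In[threadIdx.x];
    Out[threadIdx.x] = (t > 0) ? t : 0;
}

// Form 3: multiply by the comparison result (0 or 1).
__global__ void clampMultiply(const int* In, int* Out)
{
    int t = In[threadIdx.x];
    Out[threadIdx.x] = t * (t > 0);
}

// Hypothetical host helper: runs one form on a single block and returns
// the output. Error checking is omitted to keep the sketch short.
// in.size() must not exceed the maximum block size (1024 on current GPUs).
std::vector<int> runClamp(int form, const std::vector<int>& in)
{
    const int n = static_cast<int>(in.size());
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    if (form == 1)      clampIfElse<<<1, n>>>(dIn, dOut);
    else if (form == 2) clampSelect<<<1, n>>>(dIn, dOut);
    else                clampMultiply<<<1, n>>>(dIn, dOut);

    std::vector<int> out(n);
    // A blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

All three forms produce the same output, so any difference shows up only in the generated instructions, not in the results.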

Many thanks for the fast reply. I supposed so, but couldn't find anything in the docs about logic operations. By the way, is that true for devices of any compute capability?

What is the cost of branching?

I think the big killer is when threads in the same warp take different branches, as then the multiprocessor (MP) has to execute both branches for that warp. So if a branch contains a lot of code (after inlining etc.), that will slow the whole warp down.
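The point about divergence can be illustrated with a small sketch. Both kernels below choose between two long code paths; the first splits every warp between them, while the second sends whole warps down a single path. The functions pathA and pathB, the kernel names and the helper runBranchDemo are made up for illustration, and the arithmetic is arbitrary. The sketch only checks results; any timing difference would have to be measured with a profiler on the target device.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Two arbitrary "long" code paths (unsigned arithmetic wraps, which is fine).
__host__ __device__ unsigned pathA(unsigned x)
{
    for (int i = 0; i < 64; ++i) x = x * 3u + 1u;
    return x;
}

__host__ __device__ unsigned pathB(unsigned x)
{
    for (int i = 0; i < 64; ++i) x = x * 5u + 7u;
    return x;
}

// Even and odd threads take different paths, so every warp contains
// threads on both sides of the branch.
__global__ void divergentWithinWarp(unsigned* Out)
{
    unsigned i = threadIdx.x;
    if (i % 2 == 0) Out[i] = pathA(i);
    else            Out[i] = pathB(i);
}

// The condition depends only on the warp index, so all threads of a
// given warp take the same path.
__global__ void uniformPerWarp(unsigned* Out)
{
    unsigned i = threadIdx.x;
    if ((i / warpSize) % 2 == 0) Out[i] = pathA(i);
    else                         Out[i] = pathB(i);
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns the output. Error checking is omitted for brevity.
std::vector<unsigned> runBranchDemo(bool perWarp, int threads)
{
    unsigned* dOut = nullptr;
    cudaMalloc(&dOut, threads * sizeof(unsigned));
    if (perWarp) uniformPerWarp<<<1, threads>>>(dOut);
    else         divergentWithinWarp<<<1, threads>>>(dOut);

    std::vector<unsigned> out(threads);
    // A blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, threads * sizeof(unsigned),
               cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return out;
}
```

In the first kernel each warp has to step through both pathA and pathB; in the second, each warp runs only one of them.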

I think so; it’s called “branch predication” and it’s been referenced in CUDA programming guides since version 1.0.

Hello all,

Is there an option to disable the branch predication done by the compiler?

I’m not aware of any such option, nor do I think there should be one. Predicated execution is the better way to go when the if/else sections are short.

Out[ threadIdx.x ] = t * ( t > 0 );

is actually more expensive for the hardware to execute, though I suppose the compiler would be smart enough to convert it back to a SETP and two predicated STs.