Logic operations and branching: do logic operations in a kernel lead to branching?

maggotroot · April 17, 2011, 10:19pm

Consider the following kernel code:

if ( t > 0 )
   Out[ threadIdx.x ] = t;
else
   Out[ threadIdx.x ] = 0;

That code obviously leads to branching inside half-warps, but if we modify it to the following:

Out[ threadIdx.x ] = t * ( t > 0 );

will branching occur? Or will the code be executed serially? So the question is: do logic operations lead to branching?

hamster143 · April 17, 2011, 10:34pm

Hardware has instructions that allow you to write some simple conditional statements without branching. The compiler will use them whenever possible. Things like

if(a>b)
c=d;

or

d = (a>0) ? b : c;

typically end up compiled into branchless code. The best way to know if branching occurs is to inspect the output PTX or assembly code.
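As a rough illustration of the forms discussed so far, here is a sketch that writes the same clamp three ways. The thread never says what type t is, so int is an assumption, and the function names, the HOST_DEVICE macro and clamp_kernel are made up for this example. The macro only lets the same source build with a plain C++ compiler, so the equivalence can be checked on the host.

```cpp
#include <cassert>

// Under nvcc the helpers are callable from kernels; under a plain C++
// compiler the qualifier expands to nothing (illustrative macro name).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Form 1: explicit if/else, as in the original question.
HOST_DEVICE inline int clamp_if(int t) {
    if (t > 0)
        return t;
    else
        return 0;
}

// Form 2: ternary select.
HOST_DEVICE inline int clamp_select(int t) {
    return (t > 0) ? t : 0;
}

// Form 3: multiply by the comparison result (false -> 0, true -> 1).
HOST_DEVICE inline int clamp_mul(int t) {
    return t * (t > 0);
}

#ifdef __CUDACC__
// Hypothetical kernel wrapping one of the forms, only built by nvcc.
__global__ void clamp_kernel(const int* in, int* out) {
    out[threadIdx.x] = clamp_select(in[threadIdx.x]);
}
#endif
```

To see what the compiler really did with each form, one option is to build with nvcc -ptx and look for bra versus setp/selp or @p-guarded instructions, or to dump the final machine code with cuobjdump -sass. The result can differ between compiler versions and target architectures.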

maggotroot · April 17, 2011, 10:38pm

Many thanks for the fast reply. I supposed so, but couldn't find anything in the docs about logic operations. By the way, is that true for devices of any compute capability?

kbam · April 17, 2011, 11:25pm

What is the cost of branching?

I think the big killer is when threads within one warp take different branches, as the MP then has to execute both branches one after the other. So if a branch contains a lot of code (after inlining etc.), that will slow the whole warp down.
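A toy cost model can make that concrete. This is only a sketch under a simplifying assumption (a warp issues the instructions of every branch side that at least one of its threads takes); it is not a description of the real scheduler, and issued_instructions and kWarpSize are names invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;

// Simplified model (an assumption, not a hardware spec): the warp pays for
// the "then" side if any thread takes it, and for the "else" side if any
// thread takes that one.
inline int issued_instructions(const std::array<bool, kWarpSize>& takes_then,
                               int then_len, int else_len) {
    bool any_then = false;
    bool any_else = false;
    for (bool p : takes_then) {
        any_then = any_then || p;
        any_else = any_else || !p;
    }
    return (any_then ? then_len : 0) + (any_else ? else_len : 0);
}
```

In this model a warp whose 32 threads all take the same side pays only for that side, while a single thread going the other way makes the whole warp pay for both.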

hamster143 · April 17, 2011, 11:28pm

I think so; it’s called “branch predication” and it’s been referenced in CUDA programming guides since version 1.0.

ayaseen · June 23, 2011, 6:41pm

Hello all,

Is there an option to disable the branch-predication done by the compiler?

hyqneuron · June 25, 2011, 9:43am

I’m not aware of any such option, nor do I think there should even be one. Predicated execution is the better way to go when the if/else sections are short.

Out[ threadIdx.x ] = t * ( t > 0 );

is actually more expensive for the hardware to execute, though I suppose the compiler would be smart enough to convert it back to a SETP and two predicated STs.
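Apart from cost, the multiply form is not always a drop-in replacement. The sketch below assumes t is a float (the thread does not say), IEEE 754 arithmetic, and no fast-math style flags; the function names are invented for the example. Under those assumptions the two forms disagree on negative inputs and on infinities.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Select form: yields exactly +0.0f for every non-positive input.
inline float clamp_select_f(float t) {
    return (t > 0.0f) ? t : 0.0f;
}

// Multiply form: the comparison becomes 0.0f or 1.0f, then multiplies t,
// so the sign or special value of t can leak into the result.
inline float clamp_mul_f(float t) {
    return t * (t > 0.0f);
}
```

With these definitions, a negative finite input gives -0.0f from the multiply form instead of +0.0f, and negative infinity gives NaN (infinity times zero), whereas the select form returns +0.0f in both cases. Whether that matters depends on what the kernel does with the value afterwards.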
