Forcing predicate: Would this guarantee it?

trex · June 20, 2008, 11:31pm

Hi

Is there a way to guarantee that the CUDA compiler will use prediction rather than serialisation? That is, would the following always be done with prediction:

if(condition) variableA = variableTemp;

I don’t know how to test this so I’m having to ask.

Cheers. :D

BarsMonster · June 21, 2008, 1:09am

What prediction are you talking about? CUDA is not superscalar.

trex · June 21, 2008, 8:32am

Search ‘predicate’ in the user manual.

It says there is a threshold on the number of instructions allowed for prediction, but I don’t see how to test to make sure prediction is being used rather than serialisation.

BarsMonster · June 21, 2008, 9:31am

I see…

I would compare the performance of the following pieces of code (predicate is always true, but the compiler must not be able to optimize it away):

for(int i=0;i<1000000;i++)
{
    if(predicate)
    {
        a+=b;
        b^=c;
        c-=a;
    }
    a+=b;
    b^=c;
    c-=a;
}

for(int i=0;i<1000000;i++)
{
    if(predicate)
    {
        a+=b;
        b^=c;
    }
    c-=a;
    a+=b;
    b^=c;
    c-=a;
}

for(int i=0;i<1000000;i++)
{
    if(predicate)
    {
        a+=b;
    }
    b^=c;
    c-=a;
    a+=b;
    b^=c;
    c-=a;
}

and return a to the host so that the compiler cannot optimize the loops away.
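The comparison above could be set up roughly as follows. This is only a sketch, not the poster's code: the names loopKernel and runVariant, the GUARDED template parameter, the launch configuration, and the use of unsigned arithmetic (to keep the wrap-around well defined) are all assumptions made for illustration. The predicate is read from memory at run time so it cannot be folded into a constant. Timing differences are indirect evidence at best; they do not prove which code the compiler generated.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// GUARDED = number of statements placed inside the if (1, 2 or 3), matching
// the three loops above. With a true predicate every variant does the same work.
template <int GUARDED>
__global__ void loopKernel(const int *flags, unsigned *out, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = flags[tid];        // run-time value, cannot be constant-folded
    unsigned a = tid, b = tid + 1, c = tid + 2;
    for (int i = 0; i < iters; ++i) {
        if (predicate) {
            a += b;
            if (GUARDED >= 2) b ^= c;  // compile-time choice, not a run-time branch
            if (GUARDED >= 3) c -= a;
        }
        if (GUARDED < 2) b ^= c;
        if (GUARDED < 3) c -= a;
        a += b;
        b ^= c;
        c -= a;
    }
    out[tid] = a;                      // the result escapes to the host
}

// Launches one variant over n threads (n must be a multiple of 256), stores
// the kernel time in milliseconds in *ms and returns out[0].
template <int GUARDED>
unsigned runVariant(int n, int iters, float *ms)
{
    std::vector<int> hFlags(n, 1);     // predicate is true for every thread
    int *dFlags = nullptr;
    unsigned *dOut = nullptr;
    cudaMalloc(&dFlags, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(unsigned));
    cudaMemcpy(dFlags, hFlags.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    loopKernel<GUARDED><<<n / 256, 256>>>(dFlags, dOut, 1);   // warm-up launch

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    loopKernel<GUARDED><<<n / 256, 256>>>(dFlags, dOut, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(ms, start, stop);

    unsigned first = 0;
    cudaMemcpy(&first, dOut, sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dFlags);
    cudaFree(dOut);
    return first;
}

int main()
{
    const int n = 1 << 16, iters = 100000;
    float ms = 0.0f;
    unsigned r3 = runVariant<3>(n, iters, &ms);
    printf("3 guarded statements: %.3f ms (a = %u)\n", ms, r3);
    unsigned r2 = runVariant<2>(n, iters, &ms);
    printf("2 guarded statements: %.3f ms (a = %u)\n", ms, r2);
    unsigned r1 = runVariant<1>(n, iters, &ms);
    printf("1 guarded statement:  %.3f ms (a = %u)\n", ms, r1);
    return 0;
}
```

Because the predicate is true for every thread, no warp diverges here. Filling the flags with a mix of 0 and 1 within each warp would exercise the divergent case as well, although the three variants then no longer compute the same value.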

E.D_Riedijk · June 21, 2008, 11:20am

I don’t think there is a way to force predication. Maybe you can do so in ptx. I believe there is some mention of predication in the Programming Guide, which seems to indicate that predication is used when the code path is not too long. Otherwise it is serialized.

BarsMonster · June 21, 2008, 8:11pm

So the question is how long “not too long” is :-)

Reimar · June 24, 2008, 2:17pm

Well, decuda should be a fairly reliable way to find out, though I guess the result is likely to depend on the architecture version (i.e. it may change with future graphics cards).

The decuda output uses the “@$pn.cond” prefix for this, I think (where pn is the predicate register and cond is the condition under which the instruction that follows is executed).

I do not see any way to influence or see this in ptx, though that is no surprise, since IMO the limit is likely to depend on the specific GPU (at least in the future), and the GPU-specific stuff is done in ptxas.

And please take care not to write “prediction” when you mean “predication”. While I am not sure whether the latter is really a valid word, the former already has a meaning, and a very different one.
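For anyone who wants to try the inspection route, here is a minimal probe built around the statement from the original question. It is a sketch under assumptions: the file name, the kernel names assignWithIf and assignWithSelect, and the helper runProbe are made up for illustration, and the commands in the comments assume a current CUDA toolkit, where cuobjdump plays the role decuda played at the time. The second kernel writes the same logic as a conditional expression so the two can be compared; whether either one ends up predicated, turned into a select, or compiled as a real branch is still up to the compiler and the target GPU.

```cuda
// predicate_probe.cu (hypothetical file name)
//
// To inspect the generated code, assuming a current CUDA toolkit:
//   nvcc -ptx   predicate_probe.cu -o predicate_probe.ptx
//   nvcc -cubin predicate_probe.cu -o predicate_probe.cubin
//   cuobjdump -sass predicate_probe.cubin
// In the PTX, guarded instructions start with "@%p..." and selects appear as
// "selp"; in the machine code dump, guards look like "@P0".
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The statement from the original question, written as a branch.
__global__ void assignWithIf(const int *condition, const int *variableTemp,
                             int *variableA, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        int a = variableA[tid];
        if (condition[tid]) a = variableTemp[tid];
        variableA[tid] = a;
    }
}

// The same logic written as a conditional expression, for comparison.
__global__ void assignWithSelect(const int *condition, const int *variableTemp,
                                 int *variableA, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        int a = variableA[tid];
        int t = variableTemp[tid];
        variableA[tid] = condition[tid] ? t : a;
    }
}

// Runs one of the two kernels on a four-element example and returns variableA.
std::vector<int> runProbe(bool useSelect)
{
    const int n = 4;
    const int hCond[n] = {1, 0, 1, 0};
    const int hTemp[n] = {10, 20, 30, 40};
    std::vector<int> hA = {1, 2, 3, 4};

    int *dCond = nullptr, *dTemp = nullptr, *dA = nullptr;
    cudaMalloc(&dCond, n * sizeof(int));
    cudaMalloc(&dTemp, n * sizeof(int));
    cudaMalloc(&dA, n * sizeof(int));
    cudaMemcpy(dCond, hCond, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dTemp, hTemp, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dA, hA.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    if (useSelect)
        assignWithSelect<<<1, 32>>>(dCond, dTemp, dA, n);
    else
        assignWithIf<<<1, 32>>>(dCond, dTemp, dA, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hA.data(), dA, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dCond);
    cudaFree(dTemp);
    cudaFree(dA);
    return hA;
}

int main()
{
    for (int useSelect = 0; useSelect < 2; ++useSelect) {
        std::vector<int> a = runProbe(useSelect != 0);
        printf("%s: %d %d %d %d\n", useSelect ? "select" : "if",
               a[0], a[1], a[2], a[3]);
    }
    return 0;
}
```

Growing the body of the if one statement at a time and re-checking the dump is one way to look for the threshold mentioned earlier in the thread, keeping in mind that it may differ between compiler versions and GPU architectures.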
