Compare and Set operations

I am wondering which compare-and-set operations are hardware-accelerated or recommended.

This is the kind of thing I’m talking about, which method is fastest?

if( x < minX )
{
  x = minX;
}

x = max(x, minX);

x = x < minX ? minX : x;

Each of the above overwrites one of the values used in the comparison.

Other conditional assignments produce a value distinct from the ones being compared. E.g.:

if( x < min.x )
{
  clipFlags |= CLIP_LEFT;
}

...etc...
clipFlagLeft = x < min.x ? CLIP_LEFT : 0;

...etc...

clipFlags = clipFlagLeft + clipFlagRight + ...etc...

This interesting thread http://forums.nvidia.com/index.php?showtopic=103046 shows timings of some operations and also notes: “Predicates are indeed super-efficient! A test like a>0.0 ? 1 : 0 can be done in just 4 clocks. Everyone’s always said that, but it’s great to see the actual clock measurements”.

[i]Does the hardware perform branch free compare and set operations?

If so, is there a difference between integer and floating point comparisons?

Does the compiler automatically use these operations without the user writing ternary expressions?

Is there any way to accelerate vector compare and set operations in a single thread?[/i]

I’d like to hear more about this topic and performance figures.

Your question is pretty much answered by the thread you referenced, which points to my throughput tool.
I updated it this week to give proper normalized clock counts. You’ll see that predicates are really one clock! As is the floating point min() function.

But you can download that tool and make absolute measurements for each of the tests you listed. min(a,b) will likely be a one-clock answer and the winner of the three options you list.
Even though predicates are cheap and one-clock, you’d still spend another clock doing the a>b test.

There’s also an interesting puzzle in that timing list… why are integer min(a,b) and max(a,b) effectively free? In fact, more than free: the measurements showed they take less time than just reading a or b alone.
That may be a measurement error (but unlikely since it is repeatable, and shows the same value on two GPUs in two OSes), or some funky architecture issue, or just my own dumb code.