min/max/step performance? Will they cause branches?

Hello everyone,

I was just wondering how NVIDIA GPUs generally handle built-in functions such as min/max and step. Will they cause branches (i.e. divergent warps), or do they map to a native instruction that is supported by the GPU?



If they don't map to, at worst, a very short series of predicated instructions, I would be very upset…:-)

Watch out, though: at least in the 1.0 OpenCL driver, NVIDIA seem to have forgotten to implement some of the built-in functions, or implemented them as a crash(!). In particular, there is no overload for clamp(int, int, int), and sign(float) crashes the compiler… I haven't gotten around to testing this with 1.1 yet (at which point I guess I will file a bug report).


Thanks for your quick answer.

I only need “uint min(uint,uint)”, which I tested and which runs fine on my machine (9800 GT), but these issues are good to know.

I asked because with this:

uint circularIdx = min( (uint)(idx - _length), idx );

I can save a modulo operation, because in my case the maximum value of idx is _length*2-1.
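
For anyone wondering why this works: the trick relies on unsigned wraparound. If idx < _length, then idx - _length wraps to a huge value and min picks idx. Otherwise idx - _length is the smaller one and is already the wrapped index. Below is a minimal host-side sketch in plain C that compares it against the modulo. The names min_u32, circular_idx and check_against_modulo are made up for illustration, and min_u32 is only a stand-in for the OpenCL built-in, so this says nothing about what the GPU compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's built-in uint min(uint, uint). */
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Wraps idx into [0, length), assuming 0 <= idx <= 2*length - 1.
 * For idx < length the subtraction wraps around to a very large value,
 * so min returns idx itself; otherwise it returns idx - length. */
static uint32_t circular_idx(uint32_t idx, uint32_t length) {
    return min_u32(idx - length, idx);
}

/* Asserts that the trick agrees with modulo over the whole valid range.
 * Assumes 2*length fits in 32 bits. */
static void check_against_modulo(uint32_t length) {
    for (uint32_t idx = 0; idx < 2 * length; ++idx)
        assert(circular_idx(idx, length) == idx % length);
}
```

Calling check_against_modulo(7), for example, walks idx from 0 to 13 and should pass without tripping an assert. Outside that range (idx >= 2*_length) the result no longer matches the modulo.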