Minimum of 2-5 floats


I need to find the minimum/maximum of 2, 3, 4, or 5 floats in my code. Is there a performance difference between the following implementations? (The example is for the minimum of two floats.)

PS. I'm worried about divergent branches. Will the compiler optimize them out?

PPS. It can be assumed that the functions will be called many times.

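__device__ inline float min2_arith(float a, float b){
    // Version 1: arithmetic select; each comparison evaluates to 0 or 1.
    // Caution: 0 * inf is NaN, so this can return NaN when an input is infinite.
    return (a > b) * b + (b >= a) * a;
}

__device__ inline float min2_if(float a, float b){
    // Version 2: if statement.
    if (a > b) { return b; }
    return a;
}

__device__ inline float min2_ternary(float a, float b){
    // Version 3: conditional operator.
    return (a > b) ? b : a;
}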



Why are you not using fminf() (which is part of the CUDA math library)?
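The fminf suggestion extends to 3, 4, and 5 arguments by nesting calls. Below is a minimal sketch that assumes nothing beyond fminf/fmaxf from the math library. The helper names (min3, max5, ...) and the HD macro are hypothetical, chosen for illustration; the macro only exists so the same helpers compile as __host__ __device__ under nvcc and as plain C++ elsewhere.

```cpp
#include <math.h>

// Hypothetical helper macro: __host__ __device__ under nvcc,
// ordinary inline functions in a plain C++ build.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Minimum of 2..5 floats, built only from fminf.
HD inline float min2(float a, float b) { return fminf(a, b); }
HD inline float min3(float a, float b, float c) { return fminf(fminf(a, b), c); }
HD inline float min4(float a, float b, float c, float d) {
    // Pairwise grouping: the two inner fminf calls do not depend on each other.
    return fminf(fminf(a, b), fminf(c, d));
}
HD inline float min5(float a, float b, float c, float d, float e) {
    return fminf(min4(a, b, c, d), e);
}

// Maximum counterparts, built from fmaxf.
HD inline float max2(float a, float b) { return fmaxf(a, b); }
HD inline float max3(float a, float b, float c) { return fmaxf(fmaxf(a, b), c); }
HD inline float max4(float a, float b, float c, float d) {
    return fmaxf(fmaxf(a, b), fmaxf(c, d));
}
HD inline float max5(float a, float b, float c, float d, float e) {
    return fmaxf(max4(a, b, c, d), e);
}
```

In a kernel, a call such as min5(x0, x1, x2, x3, x4) would then be a short chain of fminf calls. The pairwise grouping in min4 is meant to shorten the dependency chain, but whether it is measurably faster is an assumption worth checking in the generated code, not a given.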

OK, does that map to a native instruction?

The quickest way to answer these questions is to pass the -ptx option to nvcc, which outputs the PTX assembly for each case. Then you can see directly what code the compiler generates.
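As an illustration of that workflow, here is a small probe file; the file name, kernel names, and HD macro are hypothetical. Each variant gets its own kernel so its instructions are easy to find, and extern "C" keeps the kernel names unmangled in the .ptx output. Compile it with something like nvcc -ptx min_probe.cu -o min_probe.ptx and compare the bodies of the two kernels (for example, look for min.f32 versus a compare instruction). Keep in mind that PTX is an intermediate form and ptxas can still change it; cuobjdump -sass on the compiled binary shows the final machine code. The exact instructions will vary with compiler version and flags.

```cpp
// min_probe.cu (hypothetical file name)
// Inspect the generated PTX with:  nvcc -ptx min_probe.cu -o min_probe.ptx
#include <math.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// The two variants whose generated code we want to compare.
HD inline float min2_fminf(float a, float b) { return fminf(a, b); }
HD inline float min2_ternary(float a, float b) { return (a > b) ? b : a; }

#ifdef __CUDACC__
// Inputs come from global memory so the compiler cannot constant-fold the result.
extern "C" __global__ void probe_fminf(const float* x, const float* y, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = min2_fminf(x[i], y[i]);
}

extern "C" __global__ void probe_ternary(const float* x, const float* y, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = min2_ternary(x[i], y[i]);
}
#endif
```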

Thanks! It turns out that fminf is indeed the best way to find the minimum: it compiles to the native min.f32 instruction. All the other versions use a set-greater-than instruction in combination with several others.
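Speed aside, the versions do not agree on special values. fminf follows C semantics: if exactly one argument is NaN, it returns the other one. PTX min.f32 treats NaN inputs the same way. The ternary (and the equivalent if) returns a whenever a > b is false, so its NaN result depends on argument order. The arithmetic version can yield NaN because 0 * infinity is NaN. Plausibly this is why the compiler cannot turn the hand-written versions into min.f32, though that is an inference, not something the PTX listing states. The host-side sketch below makes the differences concrete; the function names are hypothetical.

```cpp
#include <math.h>

// Host-side copies of the question's variants plus fminf, for comparing results.
// min2_if behaves exactly like min2_ternary, so it is omitted.
inline float min2_arith(float a, float b)   { return (a > b) * b + (b >= a) * a; }
inline float min2_ternary(float a, float b) { return (a > b) ? b : a; }
inline float min2_fminf(float a, float b)   { return fminf(a, b); }

// Expected behaviour for a few inputs (default floating-point settings):
//   min2_fminf(INFINITY, 1.0f)   -> 1.0f
//   min2_ternary(INFINITY, 1.0f) -> 1.0f
//   min2_arith(INFINITY, 1.0f)   -> NaN  (the 0 * INFINITY term)
//   min2_fminf(NAN, 1.0f)        -> 1.0f (NaN is ignored)
//   min2_ternary(NAN, 1.0f)      -> NaN  (comparison is false, returns a)
//   min2_ternary(1.0f, NAN)      -> 1.0f (same rule, opposite order)
```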

A pleasant surprise: even the if statement compiles without branches.