min, max and sign functions in CUDA: do they exist? If so, where?

Hello, I was looking for min, max and sign functions in CUDA, but I found nothing. I am using the very basic macro definitions for them, and I don’t like that much, because they could lead to warp divergence.

Now with Fermi there is something called predication, which I don’t understand well, that supposedly takes care of such small divergence. From what I understand, it only removes the branching overhead: threads in the warp execute both branches, just without the cost of an actual branch. That still contrasts with a single function call that involves no branching at all (or is done in hardware, like addition).

So what I am asking is: do functions for min, max and sign exist in CUDA that do not involve branching, or that are implemented in the most efficient way possible? If so, how can I use them, and which header file are they in?

I can imagine a sign function being implemented in hardware; it could just return the sign bit or something. I am not so sure about min and max, but regardless, if CUDA implements some version of min, max or sign, I would like to know.

Thanks

See Appendix C (Mathematical Functions) in the programming guide. Overloaded min/max, fminf, etc. are available.

There is signbit, but you can get the regular sign with standard C operators:

bool sign_a_is_negative = a<0;
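
For illustration, here is one possible branchless sign helper; signf is my own name, not a CUDA built-in, so treat it as a sketch:

__device__ __forceinline__ float signf(float a)
{
    // (a > 0.0f) and (a < 0.0f) each evaluate to 0 or 1, so the difference is
    // -1, 0 or +1, with no if/else and hence no divergent branch.
    return (float)((a > 0.0f) - (a < 0.0f));
}

Alternatively, copysignf(1.0f, a) gives +1.0f or -1.0f according to the sign bit (note that it never returns 0.0f and it honours the sign of -0.0f).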

Thanks, I had had a look in the guide, but had not found the functions. fmin and fmax are only mentioned in Section F.2 of the programming guide v4.1.


The section “Mathematical Functions” in the online documentation lists all standard math functions supported by CUDA:

http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/modules.html

In general, CUDA supports the full set of C99 standard math functions, plus various common extras (e.g. sincos, exp10, rsqrt, j0, j1, jn, y0, y1, yn). The online help does not seem to mention overloaded functions like min() and max(), which CUDA supports for just about any scalar type.
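
For what it’s worth, here is a small device-code sketch of my own exercising a few of those extras (the name extras_demo is just illustrative):

__global__ void extras_demo(float *out, float x)
{
    float s, c;
    sincosf(x, &s, &c);    // sine and cosine in a single call
    out[0] = s;
    out[1] = c;
    out[2] = rsqrtf(x);    // reciprocal square root
    out[3] = exp10f(x);    // 10^x
}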

The GPU hardware supports integer and floating-point min / max operations directly via dedicated instructions. The handling of special operands in the floating-point variants follows the IEEE-754 (2008) standard; in particular, a min / max operation in which exactly one operand is a NaN returns the non-NaN operand. This sometimes comes as a surprise to programmers.
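
If it helps, here is a minimal, self-contained test I would write to see that behaviour for myself (my own sketch, not from the guide):

#include <cstdio>

__global__ void minmax_demo(float *out)
{
    float qnan = nanf("");        // quiet NaN
    out[0] = fminf(3.0f, 7.0f);   // 3
    out[1] = fmaxf(3.0f, 7.0f);   // 7
    out[2] = fminf(qnan, 7.0f);   // 7 -- the non-NaN operand is returned
    out[3] = fmaxf(3.0f, qnan);   // 3
}

int main(void)
{
    float h_out[4];
    float *d_out;
    cudaMalloc(&d_out, sizeof(h_out));
    minmax_demo<<<1, 1>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("%g %g %g %g\n", h_out[0], h_out[1], h_out[2], h_out[3]);
    cudaFree(d_out);
    return 0;
}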

That’s exactly what I wanted to hear.

Now I see that there are different functions for single-precision and double-precision min/max: fminf/fmaxf and fmin/fmax respectively. I am using a typedef real which can be either a float or a double, and I want to apply max and min to variables of type real, so which function should I use? If I use the double-precision functions and my real is a float, the compiler can presumably cast the float to a double, but then the double-precision operation is sure to take more cycles; and if I use the single-precision functions and my real is a double, I would lose precision.

I could be wrong in this analysis, and using the double-precision function might cost just as much as the single-precision one, but if I’m not, how can I pick the right function to use? I could use macros, but I would like to avoid that if possible, or if statements, which kind of defeats the purpose. Any suggestions?

Thanks to overloading, you can use the generic function name with both float and double arguments. I do not offhand know where the following is documented these days, but when double-precision support was first added to CUDA it was mentioned in the release notes:

http://developer.download.nvidia.com/compute/cuda/2_0/docs/CUDA_Toolkit_Release_Notes_linux_2.0.txt
Note that math functions in the CUDA math library are overloaded.
In general, there are three prototypes for each math function:
(1) double <func-name>(double), e.g. double log(double)
(2) float <func-name>(float), e.g. float log(float)
(3) float <func-name>f(float), e.g. float logf(float)
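
As a sketch of how this plays out with your typedef (the kernel and names below are purely illustrative, not from any header): with

typedef float real;    /* or: typedef double real; */

__global__ void clamp_kernel(real *data, real lo, real hi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = fmin(fmax(data[i], lo), hi);   // overload resolution picks the float or double variant
}

the calls to fmin/fmax resolve to the single- or double-precision versions depending on what real is, with no macros or if statements needed.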

Amazing! Using fmin and fmax cut my computation time from 4.1-4.2 ms to 3.1-3.2 ms, and their use isn’t even the major part of the computation!

This was a real help. Thanks to you all.