Using only *, + and - on 32-bit floats, under what circumstances will NaNs and INFs be generated?

I have already Googled this, but I want to find out whether nvcc may behave differently.

Basically, I am debugging some open-source CUDA code (not my own) and have found multiple bugs, one of which generates both NaNs and INFs from a simple float atomicAdd() like this:

float a=arr_1[idx];

At this point I guess it could be caused by summations that exceed the range representable by a 32-bit float, but I wanted to know if there are any other possible causes.

I did check the inputs, which are OK, and the section of code above is where the problem occurs.

Just to add: I added some debug code to count the number of times atomicAdd() is invoked, and that number is low, so the bad values must be generated some other way.
I also checked that the values loaded are not INFs, NaNs, or very large values.

Running the tests without the ‘fast_math’ flag does not make a difference.

After more testing I found that some earlier divisions are made with very small denominators, so it seems this issue is related to certain types of inputs.

But just to check: are there any operations in CUDA that could cause NaNs other than the ones below?

Taken from Wikipedia -> special values -> NaN


acosf(x) returns NaN for x outside [-1, +1].

acoshf(x) returns NaN for x in the interval [−∞, 1).

asinf(x) returns NaN for x outside [-1, +1].

...and all other similar trig functions

+inf / +inf does indeed make minus NaN

The GPU hardware operations follow IEEE 754-2008 with respect to the handling of special cases. The CUDA standard math functions follow the requirements of ISO C99 in this regard (the CUDA math library predates the extension of the C++ standard math library and therefore was the only model available when CUDA functionality was defined; C++ likewise simply adopted C99 specifications). Special case handling for CUDA math functions that are not part of the standard C/C++ math library is specified analogously to the standard functions, e.g. norm3d() is based on hypot().

Defined special case handling for all operations is covered by tests, and I do not recall any bugs ever being reported against this functionality.

Obviously, if you turn on FTZ mode or use device intrinsics instead of the standard library functions, speed is prioritized over special case handling, so you “get what you get”. Note that -use_fast_math implies both use of FTZ and device intrinsics.

As far as additions and subtractions are concerned, NaN will result for (+INF) + (-INF), (-INF) + (+INF), (+INF) - (+INF), (-INF) - (-INF). For multiplications, (+/-INF) * (+/-0) will result in NaN; for divisions, (+/-INF) / (+/-INF) and (+/-0) / (+/-0) will.

Note that the question on Stack Overflow involves calls to pow(), and that standard math function has the largest number of special cases of any math function; as I recall, even more than atan2(). The ISO C standard, or one of the final drafts thereof available on the internet for free, enumerates these cases in detail.

FWIW, that Stack Overflow question is only the second time in my life I have seen someone call pow (2.71828183,x) instead of exp (x). Kids, don’t try this at home!