static constexpr quiet_NaN for device code

svennevs · January 15, 2018, 12:13am

I was wondering if anybody sees any glaring problems with doing the following:

template <>
struct numeric_limits<float> {
    __host__ __device__ __forceinline__
    static constexpr float quiet_NaN() {
        #if defined(__CUDA_ARCH__)
            return __builtin_nanf("");// <<< this line
        #else
            return std::numeric_limits<float>::quiet_NaN();
        #endif
    }
    // ... some other methods ...
};

I only found out about __builtin_nanf by looking at the STL headers to see how they were creating a constexpr quiet NaN, so I don’t really know if I should use it or if I should go find the macro itself and expand it directly.

I saw some posts online showing that you could do return __int_as_float(0x7fffffff); for the device code, but then the function cannot be constexpr.

Local tests indicate this works as expected, but I’m wondering if there is a time this could be a problem. I saw this article: Intel Developer Zone

I’m only supporting VS 2015 and higher, and honestly don’t really care too much if the intel compilers work or not.

I tried to find a way to do some kind of static union with a float and int32_t but couldn’t find a way to actually get it to compile. You can’t static_cast in this scenario, and you can’t do reinterpret_cast either, since it’s constexpr.

Clearly, I can just not do constexpr, but I became rather intrigued by the difficulty of creating NaN and thought it was kind of cool.

Thanks for any thoughts :)

njuffa · January 15, 2018, 3:43am

I don’t have any insights into constexpr, but if you just want a QNaN in device code, invoking the standard math library function nanf("") will give you one. Since GPUs use a single canonical QNaN for single precision, that is what you would get: __int_as_float(0x7fffffff).

One traditional way to generate a NaN is to write a floating-point 0.0 / 0.0 and let the compiler’s constant propagation work its magic. This approach is a bit brittle. Some compilers will complain about it, some will keep the division (since they don’t apply constant propagation to exceptional computation), others will give you a NaN as intended.

I am probably old-fashioned, but I have never had any use for the numeric_limits constants while dealing with floating-point code for the past thirty years.

svennevs · January 15, 2018, 4:29am

Yeah, the __int_as_float is (reasonably!) not constexpr. I am wary of the 0 / 0 approach, for the reasons you mention xD

In C++11 the numeric_limits constants became constexpr, so I was writing some simple wrappers to follow this exactly, and quiet_NaN was the only one I wasn’t sure was legitimate. I don’t think there’s a “purist” way to make it constexpr with CUDA, other than compiling the code with --expt-relaxed-constexpr (in which case I would just use std::numeric_limits<float>::quiet_NaN(), which also works).
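For reference, here is a minimal host-only sketch of the kind of wrapper described above. It has no CUDA qualifiers, so it says nothing about what nvcc accepts in device code. The names my_limits, kInvalid and is_invalid are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical wrapper; host-only, simply forwards to the standard library.
template <typename T>
struct my_limits {
    static constexpr T quiet_NaN() noexcept {
        // constexpr since C++11, so the wrapper can be constexpr as well.
        return std::numeric_limits<T>::quiet_NaN();
    }
};

// The point of constexpr: the value can initialize a compile-time constant.
constexpr float kInvalid = my_limits<float>::quiet_NaN();

// Runtime check; assumes the code is not built with fast-math style flags,
// which may allow the compiler to assume NaNs never occur.
inline bool is_invalid(float x) {
    return std::isnan(x);
}
```

With this, is_invalid(kInvalid) is true and is_invalid(1.0f) is false on a host compiler in its default floating-point mode.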

Hehe. The one that might be relevant for typical code is epsilon, for doing floating-point comparisons (std::numeric_limits<T>::epsilon on cppreference.com), though I never actually use it. E.g. for ray tracing you typically use a much “bigger” epsilon to account for accumulated error etc.
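To make the epsilon remark concrete, here is a rough sketch of a relative comparison built on numeric_limits. The function name nearly_equal and the rel_scale parameter are assumptions for illustration. A real application (ray tracing, say) would likely choose a much larger, problem-specific tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if a and b differ by at most rel_scale units of
// machine epsilon, scaled by the larger magnitude of the two inputs.
inline bool nearly_equal(float a, float b, float rel_scale = 4.0f) {
    // A NaN never compares equal to anything, including itself.
    if (std::isnan(a) || std::isnan(b)) {
        return false;
    }
    const float magnitude = std::max(std::fabs(a), std::fabs(b));
    const float tolerance =
        rel_scale * std::numeric_limits<float>::epsilon() * magnitude;
    return std::fabs(a - b) <= tolerance;
}
```

Because the tolerance scales with the inputs, it shrinks to zero near zero, so values close to zero usually need an additional absolute tolerance.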

In my own use case, it’s partially to support a third party library where bad measurements are marked as NaN. It’s for 3D range data, and I had originally used (0, 0, 0), but this can result in some processing algorithms over-fitting. So it seems that deliberately introducing NaN, assuming all other code is always checking for it, is a reasonable decision.
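A small host-side sketch of that idea, assuming a made-up Point3 type rather than the third-party library’s actual data layout: bad measurements are marked with NaN, and any processing step skips them explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical point type for illustration only.
struct Point3 {
    float x, y, z;
};

// A bad measurement: all three coordinates are NaN.
inline Point3 invalid_point() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return Point3{nan, nan, nan};
}

inline bool is_valid(const Point3& p) {
    return !(std::isnan(p.x) || std::isnan(p.y) || std::isnan(p.z));
}

// Centroid of the valid points only. Returns an invalid point if there are
// none, so the "no data" case stays distinguishable from (0, 0, 0).
inline Point3 centroid(const std::vector<Point3>& points) {
    double sx = 0.0, sy = 0.0, sz = 0.0;
    std::size_t count = 0;
    for (const Point3& p : points) {
        if (!is_valid(p)) {
            continue;  // skip bad measurements instead of averaging them in
        }
        sx += p.x;
        sy += p.y;
        sz += p.z;
        ++count;
    }
    if (count == 0) {
        return invalid_point();
    }
    return Point3{static_cast<float>(sx / count),
                  static_cast<float>(sy / count),
                  static_cast<float>(sz / count)};
}
```

Had the bad measurement been stored as (0, 0, 0), it would have pulled the centroid toward the origin. With NaN it is either skipped or, if a check is forgotten, it visibly poisons the result.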

I’m still uncomfortable with it, because the last thing I thought I’d ever be doing is deliberately introducing NaN!!!

Thanks for your response though :)

svennevs · January 15, 2018, 4:56am

I dug around a little more. I guess instead of relying on __builtin_nanf, if I really want it I should just use the full definition:

$ grep _QNAN_F -r /usr/local/Cellar/llvm/5.0.0/include/c++/v1/
/usr/local/Cellar/llvm/5.0.0/include/c++/v1//support/ibm/limits.h:17:static const unsigned int _QNAN_F = 0x7fc00000;
/usr/local/Cellar/llvm/5.0.0/include/c++/v1//support/ibm/limits.h:18:#define NANF (*((float *)(&_QNAN_F)))

Why these are equivalent is beyond my understanding of floating point though (0x7fc00000 vs 0x7fffffff).

njuffa · January 15, 2018, 7:20am

Remember that IEEE-754 just defines which bits classify NaNs (and which bit distinguishes SNaNs from QNaNs). All other bits are implementation defined. The 0x7fc00000 encoding you see in the code is from the x86 world: a special QNaN known as INDEFINITE. Not every processor is an x86 processor, though.
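The bit layout can be checked directly on the integer encodings. The sketch below assumes the common binary32 convention (8 exponent bits, 23 fraction bits, most significant fraction bit set for a quiet NaN). The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// binary32 field masks (sign bit is 0x80000000 and is ignored here).
constexpr std::uint32_t kExponentMask = 0x7f800000u;  // 8 exponent bits
constexpr std::uint32_t kFractionMask = 0x007fffffu;  // 23 fraction bits
constexpr std::uint32_t kQuietBit     = 0x00400000u;  // top fraction bit

// NaN: exponent all ones and a nonzero fraction.
constexpr bool is_nan_bits(std::uint32_t bits) {
    return (bits & kExponentMask) == kExponentMask &&
           (bits & kFractionMask) != 0u;
}

// Quiet NaN: a NaN whose top fraction bit is set (assumed convention).
constexpr bool is_quiet_nan_bits(std::uint32_t bits) {
    return is_nan_bits(bits) && (bits & kQuietBit) != 0u;
}

// Both encodings from the thread are quiet NaNs; they differ only in the
// remaining fraction bits, which the standard leaves to the implementation.
static_assert(is_quiet_nan_bits(0x7fc00000u), "x86 INDEFINITE is a QNaN");
static_assert(is_quiet_nan_bits(0x7fffffffu), "GPU canonical NaN is a QNaN");
static_assert(!is_nan_bits(0x7f800000u), "zero fraction means infinity");
```

So 0x7fc00000 and 0x7fffffff are not the same value bit for bit, but both land in the quiet-NaN class and behave the same in arithmetic and comparisons.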

I am very surprised to see the use of type punning by pointer casting in the definition of NANF, which introduces undefined behavior by any reading of the ISO C++ standard I have ever seen. It is the reason CUDA offers device function intrinsics for such reinterpretation, which accomplish the same objective in a well-defined manner, without introducing any overhead.
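On the host side, the usual well-defined replacement for the pointer cast is std::memcpy. A minimal sketch follows, with made-up helper names. Note that these functions are not constexpr, which is the same obstacle discussed earlier in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Reinterpret a 32-bit pattern as a float by copying the bytes.
inline float bits_to_float(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// The reverse direction, handy for inspecting an encoding.
inline std::uint32_t float_to_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Compilers typically reduce such a fixed-size memcpy to a plain register move. C++20 adds std::bit_cast, which does the same job and is constexpr.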

CUDA also defines a constant CUDART_NAN_F, but I am not sure this is globally visible. It is not officially part of the CUDA language and it again maps back to __int_as_float(0x7fffffff). I admit that I am not up to speed on the advantage of making the QNaN provided by numeric_limits a constexpr.

Using NaNs to indicate invalid or missing input data is actually one of the original motivations for the introduction of NaNs, in addition to producing NaNs as the output of invalid operations. While CUDA propagates NaNs in accordance with IEEE-754 (modulo the fact that there are no SNaNs), this is the case for CPU compilers only when strict IEEE-754 compliance is selected, which is not always the default.