NaN results

shemalrathnasuriya · August 1, 2024, 4:43pm

Hello!
We are currently using the CUDA Math Library to experiment with the numerical stability of its Math APIs. To verify correctness, we compare the CUDA Math APIs with the corresponding C math functions. We have encountered some issues, particularly with undefined behavior (results that are NaN): the C versions report a floating-point exception, while the CUDA versions return NaN. We use a test template similar to the one in Possible Rounding/Precision Errors in CUDA Math APIs? - #4 by shemalrathnasuriya

We understand that CUDA lacks floating-point exception detection mechanisms. However, we want to verify if these are actually undefined errors. Below are the functions and the inputs we used to identify the errors:

acos: input: 16384.0; C result: FP exception; CUDA: NaN
acosh: input: 0.5; C result: FP exception; CUDA: NaN
asin: input: -255.9999237060547; C result: FP exception; CUDA: NaN
atanh: input: 131072.0; C result: FP exception; CUDA: NaN
log: input: -0.0015258790226653218; C result: FP exception; CUDA: NaN
log10: input: -4396717056.0; C result: FP exception; CUDA: NaN
log1p: input: -268440576.0; C result: FP exception; CUDA: NaN
log2: input: -0.003768382128328085; C result: FP exception; CUDA: NaN
sqrt: input: -0.5020751953125; C result: FP exception; CUDA: NaN
tgamma: input: -1048584.0; C result: FP exception; CUDA: NaN
y0f: input: -1.1641576591614466e-10; C result: FP exception; CUDA: NaN
y1: input: -4398046248960.0; C result: FP exception; CUDA: NaN

nastafev · August 1, 2024, 8:01pm

Not sure what is meant here by “undefined errors”.
Please check the CUDA Math APIs documentation.
Corner case behaviors are specified individually for each function.
https://docs.nvidia.com/cuda/cuda-math-api/index.html

shemalrathnasuriya · August 1, 2024, 10:05pm

What we defined as undefined behavior are the “NaN” results. In a few cases, we found that the Math APIs produce NaNs where the C version detected an FP exception.

Robert_Crovella · August 1, 2024, 10:25pm

The C version is throwing an exception because you enabled that functionality specifically in your code. Disable that, rerun, and see what numerical result you get from the C library in that case. That is what is happening on the CUDA side (no exceptions) so that is the only reasonable basis for comparison.

It’s not sensible to ask why they are different when you have enabled exceptions.

Here is what I see when I do that with one of your examples:

# cat t244.cu
#include <cstdio>
#include <cmath>

__host__ __device__
void comp(){

  float val = 16384.0f;
  printf("acosf: %.15f\n", acosf(val));
}

__global__ void k(){

  comp();
}

int main(){

  printf("CPU:\n");
  comp();
  printf("GPU:\n");
  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
# nvcc -o t244 t244.cu
# ./t244
CPU:
acosf: nan
GPU:
acosf: nan
#

You can check the other cases in a similar fashion if you wish.

shemalrathnasuriya · August 1, 2024, 10:46pm

Thank you for your reply. We understand that CUDA does not support exception handling. However, we are trying to design a detector that checks such cases automatically. The only way we can verify such cases is by comparing against the C version (i.e., with exceptions enabled in the C code). We need confirmation from expert developers to build such automatic bug detectors well. These are some of the instances our detector triggered. You can find it here: Comprehensive-Study-on-GPU-Program-Numerical-Issues.github.io/GPU-NBDetect at main · GPU-Program-Bug-Study/Comprehensive-Study-on-GPU-Program-Numerical-Issues.github.io · GitHub. Our detector is at a preliminary stage: it generates inputs through several mutation strategies and detects FP bugs through assertions that we wrote. Your insight in this regard is highly valuable.

Thank you

nastafev · August 1, 2024, 11:20pm

Floating-point exceptions in the C language math libraries are meant to give the user a way to diagnose and handle corner cases where the default response to an input might be unacceptable. For example, consider the ISO C standard's specification of cos(+inf): it returns a NaN and raises the “invalid” floating-point exception. A user may enable traps on invalid FP exceptions and react to the function’s output based on this specification. In CUDA, the math library does not throw exceptions. Instead, it returns a default result, as specified in its documentation (and an effort has been made to ensure that the default response of the CUDA Math library is within the expectations of the C language, modulo the exceptions). This does not constitute “undefined behavior” in the traditional sense, as the function returns a specified result.

Examples of true “undefined behavior”, which render a CUDA program invalid and can cause system-wide side effects through a combination of factors such as user error and compiler optimizations, are outlined case by case for the CUDA Math functions. One such case is: 9. Integer Mathematical Functions (CUDA Math API Reference Manual 12.6 documentation)

That said, expecting floating-point exceptions to be triggered by calls to CUDA Math functions is not going to provide you with additional diagnostic information: exceptions are not supported, and this is clearly documented.
