Hello!
We are currently using the CUDA Math Library to experiment with the numerical stability of its Math APIs. To verify correctness, we compare the CUDA Math APIs with the corresponding C math functions. We have encountered some issues, particularly with undefined behaviors (results that are NaN): the C versions report a floating-point exception, but the CUDA versions output NaN values. We test the APIs with a template similar to the one in “Possible Rounding/Precision Errors in CUDA Math APIs?” (post #4 by shemalrathnasuriya).
We understand that CUDA lacks floating-point exception detection mechanisms. However, we want to verify if these are actually undefined errors. Below are the functions and the inputs we used to identify the errors:
- acos: input: 16384.0; C result: FP exception; CUDA: NaN
- acosh: input: 0.5; C result: FP exception; CUDA: NaN
- asin: input: -255.9999237060547; C result: FP exception; CUDA: NaN
- atanh: input: 131072.0; C result: FP exception; CUDA: NaN
- log: input: -0.0015258790226653218; C result: FP exception; CUDA: NaN
- log10: input: -4396717056.0; C result: FP exception; CUDA: NaN
- log1p: input: -268440576.0; C result: FP exception; CUDA: NaN
- log2: input: -0.003768382128328085; C result: FP exception; CUDA: NaN
- sqrt: input: -0.5020751953125; C result: FP exception; CUDA: NaN
- tgamma: input: -1048584.0; C result: FP exception; CUDA: NaN
- y0f: input: -1.1641576591614466e-10; C result: FP exception; CUDA: NaN
- y1: input: -4398046248960.0; C result: FP exception; CUDA: NaN
Not sure what is meant here by “undefined errors”.
Please check the CUDA Math APIs documentation.
Corner case behaviors are specified individually for each function.
https://docs.nvidia.com/cuda/cuda-math-api/index.html
What we call undefined behavior are the “NaN” results. In a few cases we found that the CUDA Math APIs produce NaNs where the C version detected an FP exception.
The C version is throwing an exception because you enabled that functionality specifically in your code. Disable that, rerun, and see what numerical result you get from the C library in that case. That is what is happening on the CUDA side (no exceptions) so that is the only reasonable basis for comparison.
It’s not sensible to ask why they are different when you have enabled exceptions.
Here is what I see when I do that with one of your examples:
# cat t244.cu
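#include <cstdio>
#include <cmath>
// Runs the same computation on both the host and the device.
__host__ __device__
void comp(){
  float val = 16384.0f;  // outside the [-1, 1] domain of acosf
  printf("acosf: %.15f\n", acosf(val));
}
// Kernel wrapper so comp() executes on the GPU.
__global__ void k(){
  comp();
}
int main(){
  printf("CPU:\n");
  comp();
  printf("GPU:\n");
  k<<<1,1>>>();
  cudaDeviceSynchronize();  // wait for the kernel so its printf output is flushed
}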
# nvcc -o t244 t244.cu
# ./t244
CPU:
acosf: nan
GPU:
acosf: nan
#
You can check the other cases in a similar fashion if you wish.
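For the host side, the remaining inputs from the original post can be run through one table-driven check with no floating-point traps enabled. This is only a minimal sketch: the names `Case`, `kCases`, and `count_nan_results` are made up for illustration, and `y0f` and `y1` are left out because the Bessel functions are POSIX extensions rather than standard C++.

```cpp
#include <cmath>
#include <cstdio>

// One reported case: function name, host implementation, out-of-domain input.
// (Hypothetical helper type, not part of any library.)
struct Case {
    const char *name;
    double (*fn)(double);
    double input;
};

// Inputs copied from the list in the original post. Non-capturing lambdas
// are used because the std:: math functions are overloaded.
static const Case kCases[] = {
    {"acos",   [](double x) { return std::acos(x); },   16384.0},
    {"acosh",  [](double x) { return std::acosh(x); },  0.5},
    {"asin",   [](double x) { return std::asin(x); },   -255.9999237060547},
    {"atanh",  [](double x) { return std::atanh(x); },  131072.0},
    {"log",    [](double x) { return std::log(x); },    -0.0015258790226653218},
    {"log10",  [](double x) { return std::log10(x); },  -4396717056.0},
    {"log1p",  [](double x) { return std::log1p(x); },  -268440576.0},
    {"log2",   [](double x) { return std::log2(x); },   -0.003768382128328085},
    {"sqrt",   [](double x) { return std::sqrt(x); },   -0.5020751953125},
    {"tgamma", [](double x) { return std::tgamma(x); }, -1048584.0},
};

// Prints each host result and returns how many cases produced NaN.
int count_nan_results() {
    int nan_count = 0;
    for (const Case &c : kCases) {
        double result = c.fn(c.input);
        std::printf("%-6s(%.17g) = %f\n", c.name, c.input, result);
        if (std::isnan(result)) ++nan_count;
    }
    return nan_count;
}
```

Calling `count_nan_results()` from `main` prints one line per function. The device side can be checked by wrapping the same calls in a `__host__ __device__` function, as in t244.cu above.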
Thank you for your reply. We understand that CUDA does not support exception handling. However, we are trying to design a detector that checks such cases automatically. The only way we can verify such cases is by comparing with the C version (i.e., with exceptions enabled in the C code). We need developer expertise and confirmation to build such automatic bug detectors well. These are some instances triggered by our detector. You can find it in the GPU-NBDetect directory of the GPU-Program-Bug-Study/Comprehensive-Study-on-GPU-Program-Numerical-Issues.github.io repository on GitHub. Our detector is at a preliminary stage: it generates inputs through several mutation strategies and detects FP bugs through assertions that we wrote. Your insight in this regard is highly valuable.
Thank you
Floating-point exceptions in the C math libraries give the user a way to diagnose and handle corner cases where the default response to an input might be unacceptable. For example, consider the ISO C standard's specification of cos(+inf): it returns a NaN and raises the “invalid” floating-point exception. A user may enable traps on invalid FP exceptions and react to the function’s output based on this specification. In CUDA, the math library does not throw exceptions. Instead, it returns a default result, as specified in its documentation (and an effort has been made to ensure that the default response of the CUDA Math library is within the expectations of the C language, modulo the exceptions).
This does not constitute “undefined behavior” in the traditional sense, because the function returns a specified result. Examples of true “undefined behavior”, which render a CUDA program invalid and can cause system-wide side effects through a combination of factors such as user error and compiler optimizations, are outlined case by case for the CUDA Math functions. One such case is described in section 9, “Integer Mathematical Functions”, of the CUDA Math API Reference Manual 12.6 documentation.
That said, expecting floating-point exceptions to be triggered by calls to CUDA Math functions is not going to give you additional diagnostic information: exceptions are not supported, and this is clearly documented.
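The cos(+inf) case can be observed on the host without enabling traps: the call returns NaN and, on implementations that follow the ISO C specification, sets the “invalid” status flag. The sketch below is illustrative only. `Outcome`, `cos_with_flags`, and `is_domain_nan` are hypothetical names, and whether the flag is observable depends on the compiler and C library, so a detector would probably compare the returned values rather than the flags.

```cpp
#include <cfenv>
#include <cmath>

// Result of one host call: the returned value and whether FE_INVALID was set.
struct Outcome {
    double value;
    bool invalid_flag;
};

// Calls cos() with traps left at their default (disabled) state and reports
// the status flag. volatile keeps the call from being folded at compile time.
Outcome cos_with_flags(double x) {
    volatile double in = x;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double out = std::cos(in);
    bool raised = std::fetestexcept(FE_INVALID) != 0;
    return {out, raised};
}

// Exception-free check that works for results copied back from the GPU too:
// a NaN output for a non-NaN input marks an out-of-domain argument.
bool is_domain_nan(double input, double output) {
    return !std::isnan(input) && std::isnan(output);
}
```

With `cos_with_flags(INFINITY)` the returned `value` is NaN and no signal is delivered, which matches what the CUDA function returns. Applying `is_domain_nan` to both the CPU and GPU results gives a comparison that does not rely on exceptions.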