Hello!

We are currently using the CUDA Math Library to experiment with the numerical stability of its Math APIs. To verify correctness, we compare each CUDA Math API with the corresponding C math library function. We have encountered some discrepancies, apparently rounding errors, where the C and CUDA results differ in the last few digits.

We understand that CUDA and C may round differently. However, we would like to confirm whether these discrepancies are actually rounding errors and, if so, what their root cause is. Below are the functions and inputs that produce them, with the differing digits in bold:
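One way to quantify how far apart two results are is to count the representable values between them (the distance in ULPs). The following is a minimal sketch, assuming the results are single-precision `float` values (the printed digits are consistent with floats widened to double, but that is an assumption). The helper names `float_to_ordered` and `ulp_distance` are hypothetical, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an ordered integer line so that
   adjacent representable floats differ by exactly 1. */
static int64_t float_to_ordered(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* read the bits without aliasing issues */
    /* Floats are sign-magnitude: mirror negative values below zero. */
    return (bits < 0) ? -(int64_t)(bits & 0x7FFFFFFF) : (int64_t)bits;
}

/* Number of float steps between a and b; 0 means identical values
   (+0 and -0 count as identical). NaN inputs are not handled. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}
```

Calling `ulp_distance(c_result, cuda_result)` for each pair gives a single number per function. A distance of 1 would point to a last-bit rounding difference, while a much larger distance would suggest that the two libraries use different approximations for that input range.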

acos - input: 0.0001590810570633039

C result: 1.570637**2261047363**

CUDA result: 1.570637**3453140259**

acosh - input: 2326705117069312.0

C result: 36.07637**7868652344**

CUDA result: 36.07637**405395508**

asinh - input: -4.003921508789062

C result: -2.095663**30909729**

CUDA result: -2.095663**070678711**

atan - input: 191.99949645996094

C result: 1.56558**79974365234**

CUDA result: 1.56558**8116645813**

atanh - input: -0.9530639052391052

C result: -1.864183**783531189**

CUDA result: -1.864183**9027404785**

cbrt - input: -3831.995849609375

C result: -15.64858**2458496094**

CUDA result: -15.64858**341217041**

cosh - input: 0.125

C result: 1.007822**6327896118**

CUDA result: 1.007822**7519989014**

erfc - input: -0.00012207029794808477

C result: 1.000137**6867294312**

CUDA result: 1.000137**8059387207**

exp10 - input: 0.007812499534338713

C result: 1.018151**7601013184**

CUDA result: 1.018151**6408920288**

exp2 - input: 0.06152203306555748

C result: 1.043566**107749939**

CUDA result: 1.043566**2269592285**

expm1 - input: 0.9999998211860657

C result: 1.718281**3882827759**

CUDA result: 1.718281**2690734863**

j0 - input: 0.008056640625

C result: 0.999983**7875366211**

CUDA result: 0.999983**9067459106**

j1 - input: -8192.01953125

C result: 0.007864**89900201559**

CUDA result: 0.007864**71646279096**

lgamma - input: 2097664.0

C result: 284366**30.0**

CUDA result: 284366**28.0**

log - input: 3276800.25

C result: 15.00237**75100708**

CUDA result: 15.00237**8463745117**

log10 - input: 25026078.0

C result: 7.39839**2677307129**

CUDA result: 7.39839**3154144287**

log1p - input: 458363.46875

C result: 13.0354**19464111328**

CUDA result: 13.0354**20417785645**

tan - input: 0.9999999403953552

C result: 1.557407**4983596802**

CUDA result: 1.557407**6175689697**

tgamma - input: 0.0390625

C result: 25.06009**1018676758**

CUDA result: 25.06009**292602539**

y0f - input: 0.008666995912790298

C result: -3.096553**087234497**

CUDA result: -3.096553**325653076**

y1 - input: 0.12500381469726562

C result: -5.19978**2848358154**

CUDA result: -5.19978**1894683838**
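To judge which result in each pair is closer to the true value, one option is to evaluate the same function in double precision and express each single-precision result's error in float ULPs. This is a minimal sketch under the assumption that the results are `float` values and that a double-precision evaluation is accurate enough to serve as the reference. The helper names `float_spacing` and `ulp_error` are hypothetical, and zero, subnormal, infinite, and NaN references are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Gap between |x| and the next larger float.
   Assumes x is finite, normal, and nonzero. */
static double float_spacing(float x)
{
    float ax = x < 0.0f ? -x : x;
    uint32_t bits;
    float next;
    memcpy(&bits, &ax, sizeof bits);
    bits += 1u;                       /* step to the next representable float */
    memcpy(&next, &bits, sizeof next);
    return (double)next - (double)ax;
}

/* Error of a float result relative to a double-precision reference,
   measured in units of the float spacing at the reference value. */
static double ulp_error(float result, double reference)
{
    double diff = (double)result - reference;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff / float_spacing((float)reference);
}
```

For example, `ulp_error(c_result, acos((double)input))` and `ulp_error(cuda_result, acos((double)input))` can be compared side by side. Errors within the maximum ULP error that the CUDA documentation lists for a function would be consistent with expected behavior rather than a defect.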