Accuracy of trigonometric functions

Has anyone looked at the accuracy of the trigonometric functions on the GPU when called from CUDA code? I seem to be getting some incredibly bad results and I don’t know of a way to improve them.

An example would be atan(4.0/5.0).
The CPU tells me the result is 0.67474094222355274,
but the GPU says the answer is -4.8366978272229995e-026.

I know the CPU is correct because my graphing calculator and Mathematica both agree with it.

Does anyone know of a workaround?

I assume this is double-precision code (based on the number of decimal places carried in the result). atan(0.8) returns 6.7474094222355263e-001 for me (this is 0.532 ulps off the mathematical result). If this is double-precision code, make sure to build with -arch={sm_13|sm_20|sm_21}.

The maximum error found for each CUDA math library function during extensive testing is recorded in Appendix C of the Programming Guide.

Thanks for the reply. Yes, this is with double precision, but my experience has been that single precision is just as bad. As for the arch, I am using sm_20, which is the highest supported by my EVGA GTX 470.

I noticed that depending on how you write your kernel, the compiler may be able to precompute the result of the atan call, and then you will get the correct answer. You should avoid having atan(0.8) directly in the kernel.

Another one that doesn’t work for me is atan(5.0/4.0).

The CPU gives me 0.89605538457134393.

The GPU gives me -4.8366978272229995e-026.

Would it be possible to post a self-contained repro case?

Thanks for the help. I think I have a memory corruption issue, because I was unable to create a small program to reproduce it.
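For anyone landing here with similar symptoms, a self-contained test of the kind requested above might look like the sketch below. It is not the original poster's code. The names `CUDA_CHECK`, `atan_kernel`, and `gpu_atan` are hypothetical. The inputs are passed through device memory so the compiler cannot fold atan() into a constant, and every runtime call is checked. If a launch or copy fails silently, the host buffer keeps whatever it held before, which can look like a wildly inaccurate math function.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA runtime call and return 1
// from the enclosing function. Early returns leak device memory, which
// is acceptable in a throwaway test program.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            return 1;                                                  \
        }                                                              \
    } while (0)

// Arguments arrive through memory at run time, so the compiler cannot
// replace the atan() call with a precomputed constant.
__global__ void atan_kernel(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = atan(in[i]);
    }
}

// Hypothetical wrapper: evaluates atan on the GPU for n host values.
// Returns 0 on success, 1 if any CUDA call failed.
int gpu_atan(const double* h_in, double* h_out, int n)
{
    double* d_in = NULL;
    double* d_out = NULL;
    const size_t bytes = static_cast<size_t>(n) * sizeof(double);

    CUDA_CHECK(cudaMalloc((void**)&d_in, bytes));
    CUDA_CHECK(cudaMalloc((void**)&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice));

    const int threads = 64;
    const int blocks = (n + threads - 1) / threads;
    atan_kernel<<<blocks, threads>>>(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // catches failures during execution

    CUDA_CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}

int main()
{
    const int n = 2;
    double h_in[n] = {4.0 / 5.0, 5.0 / 4.0};  // the two inputs from this thread
    double h_out[n] = {0.0, 0.0};

    if (gpu_atan(h_in, h_out, n) != 0) {
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        std::printf("atan(%.17g): gpu = %.17e  cpu = %.17e\n",
                    h_in[i], h_out[i], std::atan(h_in[i]));
    }
    return 0;
}
```

On the hardware discussed in this thread this would be built with something like `nvcc -arch=sm_20`. Current toolkits no longer accept sm_20, and toolkits older than CUDA 4.0 would need cudaThreadSynchronize in place of cudaDeviceSynchronize. If a small test like this gives correct results while the full application does not, running the application under cuda-memcheck (compute-sanitizer in current toolkits) is a reasonable next step for tracking down out-of-bounds accesses.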