There is not enough information in these snippets for a meaningful diagnosis. If you post a buildable self-contained piece of code that reproduces the issue, along with the exact nvcc invocation used to build it, that would enable others to analyze what is going on. What CUDA version are you using? What GPU do you run this code on?
Note that powf() is a function that takes ‘float’ arguments, yet in all the calls to this function in the device code you seem to be passing ‘double’ arguments. The constant 3.14159265359 is also of type double. Is there a particular reason for this mixture of double-precision data with single-precision functions?
Your host code is not identical to your device code: all functions called in the host code are double-precision math functions, while you are using single-precision math functions in the device code. Since single precision has a much smaller representable range than double precision, you may simply have an overflow in the device code because of that.
From a performance perspective, pow() and powf() are very expensive functions on any platform (CPU or GPU); you would never want to call them simply to square data, as you seem to be doing in this code.
You really want to cut down on the number of powf() calls. See also the Best Practices Guide in the CUDA Toolkit Documentation. In particular, you would want to substitute:
powf (x, 2) → x*x
powf (x, 0.5f) → sqrtf(x)
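To illustrate, here is a minimal sketch of such a substitution in a device function; the function and variable names (distance_before, distance_after, x, y) are made up for the example and not taken from your code:

__device__ float distance_before (float x, float y)
{
    // original pattern: two powf() calls to square, one more to take the square root
    return powf (powf (x, 2.0f) + powf (y, 2.0f), 0.5f);
}

__device__ float distance_after (float x, float y)
{
    // equivalent but much cheaper: plain multiplies plus a single sqrtf()
    return sqrtf (x * x + y * y);
}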
There is a function dedicated to computing sqrt(x*x + y*y) which you may want to consider: hypotf().
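Continuing the hypothetical example from above:

__device__ float distance_hypot (float x, float y)
{
    // hypotf() returns sqrtf(x*x + y*y) and also guards against
    // intermediate overflow/underflow when x or y is very large or very small
    return hypotf (x, y);
}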
Note that floating-point constants without an 'f' suffix are double precision by default; this makes your computation much more expensive, especially since you are on a consumer GPU with relatively low double-precision operation throughput.
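For example, a sketch using made-up variable names (not taken from your code) to show the difference:

// the double constant promotes the whole expression to double precision,
// which is slow on consumer GPUs; the result is then truncated back to float
float y_slow = x * 3.14159265359;

// the 'f' suffix keeps the constant and the computation in single precision
float y_fast = x * 3.14159265359f;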
By the way, your equations look very unusual. I would be interested to learn what area of science they are from. If you could point me to a relevant paper, that would be ideal. Thanks!
Haha, thanks ;). This is a benchmark; I use it because I'm programming mono-objective and multi-objective metaheuristics. When I finish my work, I would be pleased to share my thesis.