So theoretically, the upper and lower results should be opposites of each other: -1 and +1.
But on both my Jetson and my laptop, the code returns strange answers:
thread[0]:0.324045
thread[0]:1.000000
It seems the bug only occurs when subtracting 90 from threadIdx.x would give a negative value;
the answer is correct when the result is positive.
So is this a known compiler bug?
Must I copy threadIdx.x, blockIdx.x, etc. into another variable before using them?
And are there any other important precautions I should know about?
You may wish to study what happens when you do something like:
threadIdx.x - 90
or
unsigned val = 0;
long long res = val - 90;
in unsigned arithmetic. The quantity res will not contain -90! (Try it in plain host code.)
If you want signed arithmetic, you could first convert threadIdx.x to a signed quantity, e.g.
((int)threadIdx.x) - 90
Of course, threadIdx.x is not available in host code, but you can reproduce your observation in host code by using an unsigned variable, setting it to zero, and then trying your code. So this is a function of C++ behavior, not anything unique or specific to CUDA. And it is not a compiler bug or anything like that.
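As a rough host-only sketch of the experiment described above: the two helper names below are made up for illustration, a plain unsigned parameter stands in for threadIdx.x, and the numeric result in the comment assumes a 32-bit unsigned int and a 64-bit long long.

```cpp
#include <cstdio>

// Mirrors `threadIdx.x - 90`: the int literal 90 is converted to unsigned,
// the subtraction is done in unsigned arithmetic and wraps around, and only
// then is the (already wrapped) value widened to long long.
long long unsigned_diff(unsigned idx) {
    return idx - 90;
}

// Mirrors `((int)threadIdx.x) - 90`: converting to int first makes the
// subtraction signed, so small indices give the expected negative result.
long long signed_diff(unsigned idx) {
    return static_cast<int>(idx) - 90;
}

// Hypothetical helper that prints both variants side by side.
void print_comparison(unsigned idx) {
    std::printf("idx=%u unsigned: %lld signed: %lld\n",
                idx, unsigned_diff(idx), signed_diff(idx));
}
```

Calling print_comparison(0) should show a large positive number (4294967206 under the assumptions above) for the unsigned variant and -90 for the signed one. For an index of 90 or more, both variants agree, which matches the observation that only the "negative" cases misbehave.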
Aside: When using this kind of kernel-launch coding pattern, I usually encourage people to put a cudaDeviceSynchronize() after the kernel call. It’s not related to your question, and I realize it may appear to be unnecessary on some platforms, perhaps.
Thank you so much for your detailed explanation.
I hadn't thought of it as an unsigned value; I assumed it was just a normal int.
And I do use a CUDA synchronization call in my actual project code. Thanks for the reminder.
CUDA has included sinpi(), cospi(), and sincospi() and their single-precision counterparts (with f suffix) for about a decade now. When the IEEE-754 floating-point standard was overhauled in 2008, this functionality was included as recommended. These are clearly very useful functions that avoid a bunch of fairly common numerical problems (example).
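To illustrate one such problem in plain host C++: multiplying by a rounded value of π before calling sin() means that sin(π·x) does not come out as exactly zero at integer x. The sketch below is only an illustration under stated assumptions. The names kPi, naive_sinpi, and sinpi_sketch are hypothetical, and this is not how CUDA implements sinpi(); it simply reduces the argument exactly before multiplying by π, and makes no claim to the accuracy of a real library implementation.

```cpp
#include <cmath>

// Double-precision approximation of pi (an assumption of this sketch).
const double kPi = 3.14159265358979323846;

// Naive approach: the product kPi * x carries the rounding error of kPi,
// so at x = 1.0 the result is a tiny nonzero value (about 1.2e-16).
double naive_sinpi(double x) {
    return std::sin(kPi * x);
}

// Sketch of the sinpi() idea: reduce x exactly *before* multiplying by pi.
double sinpi_sketch(double x) {
    double r = std::fmod(x, 2.0);      // exact; r lies in (-2, 2)
    if (r > 1.0) r -= 2.0;             // fold into [-1, 1] using the period 2
    else if (r < -1.0) r += 2.0;
    if (r > 0.5) r = 1.0 - r;          // sin(pi*r) == sin(pi*(1 - r))
    else if (r < -0.5) r = -1.0 - r;   // sin(pi*r) == sin(pi*(-1 - r))
    return std::sin(kPi * r);          // |r| <= 0.5; integer x gives r == 0
}
```

With this reduction, sinpi_sketch(1.0) and sinpi_sketch(1000000.0) are exactly zero, whereas naive_sinpi(1.0) is not.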
If I understand the ISO standards making process correctly (by no means a certainty, as I am not connected in any way to the committee), sinpi() should make an official appearance in a future C++ standard “real soon now”. Temporal uncertainty stems from the fact that the C++ standard math library usually tracks developments in the C standard math library, but does so with unpredictable delay.
Various toolchains and libraries (e.g. Boost), as well as programming languages outside the C/C++ universe, already support sinpi() and cospi().