A strange bug with threadIdx.x

There is a calculation in my project that needs to use threadIdx.x.
For example, consider the code below.

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void test()
{
    printf("thread[%d]:%f\n", threadIdx.x, sin((threadIdx.x - 90) * 3.14159265 / 180));
    printf("thread[%d]:%f\n", threadIdx.x, sin((threadIdx.x + 90) * 3.14159265 / 180));
}

int main()
{
    test<<<1, 1>>>();

    return 0;
}

In theory, the first and second results should be opposites of each other: -1 and +1.
But on both my Jetson and my laptop, the program prints a strange result:

thread[0]:0.324045
thread[0]:1.000000

It seems the bug only occurs when subtracting 90 from threadIdx.x should give a negative value;
the answer is correct when the result is positive.

So is this a common compiler bug?
Must I copy threadIdx.x, blockIdx.x, etc. into another variable before using them?
And are there any other important precautions I should know about?

You’ve been tripped up by unsigned arithmetic.

threadIdx.x is an unsigned quantity.

You may wish to study what happens when you do something like:

threadIdx.x - 90

or

unsigned val = 0;
long long res = val - 90;

in unsigned arithmetic. The quantity res will not contain -90! (Try it in plain host code.)

If you want signed arithmetic, you could first convert threadIdx.x to a signed quantity, e.g.

((int)threadIdx.x) - 90

Of course, threadIdx.x is not available in host code, but you can reproduce your observation in host code by using an unsigned variable, setting it to zero, and then trying your code. So this is a function of C++ behavior, not anything unique or specific to CUDA. And it is not a compiler bug or anything like that.

Aside: When using this kind of kernel-launch coding pattern, I usually encourage people to put a cudaDeviceSynchronize() after the kernel call. It’s not related to your question, and I realize it may appear to be unnecessary on some platforms, perhaps.

Side remark: Instead of using sin (PI * [expression]) it is advantageous to use sinpi ([expression]) from both a performance and accuracy perspective.

Thank you so much for your detailed explanation.
It really didn’t occur to me that it was unsigned; I assumed it was just a normal int.
And I do use a CUDA synchronization call in my actual project code, thanks for the reminder.

Do you mean sinpi() is an official function?
I just can’t find it in math.h

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__DOUBLE.html#group__CUDA__MATH__DOUBLE_1g06ae86e791c45c081184e605f984e733

Thank you

CUDA has included sinpi(), cospi(), and sincospi() and their single-precision counterparts (with f suffix) for about a decade now. When the IEEE-754 floating-point standard was overhauled in 2008, this functionality was included as recommended. These are clearly very useful functions that avoid a bunch of fairly common numerical problems (example).

If I understand the ISO standards making process correctly (by no means a certainty, as I am not connected in any way to the committee), sinpi() should make an official appearance in a future C++ standard “real soon now”. Temporal uncertainty stems from the fact that the C++ standard math library usually tracks developments in the C standard math library, but does so with unpredictable delay.

Various toolchains and libraries (e.g. Boost), as well as programming languages outside the C/C++ universe, already support sinpi() and cospi().

Wow! Thanks for the detailed background on CUDA C/C++ history.
I learned a lot from this ^_^.
