A strange bug with threadIdx.x

EzPizzy · July 23, 2022, 2:53pm

There is a calculation in my project that needs to use threadIdx.x.
For example, see the code below.

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void test()
{
    printf("thread[%d]:%f\n", threadIdx.x, sin((threadIdx.x - 90) * 3.14159265 / 180));
    printf("thread[%d]:%f\n", threadIdx.x, sin((threadIdx.x + 90) * 3.14159265 / 180));
}

int main()
{
    test<<<1, 1>>>();

    return 0;
}

So in theory, the two results should be opposites of each other: -1 and +1.
But on both my Jetson and my laptop, I get a strange answer:

thread[0]:0.324045
thread[0]:1.000000

And it seems the bug only occurs when the result of subtracting 90 from threadIdx.x is negative;
it returns the right answer when the result is positive.

So is this a common compiler bug?
Must I copy threadIdx.x, blockIdx.x, etc. into another variable before using them?
And are there any other important precautions I should know about?

Robert_Crovella · July 23, 2022, 3:18pm

You’ve been tripped up by unsigned arithmetic.

threadIdx.x is an unsigned quantity.

You may wish to study what happens when you do something like:

threadIdx.x - 90

or

unsigned val = 0;
long long res = val - 90;

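in unsigned arithmetic. The quantity res will not contain -90! (Try it in plain host code.) With a 32-bit unsigned, val - 90 is evaluated as unsigned int and wraps around to 4294967206 (2^32 - 90) before the result is widened to long long.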

If you want signed arithmetic, you could first convert threadIdx.x to a signed quantity, e.g.

((int)threadIdx.x) - 90

Of course, threadIdx.x is not available in host code, but you can reproduce your observation in host code by using an unsigned variable, setting it to zero, and then trying your code. So this is a function of C++ behavior, not anything unique or specific to CUDA. And it is not a compiler bug or anything like that.

Aside: When using this kind of kernel-launch coding pattern, I usually encourage people to put a cudaDeviceSynchronize() after the kernel call. It's not related to your question, and I realize it may appear to be unnecessary on some platforms.

njuffa · July 23, 2022, 7:29pm

Side remark: Instead of using sin (PI * [expression]) it is advantageous to use sinpi ([expression]) from both a performance and accuracy perspective.

EzPizzy · July 24, 2022, 7:38am

Thank you so much for your detailed explanation.
I really didn't think of it as an unsigned number; I assumed it was just a normal int.
And I do call cudaDeviceSynchronize() in my actual project code; thanks for the reminder.

EzPizzy · July 24, 2022, 7:40am

Do you mean sinpi() is an official function?
I just can't find it in math.h.

Robert_Crovella · July 24, 2022, 12:13pm

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__DOUBLE.html#group__CUDA__MATH__DOUBLE_1g06ae86e791c45c081184e605f984e733

EzPizzy · July 24, 2022, 12:23pm

Thank you

njuffa · July 24, 2022, 8:08pm

CUDA has included sinpi(), cospi(), and sincospi() and their single-precision counterparts (with f suffix) for about a decade now. When the IEEE-754 floating-point standard was overhauled in 2008, these functions were included among its recommended operations. These are clearly very useful functions that avoid a bunch of fairly common numerical problems (example).

If I understand the ISO standards making process correctly (by no means a certainty, as I am not connected in any way to the committee), sinpi() should make an official appearance in a future C++ standard “real soon now”. Temporal uncertainty stems from the fact that the C++ standard math library usually tracks developments in the C standard math library, but does so with unpredictable delay.

Various toolchains and libraries (e.g. Boost), as well as programming languages outside the C/C++ universe, already support sinpi() and cospi().

EzPizzy · July 25, 2022, 8:21am

Wow! Thanks for your detailed account of CUDA C/C++ history.
I learned a lot from this ^_^.

system · August 8, 2022, 8:21am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
