bug? threadIdx.x == -1?

while (bound > 0) {
    tile[threadIdx.y][threadIdx.x] += tile[threadIdx.y][threadIdx.x + bound];
    printf("bound=%d tile(%d,%d)=%f\n", bound, threadIdx.y, threadIdx.x, tile[threadIdx.y][threadIdx.x]);
    bound >>= 1;
}

The above snippet is taken from a reduction algorithm in a kernel. Under what circumstances would this output:

tile(0,0)=1.920000

tile(0,-1)=3.330000

tile(0,-1)=6.150000

tile(0,-1)=12.780000

tile(0,-1)=25.030000

tile(0,-1)=50.060000

I don’t know the ABI in any detail, but it might help not to rely on undefined behavior. Cast the values to integers before printing them as integers:

printf("bound=%d tile(%d,%d)=%f\n", (int)bound, (int)threadIdx.y, (int)threadIdx.x,tile[threadIdx.y][threadIdx.x]);

Depending on its declaration, bound might not need a cast, but threadIdx.x and threadIdx.y aren’t ints (they are unsigned int components of a uint3).
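As an alternative to casting, the format specifier can be matched to the type instead. This is only a small sketch: the kernel name printIdx and the helper launchPrintDemo are made up for illustration, and I have not checked whether this changes the output you are seeing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void printIdx()
{
    // threadIdx components are unsigned int, so %u matches them without a cast.
    printf("thread(%u,%u)\n", threadIdx.y, threadIdx.x);
}

// Hypothetical host helper: launches one 2x2 block and reports whether it ran cleanly.
bool launchPrintDemo()
{
    printIdx<<<1, dim3(2, 2)>>>();
    return cudaDeviceSynchronize() == cudaSuccess;
}
```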

Regardless of the strange printf() results (which I can’t explain either, but as tera says, try casts…) your code is likely not workable.

You’re reading and writing into the same array from multiple threads at once. Depending on your value of bound, you can easily have thread races: one thread reads tile[threadIdx.y][threadIdx.x+bound] while the thread that owns that element may be updating it. There’s a small caveat that this might be OK if both bound <= 32 and blockDim.x <= 32, where warp-synchronous behavior would prevent the race, but what you’re doing is likely dangerous anyway. You probably mean to have a __syncthreads() call after each iteration of your loop.
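Here is a minimal sketch of the kind of loop I mean. The tile size (TILE_W x TILE_H), the kernel name rowReduce and the host helper runRowReduce are all my own assumptions, since your post doesn’t show them. I also added a threadIdx.x < bound guard, which your snippet doesn’t have, so that no thread reads an element another thread is writing in the same step.

```cuda
#include <cuda_runtime.h>

// Assumed tile dimensions; the original post does not state them.
constexpr int TILE_W = 32;
constexpr int TILE_H = 4;

// Reduces each row of the tile into column 0 of that row.
__global__ void rowReduce(const float *in, float *out)
{
    __shared__ float tile[TILE_H][TILE_W];

    tile[threadIdx.y][threadIdx.x] = in[threadIdx.y * TILE_W + threadIdx.x];
    __syncthreads();

    for (int bound = TILE_W / 2; bound > 0; bound >>= 1) {
        // Only the lower half accumulates, so the elements being read
        // (index >= bound) are never written during this step.
        if (threadIdx.x < bound)
            tile[threadIdx.y][threadIdx.x] += tile[threadIdx.y][threadIdx.x + bound];
        // The barrier sits outside the branch so every thread in the block reaches it.
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[threadIdx.y] = tile[threadIdx.y][0];
}

// Hypothetical host helper: fills the tile with ones and checks that
// every row sums to TILE_W.
bool runRowReduce()
{
    float hIn[TILE_H * TILE_W];
    float hOut[TILE_H] = {0};
    for (int i = 0; i < TILE_H * TILE_W; ++i) hIn[i] = 1.0f;

    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) { cudaFree(dIn); return false; }

    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    rowReduce<<<1, dim3(TILE_W, TILE_H)>>>(dIn, dOut);
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int y = 0; y < TILE_H; ++y)
        if (hOut[y] != static_cast<float>(TILE_W)) return false;
    return true;
}
```

Calling runRowReduce() from main should return true if the reduction is race-free for this tile size. If you really want to rely on warp-level synchronization instead, the barrier placement would have to change, and that is outside what this sketch covers.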

Hi, thanks for your comment, but the threads are actually meant to be synchronized within a warp.