UPDATE: There was an error in the code related to the type of a variable. I thought about deleting the post, but I will leave it up for the record.
ORIGINAL MESSAGE:
While trying to fix a bug in my kernel, I came across some very strange behavior in a CUDA kernel.
See this fragment of kernel code:
__global__ void kernel(...){
...
...
...
dh = ss[sC(threadIdx.x,   by,   threadIdx.z,   sLx, sLy)] *
   ( ss[sC(threadIdx.x-1, by,   threadIdx.z,   sLx, sLy)] +
     ss[sC(threadIdx.x+1, by,   threadIdx.z,   sLx, sLy)] +
     ss[sC(threadIdx.x,   by-1, threadIdx.z,   sLx, sLy)] +
     ss[sC(threadIdx.x,   by+1, threadIdx.z,   sLx, sLy)] +
     ss[sC(threadIdx.x,   by,   threadIdx.z-1, sLx, sLy)] +
     ss[sC(threadIdx.x,   by,   threadIdx.z+1, sLx, sLy)] );
if( dh > 6 ){
printf("thread %i (x = %i by = %i z = %i) dh = %i\n", tid, threadIdx.x, by, threadIdx.z, dh);
}
...
...
...
}
While debugging, I put in many of these prints just to see if at least one thread would catch the error, which in this case would manifest for any value of dh > 6.
The weird thing is that the code passes the if statement while reporting a value of dh = 0:
thread 264875 (x = 11 by = 2 z = 0) dh = 0
thread 264876 (x = 12 by = 3 z = 0) dh = 0
thread 264877 (x = 13 by = 2 z = 0) dh = 0
thread 264878 (x = 14 by = 3 z = 0) dh = 0
thread 264879 (x = 15 by = 2 z = 0) dh = 0
thread 264880 (x = 16 by = 3 z = 0) dh = 0
thread 264881 (x = 17 by = 2 z = 0) dh = 0
thread 264882 (x = 18 by = 3 z = 0) dh = 0
thread 264883 (x = 19 by = 2 z = 0) dh = 0
thread 264884 (x = 20 by = 3 z = 0) dh = 0
thread 264885 (x = 21 by = 2 z = 0) dh = 0
thread 264886 (x = 22 by = 3 z = 0) dh = 0
thread 264887 (x = 23 by = 2 z = 0) dh = 0
thread 264888 (x = 24 by = 3 z = 0) dh = 0
thread 264889 (x = 25 by = 2 z = 0) dh = 0
thread 264890 (x = 26 by = 3 z = 0) dh = 0
thread 264891 (x = 27 by = 2 z = 0) dh = 0
thread 264892 (x = 28 by = 3 z = 0) dh = 0
thread 264893 (x = 29 by = 2 z = 0) dh = 0
thread 264894 (x = 30 by = 3 z = 0) dh = 0
thread 264895 (x = 31 by = 2 z = 0) dh = 0
...
...