Inside my kernel function (declared __global__), I executed the following code:
if ( blockIdx.x * blockDim.x + threadIdx.x == 0 )
    for ( int k = 0; k < 9; k++ )
        cuPrintf("%d,", k++);
I get an output:
Can anyone speculate where 1, 3, 5, and 7 went?
I used cuPrintf from the CUDA SDK, with the standard initialization and display calls.
When I used this exact code in one application, it ran without problems; in another, it gave these strange results.
I have only one GPU, so how can I debug the debugging tool?