Indexing problems: threadIdx.x + blockIdx.x + ... = 0?!

Hi all, I'm having trouble getting the indexing in my program to work.

Let's say I call my kernel like this:

//host code
sharedMemSize = 1332;
dim3 dimGrid(16,16);
dim3 dimBlock(18,18);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>(...);

I want the first 16x16 threads (out of 18x18) in each block to write into the output matrix, and I index like this:

//kernel code
if (threadIdx.x < blockDim.x - kernel->width + 1 &&
    threadIdx.y < blockDim.y - kernel->height + 1)
    dest->d_arrayPtr[
        threadIdx.x + blockIdx.x * (blockDim.x - kernel->width + 1)
        + (threadIdx.y + blockIdx.y * (blockDim.y - kernel->height + 1))
          * gridDim.x * (blockDim.x - kernel->width + 1)
    ] = values;

where kernel->width = 3, and let's say values = 5.

I would expect the array to be filled with 5s, but it isn't!

Instead, zeros appear at random places (see the attached file).
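One way to check whether the index expression itself can leave holes is to replay it on the host. The sketch below is plain C; the helpers `out_index` and `count_unwritten` are hypothetical, not from the program. It assumes `dest` holds exactly `gridDim.x*16` by `gridDim.y*16` elements and a 3x3 filter, and it uses the filter height for the row tile (with a 3x3 filter that is the same tile as the width).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical host-side copy of the kernel's store index. The tile is the
   number of output elements per block: blockDim - filter + 1
   (16 for an 18-wide block and a 3-wide filter). */
static int out_index(int tx, int ty, int bx, int by,
                     int bdx, int bdy, int gdx, int kw, int kh)
{
    int tileW = bdx - kw + 1;
    int tileH = bdy - kh + 1;
    return (tx + bx * tileW) + (ty + by * tileH) * gdx * tileW;
}

/* Runs every (block, thread) pair of a gdx x gdy grid through the same guard
   and index as the kernel. Returns how many output elements are never
   written, or -1 if an index is out of range or written twice. */
static int count_unwritten(int gdx, int gdy, int bdx, int bdy, int kw, int kh)
{
    int tileW = bdx - kw + 1, tileH = bdy - kh + 1;
    assert(tileW > 0 && tileH > 0);
    int n = gdx * tileW * gdy * tileH;          /* assumed size of dest */
    unsigned char *hits = calloc((size_t)n, 1);
    if (hits == NULL) return -1;

    for (int by = 0; by < gdy; ++by)
        for (int bx = 0; bx < gdx; ++bx)
            for (int ty = 0; ty < bdy; ++ty)
                for (int tx = 0; tx < bdx; ++tx) {
                    /* halo threads fail the guard and skip the store */
                    if (tx >= tileW || ty >= tileH) continue;
                    int i = out_index(tx, ty, bx, by, bdx, bdy, gdx, kw, kh);
                    if (i < 0 || i >= n || hits[i]) { free(hits); return -1; }
                    hits[i] = 1;
                }

    int unwritten = 0;
    for (int i = 0; i < n; ++i)
        unwritten += (hits[i] == 0);
    free(hits);
    return unwritten;
}
```

For the 16x16 grid of 18x18 blocks, `count_unwritten(16, 16, 18, 18, 3, 3)` returns 0, i.e. the expression maps the 16x16 inner threads of every block onto the 256x256 output exactly once. If that holds, the zeros would more likely come from something outside this expression (for example how much of `dest` is allocated or copied back).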

However, if I call the kernel like this:

sharedMemSize = 1332;
dim3 dimGrid(4,4);
dim3 dimBlock(18,18);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>(...);

or like this:

sharedMemSize = 436;
dim3 dimGrid(8,8);
dim3 dimBlock(10,10);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>(...);

then the indexing works fine.
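As a sanity check on the three launches, both sharedMemSize values match a layout of "one float per thread of the tile plus a 3x3 filter": (18*18 + 9)*4 = 1332 and (10*10 + 9)*4 = 436. That layout is a guess from the numbers, not something stated in the post. The sketch below (hypothetical helpers `shared_bytes` and `output_elems`, assuming 4-byte floats) also shows that the failing launch writes a 256x256 output (65,536 elements), while both working launches write 64x64 (4,096 elements), so the failing case is the only one whose destination needs 16 times as many elements.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shared-memory layout (inferred from the numbers, not from the
   post): one float per thread of the bdx x bdy tile, then the kw x kh filter. */
static size_t shared_bytes(int bdx, int bdy, int kw, int kh)
{
    assert(bdx > 0 && bdy > 0 && kw > 0 && kh > 0);
    return ((size_t)bdx * bdy + (size_t)kw * kh) * sizeof(float);
}

/* Output elements written by the whole grid: each block contributes a
   (bdx - kw + 1) x (bdy - kh + 1) tile. */
static long output_elems(int gdx, int gdy, int bdx, int bdy, int kw, int kh)
{
    assert(bdx >= kw && bdy >= kh);
    return (long)gdx * (bdx - kw + 1) * (long)gdy * (bdy - kh + 1);
}
```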

Any ideas on what is going on?

Thanks in advance!

Can you please paste some real code?
And use the code tags so it’s readable.

Try debugging your program. It looks like your conditions and indexes miss the first 32 elements of each thread block. You can also use device emulation mode.
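To test the "first 32 elements of each thread block" theory against the attached output, each zero can be traced back to the thread that should have written it. A minimal sketch in C, assuming the 16x16-outputs-per-block indexing from the question; `Origin` and `origin_of` are hypothetical names. CUDA numbers the threads of a 2D block as threadIdx.x + threadIdx.y * blockDim.x and groups consecutive numbers into warps of 32, so "the first 32 threads" is warp 0.

```c
#include <assert.h>

/* Which block/thread should have written output element (x, y)? */
typedef struct {
    int blockX, blockY;    /* blockIdx.x, blockIdx.y */
    int threadX, threadY;  /* threadIdx.x, threadIdx.y */
    int linearTid;         /* threadIdx.x + threadIdx.y * blockDim.x */
    int warp;              /* linearTid / 32 */
} Origin;

/* Inverts the posted index, assuming tile = blockDim - filter + 1. */
static Origin origin_of(int x, int y, int bdx, int bdy, int kw, int kh)
{
    int tileW = bdx - kw + 1, tileH = bdy - kh + 1;
    assert(tileW > 0 && tileH > 0 && x >= 0 && y >= 0);
    Origin o;
    o.blockX = x / tileW;
    o.threadX = x % tileW;
    o.blockY = y / tileH;
    o.threadY = y % tileH;
    o.linearTid = o.threadX + o.threadY * bdx;  /* uses the full 18-wide block */
    o.warp = o.linearTid / 32;
    return o;
}
```

Scanning the copied-back 256x256 array and tallying `origin_of(x, y, 18, 18, 3, 3).warp` for every zero would show whether the zeros really cluster in warp 0 of each block or are spread across warps.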