Hi all, I'm having trouble getting indexing to work in my program.
Let's say I call my kernel like this:
//host code
sharedMemSize = 1332;
dim3 dimGrid(16,16);
dim3 dimBlock(18,18);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>
I want the first 16x16 threads (out of 18x18) in each block to write into the output matrix, and I index like this:
//kernel code
// only the inner (blockDim - kernel size + 1) threads of each block write output
if (threadIdx.x < blockDim.x - kernel->width + 1 &&
    threadIdx.y < blockDim.y - kernel->height + 1)
    dest->d_arrayPtr[
        threadIdx.x + blockIdx.x * (blockDim.x - kernel->width + 1)
        + (threadIdx.y + blockIdx.y * (blockDim.y - kernel->height + 1))
          * gridDim.x * (blockDim.x - kernel->width + 1)
    ] = values;
where kernel->width = 3.
Let's say values = 5.
I would expect the array to be filled with 5s, but it isn't!
Zeros appear at seemingly random places (see the attached file).
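One way to sanity-check the index arithmetic without a GPU is to replay it on the host. The sketch below is only a model of the write described above, not the original kernel: it assumes a 3x3 kernel (only kernel->width = 3 is stated), and `countBadElements` and its parameters are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the kernel's write; not the original code.
// Returns how many output elements are NOT written exactly once.
std::size_t countBadElements(int gridX, int gridY, int blockX, int blockY,
                             int kWidth, int kHeight) {
    const int tileW = blockX - kWidth + 1;   // writing threads per block in x
    const int tileH = blockY - kHeight + 1;  // writing threads per block in y
    const int outW = gridX * tileW;          // row pitch: gridDim.x * (blockDim.x - width + 1)
    const int outH = gridY * tileH;
    std::vector<int> hits(static_cast<std::size_t>(outW) * outH, 0);

    // Visit every (block, thread) pair the launch would create.
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    if (tx < tileW && ty < tileH) {
                        // Same index expression as in the post.
                        const std::size_t idx =
                            static_cast<std::size_t>(tx + bx * tileW) +
                            static_cast<std::size_t>(ty + by * tileH) * outW;
                        assert(idx < hits.size());  // write stays inside the output
                        ++hits[idx];
                    }
                }

    // Count elements that were missed or written more than once.
    std::size_t bad = 0;
    for (int h : hits)
        if (h != 1) ++bad;
    return bad;
}
```

Under these assumptions, `countBadElements(16, 16, 18, 18, 3, 3)` comes out as 0, meaning every element of the 256x256 output is written exactly once. If that model matches the real kernel, the formula itself is probably not what produces the zeros, and it may be worth looking at things like the size of the destination allocation or an unchecked launch error instead.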
However, if I call the kernel like this:
sharedMemSize = 1332;
dim3 dimGrid(4,4);
dim3 dimBlock(18,18);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>
or like this:
sharedMemSize = 436;
dim3 dimGrid(8,8);
dim3 dimBlock(10,10);
dilateOnGPU<<<dimGrid, dimBlock, sharedMemSize>>>
then the indexing works fine.
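To compare the three launches side by side, here is a small host-side sketch of the quantities they imply. It rests on guesses that are not stated in the post: a 3x3 kernel, 4-byte elements, and shared memory holding one block-sized tile plus the kernel. `LaunchShape`, `outputWidth` and `sharedBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one launch configuration.
struct LaunchShape {
    int gridX, gridY;    // dimGrid
    int blockX, blockY;  // dimBlock
};

// Output width in elements: each block contributes (blockDim.x - kWidth + 1) columns.
int outputWidth(const LaunchShape& s, int kWidth) {
    assert(s.blockX >= kWidth);
    return s.gridX * (s.blockX - kWidth + 1);
}

// Shared-memory bytes for one block tile plus the kernel.
// ASSUMPTION: this layout and elemSize are guesses, not taken from the post.
std::size_t sharedBytes(const LaunchShape& s, int kWidth, int kHeight,
                        std::size_t elemSize) {
    const std::size_t tile = static_cast<std::size_t>(s.blockX) * s.blockY;
    const std::size_t kern = static_cast<std::size_t>(kWidth) * kHeight;
    return (tile + kern) * elemSize;
}
```

With these assumptions, `sharedBytes` reproduces both 1332 (18x18 block) and 436 (10x10 block), so the shared-memory sizes look consistent. `outputWidth` gives 64 for both working launches and 256 for the failing one, so the output size, rather than the index formula, might be the relevant difference between them.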
Any ideas on what is going on?
Thanks in advance!