How to resolve "Invalid __global__ read of size 8" error, I am using 3d array with 3d indexing in ker

Please help me resolve this error; I am new to CUDA programming.
The error I am facing is:

Invalid global read of size 8
========= at 0x00000d38 in mom(double[52][52], double[52][52], double[52][52], double[52][52], double[52][52], double[52][52], double[52][52], double[52][52], double[52][52], double[52][52])
========= by thread (0,2,1) in block (0,0,0)
========= Address 0x40107aec8 is out of bounds

I am operating on 3D arrays in a CUDA kernel with 3D indexing. I calculate the indices as follows:

 int i = blockDim.x * blockIdx.x + threadIdx.x;
 int j = blockDim.y * blockIdx.y + threadIdx.y;
 int k = blockDim.z * blockIdx.z + threadIdx.z;

and the launch configuration is:

dim3 dimBlock(4,4,4);
dim3 dimGrid(idivup(n+2,BLKXSIZE), idivup(m+2,BLKYSIZE), idivup(l+2,BLKYSIZE));
where idivup is defined as:
#define idivup(a, b) ( ((a)%(b) != 0) ? (a)/(b)+1 : (a)/(b) )

You are indexing out of bounds.

You can trace the error down to a specific line of code using the technique discussed here:

http://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218

After that, you can use in-kernel assert or printf statements to test indexing conditions, so you can identify the exact illegal index.
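
As a rough sketch of the printf approach (not the original kernel): the extents NX, NY, NZ, the kernel name check_indices and the helper count_illegal_indices are all made up for illustration. Each thread compares its own indices against the extents and reports instead of touching memory. An in-kernel assert(i < NX) from <cassert> would be the stricter alternative, since a failing assert stops the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical array extents, chosen only for this sketch.
#define NX 10
#define NY 10
#define NZ 10

// Each thread tests its own (i, j, k) against the extents. Offending
// threads are counted, and the first few are printed, instead of
// touching memory.
__global__ void check_indices(int *bad_count)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;

    if (i >= NX || j >= NY || k >= NZ) {
        int seen = atomicAdd(bad_count, 1);
        if (seen < 4) {  // keep the output short
            printf("illegal index (%d,%d,%d): thread (%u,%u,%u) block (%u,%u,%u)\n",
                   i, j, k,
                   threadIdx.x, threadIdx.y, threadIdx.z,
                   blockIdx.x, blockIdx.y, blockIdx.z);
        }
    }
}

// Hypothetical host helper: launches 4x4x4 blocks that cover the extents
// and returns how many threads had an illegal index (-1 on a CUDA error).
int count_illegal_indices()
{
    int *d_bad = nullptr;
    int h_bad = -1;
    if (cudaMalloc((void **)&d_bad, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_bad, 0, sizeof(int));

    dim3 block(4, 4, 4);
    dim3 grid((NX + 3) / 4, (NY + 3) / 4, (NZ + 3) / 4);
    check_indices<<<grid, block>>>(d_bad);
    cudaDeviceSynchronize();  // wait for the kernel and its printf output

    if (cudaMemcpy(&h_bad, d_bad, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_bad = -1;
    }
    cudaFree(d_bad);
    return h_bad;
}
```

With extents of 10 and 4x4x4 blocks, the grid launches 12 threads per dimension, so 12*12*12 - 10*10*10 = 728 threads land outside the array. In a real kernel the same test would go in front of the access that the memory checker flagged, using the real extents.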

Or you can dig out the debugger.

Also, it appears that you are generating more threads than necessary. There is nothing wrong with that; it is common CUDA practice. In that scenario, however, the kernel needs a thread check that prevents threads whose generated indices would be illegal from doing anything. For example:

int i = blockDim.x * blockIdx.x + threadIdx.x;
int j = blockDim.y * blockIdx.y + threadIdx.y;
int k = blockDim.z * blockIdx.z + threadIdx.z; 

if ((i < n) && (j < m) && (k < l)){

  // rest of kernel code

}

I don’t know what the actual indexing limits should be; you should know that already. I’m just guessing that they might be n, m, l.
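
To make the thread check concrete, here is a minimal sketch under my own assumptions: a flat double array of n*m*l elements indexed as (i*m + j)*l + k, and the hypothetical names add_one, run_add_one and div_up (the latter rounds up like the idivup macro). None of this comes from the original code.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Rounds a/b up to the next integer (same intent as the idivup macro).
static int div_up(int a, int b) { return (a + b - 1) / b; }

// Adds 1.0 to every element of an n x m x l array stored as one flat buffer.
__global__ void add_one(double *a, int n, int m, int l)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;

    // Thread check: surplus threads past the extents do nothing.
    if ((i < n) && (j < m) && (k < l)) {
        a[((size_t)i * m + j) * l + k] += 1.0;
    }
}

// Hypothetical host helper: zero-fills the array, runs the kernel once and
// returns the sum of all elements (-1.0 on a CUDA error).
double run_add_one(int n, int m, int l)
{
    size_t count = (size_t)n * m * l;
    double *d_a = nullptr;
    if (cudaMalloc((void **)&d_a, count * sizeof(double)) != cudaSuccess) {
        return -1.0;
    }
    cudaMemset(d_a, 0, count * sizeof(double));  // all-zero bytes are 0.0

    dim3 block(4, 4, 4);
    dim3 grid(div_up(n, 4), div_up(m, 4), div_up(l, 4));  // may overshoot
    add_one<<<grid, block>>>(d_a, n, m, l);

    std::vector<double> h(count);
    cudaError_t err = cudaMemcpy(h.data(), d_a, count * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    if (err != cudaSuccess) return -1.0;

    double sum = 0.0;
    for (double v : h) sum += v;
    return sum;
}
```

For example, run_add_one(5, 6, 7) launches 8*8*8 = 512 threads for only 210 elements. With the check in place each element is incremented exactly once, so the sum comes back as 210.0; without it, the surplus threads would write past the end of the buffer.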

Finally, it’s not clear how you are allocating memory for the various parameters used by the kernel:

mom(double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*, double[52][52]*)

If you are allocating triple-pointer arrays (double ***) and then attempting to pass those to the kernel, you are likely making a mistake there. Passing a 3D array can be made to work in a relatively simple fashion if two of the dimensions are known at compile time (e.g. [52][52]), which appears to be what you are doing, but it’s impossible to say for sure based on what you have shown.
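
For reference, one way the compile-time-dimension approach can look. This is a sketch only: I am assuming a first dimension NX of 52, and the names fill and fill_and_read are invented for the example. The array is a single contiguous allocation accessed through a pointer of type double (*)[NY][NZ], so a[i][j][k] works in the kernel without any double *** tables.

```cuda
#include <cuda_runtime.h>

#define NX 52  // assumed first extent, not taken from the original post
#define NY 52
#define NZ 52

// The parameter type carries the two trailing dimensions, so the compiler
// can turn a[i][j][k] into the correct linear offset.
__global__ void fill(double (*a)[NY][NZ], int nx)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;

    if ((i < nx) && (j < NY) && (k < NZ)) {
        a[i][j][k] = (double)(i + j + k);
    }
}

// Hypothetical host helper: allocates the array with one cudaMalloc, fills
// it on the device, copies it back and returns element [i][j][k]
// (-1.0 on a CUDA error).
double fill_and_read(int i, int j, int k)
{
    static double h_a[NX][NY][NZ];       // host copy, about 1.1 MB
    double (*d_a)[NY][NZ] = nullptr;     // device pointer to NY x NZ planes

    // sizeof(*d_a) is the size of one NY x NZ plane of doubles.
    if (cudaMalloc((void **)&d_a, NX * sizeof(*d_a)) != cudaSuccess) {
        return -1.0;
    }

    dim3 block(4, 4, 4);
    dim3 grid((NX + 3) / 4, (NY + 3) / 4, (NZ + 3) / 4);
    fill<<<grid, block>>>(d_a, NX);

    cudaError_t err = cudaMemcpy(h_a, d_a, sizeof(h_a),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return (err == cudaSuccess) ? h_a[i][j][k] : -1.0;
}
```

Here fill_and_read(1, 2, 3) should come back as 6.0. The same pattern extends to several arrays: one cudaMalloc per array, each passed to the kernel as a double (*)[52][52] argument.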