I would like to read (BLOCK_SIZE_X+1)×(BLOCK_SIZE_Y+1) global memory locations with BLOCK_SIZE_X×BLOCK_SIZE_Y threads and move their contents to shared memory. I have developed the following code:
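```cuda
int i   = threadIdx.x;
int j   = threadIdx.y;
int idx = blockIdx.x*BLOCK_SIZE_X + threadIdx.x;
int idy = blockIdx.y*BLOCK_SIZE_Y + threadIdx.y;

// Linear thread index within the block (threadIdx.x varies fastest),
// so the row stride is BLOCK_SIZE_X, not BLOCK_SIZE_Y
int index1 = j*BLOCK_SIZE_X + i;

// First load: tile coordinates of element index1 in a (BLOCK_SIZE_X+1)-wide tile
int i1 = index1 % (BLOCK_SIZE_X+1);
int j1 = index1 / (BLOCK_SIZE_X+1);

// Second load: tile coordinates of element BLOCK_SIZE_X*BLOCK_SIZE_Y + index1
int i2 = (BLOCK_SIZE_X*BLOCK_SIZE_Y + index1) % (BLOCK_SIZE_X+1);
int j2 = (BLOCK_SIZE_X*BLOCK_SIZE_Y + index1) / (BLOCK_SIZE_X+1);

__shared__ double Ezx_h_shared_ext[BLOCK_SIZE_X+1][BLOCK_SIZE_Y+1];

Ezx_h_shared_ext[i1][j1] = Ezx_h[(blockIdx.y*BLOCK_SIZE_Y + j1)*xdim + (blockIdx.x*BLOCK_SIZE_X + i1)];

// Only the threads that map to the remaining tile elements perform the second load
if ((i2 < (BLOCK_SIZE_X+1)) && (j2 < (BLOCK_SIZE_Y+1)))
    Ezx_h_shared_ext[i2][j2] = Ezx_h[(blockIdx.y*BLOCK_SIZE_Y + j2)*xdim + (blockIdx.x*BLOCK_SIZE_X + i2)];

__syncthreads();   // the tile must be complete before any thread reads it
```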
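My understanding is that coalescing is the parallel counterpart of consecutive memory reads in sequential code: the accesses of the threads of a warp are coalesced when consecutive threads read consecutive addresses, so that the warp touches as few memory segments as possible. How can I now check whether my global memory accesses are coalesced? Note that there is an index jump from (i1,j1) to (i2,j2). Thanks in advance.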