Moving a (BS_X+1)*(BS_Y+1) global memory matrix by BS_X*BS_Y threads

I would like to read (BS_X+1)(BS_Y+1) global memory locations by BS_xBS_Y threads moving the contents to the shared memory and I have developed the following code.

int i       = threadIdx.x;
int j       = threadIdx.y;
int idx     = blockIdx.x*BLOCK_SIZE_X + threadIdx.x;
int idy     = blockIdx.y*BLOCK_SIZE_Y + threadIdx.y;

int index1  = j*BLOCK_SIZE_Y+i;

int i1      = (index1)%(BLOCK_SIZE_X+1);
int j1      = (index1)/(BLOCK_SIZE_Y+1);

int i2      = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)%(BLOCK_SIZE_X+1);
int j2      = (BLOCK_SIZE_X*BLOCK_SIZE_Y+index1)/(BLOCK_SIZE_Y+1);

__shared__ double Ezx_h_shared_ext[BLOCK_SIZE_X+1][BLOCK_SIZE_Y+1];     


if ((i2<(BLOCK_SIZE_X+1))&&(j2<(BLOCK_SIZE_Y+1))) 

In my understanding, coalescing is the parallel equivalent of consecutive memory reads of sequential processing. How can I detect now if the global memory accesses are coalesced? I remark that there is an index jump from (i1,j1) to (i2,j2). Thanks in advance.