Loading a 2D array from global memory into shared memory

My problem seems really simple, so I apologize in advance.

I have an array in global memory:
I want to load it into a shared-memory array of size 16 x 16.

Are these index expressions correct, and will they give me 64 different blocks, each with its part of my data loaded?

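    ! declarations must precede executable statements in Fortran
    real, shared :: s_array(0:15,0:15)
    integer :: d_j, d_l, tIdx, tIdy

    ! global (0-based) indices of this thread's element
    d_j = (blockIdx%x-1) * blockDim%x + threadIdx%x-1
    d_l = (blockIdx%y-1) * blockDim%y + threadIdx%y-1
    ! local (0-based) indices within the 16 x 16 tile
    tIdx = threadIdx%x -1
    tIdy = threadIdx%y -1

    s_array(tIdx,tIdy) = g_array(d_j,d_l)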
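For comparison, here is a minimal, self-contained CUDA Fortran sketch of the same pattern. It is an illustration under stated assumptions, not the only way to do it. The 128 x 128 array size is hypothetical (chosen because an 8 x 8 grid of 16 x 16 tiles gives 64 blocks), and the names `load_tile`, `g_out`, `TILE`, `nx` and `ny` are made up for the example. The global arrays are declared with 0-based bounds so the 0-based `d_j`, `d_l` index them directly; if `g_array` is declared with Fortran's default lower bound of 1, those indices are off by one.

```fortran
module tile_kernels
  use cudafor
  implicit none
  integer, parameter :: TILE = 16   ! hypothetical tile edge, matches s_array(0:15,0:15)
contains
  attributes(global) subroutine load_tile(g_array, g_out, nx, ny)
    integer, value :: nx, ny
    ! dummy arrays of a global subroutine live in device memory;
    ! 0-based bounds so the 0-based d_j, d_l can index them directly
    real :: g_array(0:nx-1, 0:ny-1)
    real :: g_out(0:nx-1, 0:ny-1)
    real, shared :: s_array(0:TILE-1, 0:TILE-1)
    integer :: d_j, d_l, tIdx, tIdy

    tIdx = threadIdx%x - 1                       ! local index in the tile
    tIdy = threadIdx%y - 1
    d_j  = (blockIdx%x - 1) * blockDim%x + tIdx  ! global index in the array
    d_l  = (blockIdx%y - 1) * blockDim%y + tIdy

    ! guard only matters when nx or ny is not a multiple of TILE
    if (d_j < nx .and. d_l < ny) s_array(tIdx, tIdy) = g_array(d_j, d_l)
    ! barrier: the whole tile is loaded before any thread reads from it
    call syncthreads()
    ! copy the tile back out so the host can check what each block loaded
    if (d_j < nx .and. d_l < ny) g_out(d_j, d_l) = s_array(tIdx, tIdy)
  end subroutine load_tile
end module tile_kernels

program main
  use cudafor
  use tile_kernels
  implicit none
  integer, parameter :: nx = 128, ny = 128   ! hypothetical array size
  real :: h_in(0:nx-1, 0:ny-1), h_out(0:nx-1, 0:ny-1)
  real, device :: d_in(0:nx-1, 0:ny-1), d_out(0:nx-1, 0:ny-1)
  type(dim3) :: grid, tBlock
  integer :: i, j

  ! fill the host array with distinct values
  do j = 0, ny - 1
    do i = 0, nx - 1
      h_in(i, j) = real(i + nx * j)
    end do
  end do

  d_in  = h_in     ! host-to-device copy
  d_out = 0.0

  tBlock = dim3(TILE, TILE, 1)                                   ! 16 x 16 threads
  grid   = dim3((nx + TILE - 1) / TILE, (ny + TILE - 1) / TILE, 1) ! ceiling division
  call load_tile<<<grid, tBlock>>>(d_in, d_out, nx, ny)

  h_out = d_out    ! device-to-host copy (waits for the kernel)
  print *, 'blocks launched:', grid%x * grid%y
  if (maxval(abs(h_out - h_in)) > 0.0) error stop 'tile mismatch'
end program main
```

With these sizes the grid is 8 x 8 blocks, and each block's shared tile holds exactly the 16 x 16 slice of the array at offset `((blockIdx%x-1)*16, (blockIdx%y-1)*16)`; the program stops with an error if any element round-trips incorrectly.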