Hello friends, I need some help understanding shared memory declarations.

Let's say I have some very simple code that copies a 4 by 4 matrix from device (global) memory to shared memory:

float *matrix;
cudaMalloc((void**)&matrix, 16*sizeof(float));

// matrix holds a 4 by 4 matrix (16 floats)
dim3 dimGrid(2, 2);
dim3 dimBlock(2, 2);
kernel<<<dimGrid, dimBlock>>>(matrix);

cudaFree(matrix);

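__global__ void kernel(float *matrix)
{
    // global row and column of the element handled by this thread
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float d_array[32];
    // copy one element from device memory into shared memory
    d_array[row * 4 + col] = matrix[row * 4 + col];
}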
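In case it helps, here is a self-contained version of what I am trying to do, as a minimal sketch. The second buffer `out`, the test values 0 to 15, and the copy back to the host are my own additions so that the result can be checked; they are not part of my real code. It keeps the same indexing I am asking about, so please do not read it as the correct way to do this.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread stages one element in shared memory, then writes it to `out`.
// `out` is a hypothetical second buffer, added only to make the result visible.
__global__ void kernel(const float *matrix, float *out)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    __shared__ float d_array[32];

    // Same indexing as in my question: the global index is used for both arrays.
    d_array[row * 4 + col] = matrix[row * 4 + col];
    __syncthreads();  // wait until every thread in the block has written

    out[row * 4 + col] = d_array[row * 4 + col];
}

int main()
{
    const int n = 16;  // 4 by 4 matrix
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;  // assumed test data

    float *matrix = nullptr, *out = nullptr;
    cudaMalloc((void **)&matrix, n * sizeof(float));
    cudaMalloc((void **)&out, n * sizeof(float));
    cudaMemcpy(matrix, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dimGrid(2, 2);
    dim3 dimBlock(2, 2);
    kernel<<<dimGrid, dimBlock>>>(matrix, out);

    // Wait for the kernel and report any error it raised.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Count elements that did not survive the round trip through shared memory.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(matrix);
    cudaFree(out);
    return mismatches == 0 ? 0 : 1;
}
```

I would build it with something like `nvcc sketch.cu -o sketch` (the file name is arbitrary).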

Questions:

- When I write `__shared__ float d_array[32]`, how many copies of `d_array` are created in shared memory?
- In `__shared__ float d_array[32]`, is the "32" the size of my array?
- When this kernel has finished, is the matrix still stored in shared memory so that my next kernel can use it directly?
- Is this way of copying the values correct: `d_array[row*4 + col] = matrix[row*4 + col]`?

Many many thanks!