declare and pass values to shared memory

Hello friends, I need some help to understand shared memory declaration.

Let’s say I have very simple code that copies a 4 by 4 matrix from device memory to shared memory:

float *matrix;
cudaMalloc((void **)&matrix, 16 * sizeof(float));
// matrix holds a 4 by 4 matrix (16 floats)

dim3 dimGrid(2, 2);
dim3 dimBlock(2, 2);

kernel<<<dimGrid, dimBlock>>>(matrix);

cudaFree(matrix);

__global__ void kernel(float *matrix)
{
    // global row and column of this thread in the 4 by 4 matrix
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float d_array[32];
    d_array[row * 4 + col] = matrix[row * 4 + col];
}

Questions:

  1. When I say __shared__ float d_array[32], how many copies of d_array are created in my shared memory?
  2. In __shared__ float d_array[32], is this "32" the size of my array?
  3. When this kernel is finished, is this matrix still stored in my shared memory so that my next kernel can directly use it?
  4. Is this way of passing values correct: d_array[row*4 + col] = matrix[row*4 + col]?

Many many thanks!

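One per thread block. Each of the four blocks in your 2 by 2 grid gets its own private copy, even when several blocks run on the same multiprocessor.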

Yes, 32 is the number of elements in the array.

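No. Shared memory has the lifetime of a single block, so its contents are gone once the block finishes. Anything the next kernel needs has to be written back to global memory.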

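Not in general. The shared memory array has only 32 elements, so indices have to stay in 0…31, which row*4 + col only does while the matrix is this small. Since each block has its own copy of the array, it is better to index it with the thread's position inside the block (threadIdx) rather than the global row and col.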
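
For what it's worth, here is a minimal sketch of how the copy could look with per-block indexing. It assumes the same 4 by 4 matrix and 2 by 2 launch as in the question. The names copy_through_shared, round_trip, tile, N and TILE are made up for illustration, and error checking is left out to keep it short. Each block stages its own 2 by 2 tile in shared memory and then writes it back to global memory, because the shared copy does not survive the block.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 4;     // matrix is N x N (assumed, as in the question)
constexpr int TILE = 2;  // each block handles a TILE x TILE tile

// Each block stages its tile in shared memory, then writes it back out.
__global__ void copy_through_shared(const float *in, float *out)
{
    // One instance of this array exists per block.
    __shared__ float tile[TILE][TILE];

    int row = blockIdx.y * blockDim.y + threadIdx.y;  // global row
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // global column

    // Shared memory is indexed by the thread's position inside the block.
    tile[threadIdx.y][threadIdx.x] = in[row * N + col];

    // Barrier for the block. Not strictly needed here, since each thread
    // reads back only its own element, but required as soon as threads
    // read elements written by other threads.
    __syncthreads();

    // Results must go to global memory to outlive the block.
    out[row * N + col] = tile[threadIdx.y][threadIdx.x];
}

// Hypothetical host helper: sends a matrix through the kernel and back.
std::vector<float> round_trip(const std::vector<float> &h_in)
{
    assert(h_in.size() == static_cast<size_t>(N * N));
    std::vector<float> h_out(N * N, 0.0f);
    const size_t bytes = N * N * sizeof(float);

    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 dimGrid(N / TILE, N / TILE);
    dim3 dimBlock(TILE, TILE);
    copy_through_shared<<<dimGrid, dimBlock>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling round_trip on a 16-element vector should hand back the same values. A second kernel that needs the data would read it from the out buffer in global memory, not from shared memory.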