Hello friends, I need some help understanding shared memory declarations.
Let's say I have some very simple code that copies a 4 by 4 matrix from device memory to shared memory:
float *matrix;
cudaMalloc((void**)&matrix, 16*sizeof(float));
// this matrix is a 4 by 4 matrix
dim3 dimGrid(2,2);
dim3 dimBlock(2,2);
kernel<<<dimGrid,dimBlock>>>(matrix);
cudaFree(matrix);
__global__ void kernel(float *matrix)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float d_array[32];
    d_array[row*4 + col] = matrix[row*4 + col];
}
Questions:
- When I write __shared__ float d_array[32], how many copies of d_array are created in shared memory?
- In __shared__ float d_array[32], is "32" the size of my array?
- When this kernel has finished, is the matrix still stored in shared memory so that my next kernel can use it directly?
- Is this way of copying the values correct: d_array[row*4 + col] = matrix[row*4 + col]?
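A minimal sketch that may make these questions concrete, assuming shared memory is allocated once per thread block and lives only as long as that block does. The names copyThroughShared, runTileCopy, N and TILE are made up for illustration and are not part of the code above. Each block keeps its own 2 by 2 tile, indexes it with threadIdx rather than the global row and col, and writes the values back to global memory because the tile cannot be read by a later kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4     // the matrix is N x N
#define TILE 2  // each block handles a TILE x TILE tile

// Every block gets its own private copy of `tile`;
// with a 2x2 grid that means four separate copies.
__global__ void copyThroughShared(const float *in, float *out)
{
    __shared__ float tile[TILE][TILE];

    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Shared memory is indexed with block-local coordinates,
    // global memory with global coordinates.
    tile[threadIdx.y][threadIdx.x] = in[row * N + col];

    // Not strictly required here, since each thread reads back only the
    // element it wrote; needed once threads read each other's elements.
    __syncthreads();

    // The tile does not outlive the block, so results go back to global memory.
    out[row * N + col] = tile[threadIdx.y][threadIdx.x];
}

// Hypothetical host helper: returns true if all 16 values survive the round trip.
bool runTileCopy()
{
    float h_in[N * N], h_out[N * N];
    for (int i = 0; i < N * N; ++i) { h_in[i] = (float)i; h_out[i] = -1.0f; }

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc((void **)&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    dim3 dimGrid(N / TILE, N / TILE);
    dim3 dimBlock(TILE, TILE);
    copyThroughShared<<<dimGrid, dimBlock>>>(d_in, d_out);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N * N; ++i)
        if (h_out[i] != h_in[i]) return false;
    return true;
}

int main()
{
    printf("%s\n", runTileCopy() ? "round trip ok" : "round trip failed");
    return 0;
}
```

In this sketch the array size in the declaration is the number of elements per block, so a 2 by 2 block only needs 4 floats rather than 32.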
Many many thanks!