How to create arrays at runtime in shared memory?


I have a large number of threads running, each doing a small matrix multiplication. All the small matrices have been loaded into global memory. I wish to improve performance by letting each thread load its small matrices into shared memory and then compute the product. The problem is that I do not know the sizes of the matrices at compile time, so I cannot declare variables as in __shared__ double mat1[xsize][ysize]. On a PC, I would use dynamic allocation, but I do not know whether that is possible in shared memory. If calling malloc in a kernel allocates only in global memory, that does not help either.

Is there a way to declare arrays at runtime in a kernel? Is there any other way to solve this problem?

Thank you,

You can pass the desired size upon kernel invocation; see section B.16, Execution configuration, in the programming guide.

If I remember correctly, you do something like this:

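// Dynamically sized shared array; its size in bytes is set at launch.
extern __shared__ float smemData[];

__global__ void yourKernel( --- )
{
    // globalPtr is assumed to be one of the kernel's pointer arguments
    smemData[threadIdx.x] = globalPtr[threadIdx.x + blockIdx.x * blockDim.x];
}

// invocation: the third launch parameter is the shared memory size in bytes
yourKernel<<< gridDim, blockDim, smemSize >>>( ---- );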

Overall the documentation seems a bit scarce on the subject…
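
For the matrix case in the question, the single dynamic buffer can be split by hand into several arrays whose sizes are only known at run time. Below is a minimal sketch of that idea, not a definitive implementation. The names multiplySmall and runMultiplySmall, the row-major layout, and the choice of 64 threads per block are all assumptions made for illustration, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Single dynamically sized shared buffer; its byte size is the third launch parameter.
extern __shared__ double smem[];

// One block computes C (n x p) = A (n x m) * B (m x p), all row-major.
__global__ void multiplySmall(const double *A, const double *B, double *C,
                              int n, int m, int p)
{
    double *sA = smem;          // first n*m doubles hold A
    double *sB = smem + n * m;  // next m*p doubles hold B

    // Cooperative copy from global to shared memory.
    for (int i = threadIdx.x; i < n * m; i += blockDim.x) sA[i] = A[i];
    for (int i = threadIdx.x; i < m * p; i += blockDim.x) sB[i] = B[i];
    __syncthreads();

    // Each thread computes a strided subset of the n*p output elements.
    for (int idx = threadIdx.x; idx < n * p; idx += blockDim.x) {
        int row = idx / p, col = idx % p;
        double sum = 0.0;
        for (int k = 0; k < m; ++k) sum += sA[row * m + k] * sB[k * p + col];
        C[idx] = sum;
    }
}

// Hypothetical host helper: copies the inputs, launches one block, returns C.
std::vector<double> runMultiplySmall(const std::vector<double> &A,
                                     const std::vector<double> &B,
                                     int n, int m, int p)
{
    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * m * sizeof(double));
    cudaMalloc(&dB, m * p * sizeof(double));
    cudaMalloc(&dC, n * p * sizeof(double));
    cudaMemcpy(dA, A.data(), n * m * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), m * p * sizeof(double), cudaMemcpyHostToDevice);

    // Shared memory size is computed at run time from the matrix dimensions.
    size_t smemBytes = (size_t)(n * m + m * p) * sizeof(double);
    multiplySmall<<<1, 64, smemBytes>>>(dA, dB, dC, n, m, p);

    std::vector<double> C(n * p);
    cudaMemcpy(C.data(), dC, n * p * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return C;
}
```

Calling runMultiplySmall(A, B, 2, 2, 2) with two 2 x 2 matrices launches the kernel with room for exactly eight doubles in shared memory. The requested size still has to fit within the per-block shared memory limit of the device.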

This method dynamically allocates the same amount of memory for every thread. I have to give each thread differently sized matrices, and I do not yet know the upper and lower bounds of their sizes.

But thank you very much for the reply and the reference. It is a good starting point.

You probably want to do a tiled matrix multiplication for optimal use of shared memory. In that case, you can use a constant tile size even if the matrix sizes are all different.
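
A tiled multiplication with a fixed tile size could look roughly like the sketch below. It is only an illustration under stated assumptions: the tile size of 16, the names tiledMatMul and runTiledMatMul, and the row-major layout are not from this thread. The shared arrays have compile-time dimensions, while the matrix sizes n, m, and p are ordinary run-time arguments; tiles that hang over a matrix edge are padded with zeros.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // fixed at compile time, independent of the matrix sizes

// C (n x p) = A (n x m) * B (m x p), row-major; launch with TILE x TILE threads per block.
__global__ void tiledMatMul(const double *A, const double *B, double *C,
                            int n, int m, int p)
{
    __shared__ double sA[TILE][TILE];
    __shared__ double sB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    double sum = 0.0;

    // Walk over the shared dimension m one tile at a time.
    for (int t = 0; t < (m + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Pad with zeros where the tile reaches past the matrix edge.
        sA[threadIdx.y][threadIdx.x] = (row < n && aCol < m) ? A[row * m + aCol] : 0.0;
        sB[threadIdx.y][threadIdx.x] = (bRow < m && col < p) ? B[bRow * p + col] : 0.0;
        __syncthreads();

        for (int k = 0; k < TILE; ++k) sum += sA[threadIdx.y][k] * sB[k][threadIdx.x];
        __syncthreads();  // finish using this tile before it is overwritten
    }
    if (row < n && col < p) C[row * p + col] = sum;
}

// Hypothetical host helper: copies the inputs, launches the grid, returns C.
std::vector<double> runTiledMatMul(const std::vector<double> &A,
                                   const std::vector<double> &B,
                                   int n, int m, int p)
{
    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * m * sizeof(double));
    cudaMalloc(&dB, m * p * sizeof(double));
    cudaMalloc(&dC, n * p * sizeof(double));
    cudaMemcpy(dA, A.data(), n * m * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), m * p * sizeof(double), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((p + TILE - 1) / TILE, (n + TILE - 1) / TILE);  // enough tiles to cover C
    tiledMatMul<<<grid, block>>>(dA, dB, dC, n, m, p);

    std::vector<double> C(n * p);
    cudaMemcpy(C.data(), dC, n * p * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return C;
}
```

Because the shared memory footprint depends only on TILE, the same kernel works for any matrix sizes, including ones too large to fit in shared memory as a whole.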

You also probably want to assign (at least) a block per matrix, not a single thread.
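
One possible way to arrange "one block per matrix" when every product has a different size is sketched below. Everything here is an assumption for illustration: the Problem descriptor, the packed buffer layout, and the names batchedMultiply and runBatched do not come from this thread. The host sizes the dynamic shared buffer for the largest product in the batch, which it can compute just before the launch, and each block uses only the part it needs. The sketch assumes each pair of matrices fits in shared memory; error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

extern __shared__ double smem[];

// Hypothetical per-product descriptor: sizes plus offsets into packed buffers.
struct Problem {
    int n, m, p;           // A is n x m, B is m x p, C is n x p (row-major)
    int offA, offB, offC;  // element offsets into the packed A, B, C arrays
};

__global__ void batchedMultiply(const double *A, const double *B, double *C,
                                const Problem *problems)
{
    const Problem pr = problems[blockIdx.x];  // one block per product
    double *sA = smem;
    double *sB = smem + pr.n * pr.m;

    for (int i = threadIdx.x; i < pr.n * pr.m; i += blockDim.x) sA[i] = A[pr.offA + i];
    for (int i = threadIdx.x; i < pr.m * pr.p; i += blockDim.x) sB[i] = B[pr.offB + i];
    __syncthreads();

    for (int idx = threadIdx.x; idx < pr.n * pr.p; idx += blockDim.x) {
        int row = idx / pr.p, col = idx % pr.p;
        double sum = 0.0;
        for (int k = 0; k < pr.m; ++k) sum += sA[row * pr.m + k] * sB[k * pr.p + col];
        C[pr.offC + idx] = sum;
    }
}

// Hypothetical host helper; totalC is the number of elements in the packed C array.
std::vector<double> runBatched(const std::vector<double> &A, const std::vector<double> &B,
                               const std::vector<Problem> &problems, int totalC)
{
    // Size the shared buffer for the largest product in the batch.
    size_t maxElems = 0;
    for (const Problem &pr : problems)
        maxElems = std::max(maxElems, (size_t)(pr.n * pr.m + pr.m * pr.p));
    size_t smemBytes = maxElems * sizeof(double);

    int limit = 0;
    cudaDeviceGetAttribute(&limit, cudaDevAttrMaxSharedMemoryPerBlock, 0);
    if (smemBytes > (size_t)limit) return {};  // too large; a tiled kernel would be needed

    double *dA, *dB, *dC;
    Problem *dP;
    cudaMalloc(&dA, A.size() * sizeof(double));
    cudaMalloc(&dB, B.size() * sizeof(double));
    cudaMalloc(&dC, totalC * sizeof(double));
    cudaMalloc(&dP, problems.size() * sizeof(Problem));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dP, problems.data(), problems.size() * sizeof(Problem), cudaMemcpyHostToDevice);

    batchedMultiply<<<(unsigned)problems.size(), 64, smemBytes>>>(dA, dB, dC, dP);

    std::vector<double> C(totalC);
    cudaMemcpy(C.data(), dC, totalC * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dP);
    return C;
}
```

Since the launch parameter is a single value for the whole grid, every block reserves the maximum, so a batch with one very large product reduces how many blocks can be resident at once.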