Question about max shared memory in block and multiprocessor

Hi community members,

When I tried to print my device's maximum available shared memory, I found two properties in CUDA: deviceProp.sharedMemPerBlock and deviceProp.sharedMemPerMultiprocessor.

On an A800 GPU, deviceProp.sharedMemPerBlock is 49152 bytes and deviceProp.sharedMemPerMultiprocessor is 167936 bytes.
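(For reference, this is roughly how I read the two values; a minimal sketch assuming device 0:)

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // assuming device 0
    printf("sharedMemPerBlock:          %zu bytes\n", prop.sharedMemPerBlock);
    printf("sharedMemPerMultiprocessor: %zu bytes\n", prop.sharedMemPerMultiprocessor);
    return 0;
}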

I know that an SM can run multiple blocks. My question is: even if a kernel has only one block, is the maximum shared memory available to that block still 49152 bytes?

For example, on the A800 I have a kernel where the number of blocks equals the number of SMs, i.e. 108.
In this case each SM runs one block, and although the maximum shared memory of each SM is 167936 bytes, the maximum shared memory actually available to each block is 49152 bytes.
Is that true? Or will CUDA do some optimization when each SM runs a single block, so that the maximum shared memory available to each block can be greater than 49152 bytes?

48KB of shared memory per block is the standard amount available on all current GPUs.
To use more than 48KB you have to use dynamic shared memory (extern __shared__) and explicitly opt in with
cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem);

The maximum amount of shared memory supported in this way should be listed in deviceProp.sharedMemPerBlockOptin.
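A minimal sketch of the opt-in (the kernel and buffer sizes here are only illustrative; real code should check the returned cudaError_t values):

#include <cuda_runtime.h>

__global__ void kernel(float *out) {
    extern __shared__ float smem[];          // dynamic shared memory
    smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = smem[threadIdx.x];
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // Maximum dynamic shared memory per block this device supports via opt-in
    size_t smemSize = prop.sharedMemPerBlockOptin;

    // Opt in to more than the default 48KB for this kernel
    cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, (int)smemSize);

    float *out;
    cudaMalloc(&out, 256 * sizeof(float));
    kernel<<<1, 256, smemSize>>>(out);       // dynamic size passed at launch
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}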

The programming guide lists this limit for all architectures in Table 18.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability

