Dynamic shared memory reported by ncu is larger than the maximum shared memory per block

I used Nsight Compute to profile an AI model and observed that one of the kernels uses 61.4 KB of shared memory per block, as shown in the attached figure.

I queried the maximum shared memory a thread block can use via the “maxSharedMemLimitPerBlock” attribute, and the result was 49152 bytes (48 KB). I am puzzled as to how the kernel can still use more shared memory than this maximum limit. Could someone explain this?

I came across a figure showing that a GPU of compute capability 8.7 supports a maximum of 163 KB of shared memory per thread block. How can I obtain this value through the CUDA API? Currently I am calling cudaDeviceGetAttribute(&sharedMemLimit, cudaDevAttrMaxSharedMemoryPerBlock, 0);, but it only returns 48 KB.
Furthermore, I have a related question about the third parameter of the <<<>>> launch configuration, which specifies the dynamic shared memory size. If I set this value larger than 48 KB, I get an “invalid argument” error. Given that a thread block can actually use up to 163 KB of shared memory, why is a value larger than 48 KB rejected as invalid?
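For reference, here is a minimal sketch of what I am doing (the kernel and sizes are simplified stand-ins, not my actual model code):

```cuda
// Minimal repro sketch: query the per-block limit, then try to launch
// with more dynamic shared memory than that limit allows by default.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(float *out) {
    extern __shared__ float smem[];  // dynamic shared memory
    smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = smem[threadIdx.x];
}

int main() {
    int sharedMemLimit = 0;
    cudaDeviceGetAttribute(&sharedMemLimit,
                           cudaDevAttrMaxSharedMemoryPerBlock, 0);
    printf("MaxSharedMemoryPerBlock: %d bytes\n", sharedMemLimit);  // 49152 here

    float *out;
    cudaMalloc(&out, 256 * sizeof(float));

    // ~61.4 KB of dynamic shared memory -> launch fails with
    // "invalid argument" even though the hardware supports more.
    size_t smemBytes = 62874;
    kernel<<<1, 256, smemBytes>>>(out);
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(out);
    return 0;
}
```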

The last part of this section may help you.

"Kernels relying on shared memory allocations over 48 KB per block are architecture-specific, as such they must use dynamic shared memory (rather than statically sized arrays) and require an explicit opt-in using cudaFuncSetAttribute() as follows."
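A sketch of that opt-in (the kernel name is illustrative): query the architecture-specific limit with the opt-in attribute, raise the kernel's dynamic shared memory ceiling with cudaFuncSetAttribute(), and then the launch succeeds with more than 48 KB.

```cuda
// Opt in to >48 KB of dynamic shared memory per block.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(float *out) {
    extern __shared__ float smem[];
    smem[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = smem[threadIdx.x];
}

int main() {
    // The architecture-specific maximum (163 KB on compute capability 8.7)
    // is reported by the *opt-in* attribute, not by
    // cudaDevAttrMaxSharedMemoryPerBlock, which always reports 48 KB.
    int optinLimit = 0;
    cudaDeviceGetAttribute(&optinLimit,
                           cudaDevAttrMaxSharedMemoryPerBlockOptin, 0);
    printf("MaxSharedMemoryPerBlockOptin: %d bytes\n", optinLimit);

    // Raise this kernel's dynamic shared memory ceiling above 48 KB.
    size_t smemBytes = 64 * 1024;  // 64 KB
    cudaFuncSetAttribute(kernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)smemBytes);

    float *out;
    cudaMalloc(&out, 256 * sizeof(float));

    // With the opt-in in place, this is no longer "invalid argument".
    kernel<<<1, 256, smemBytes>>>(out);
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(out);
    return 0;
}
```

Note that the opt-in is per kernel function, and the requested size must not exceed the value reported by cudaDevAttrMaxSharedMemoryPerBlockOptin for the device.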

Thanks for your response.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.