H100 Shared Memory Limit Discrepancy

kevin.tong · April 18, 2024, 6:17pm

Hi,

The Hopper tuning docs here NVIDIA Hopper Tuning Guide asserts that H100s have per thread block and per SM shared memory limits of 227 and 228 KB respectively.

However, running deviceQuery from the cuda-samples repo yields device properties as follows:

Total amount of shared memory per block:       49152 bytes
Total shared memory per multiprocessor:        233472 bytes

Hoping to resolve this discrepancy.

Thanks!

rs277 · April 18, 2024, 6:54pm

I think this discrepancy is covered here:

"Kernels relying on shared memory allocations over 48 KB per block are architecture-specific, and must use dynamic shared memory rather than statically sized shared memory arrays. "

system · May 2, 2024, 6:55pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
CUDA Device Query says P100 has 48kb shared memory/block... but it's supposed to be 64kb? CUDA Programming and Performance	3	1098	June 4, 2017
Question about max shared memory in block and multiprocessor CUDA Programming and Performance	2	1292	February 20, 2024
Shared memory per block CUDA Programming and Performance	1	6800	March 29, 2012
about maximum amount of shared memory per multiprocessor CUDA Programming and Performance	0	568	June 5, 2013
Shared memory per block Related to shared memory of an MCPU CUDA Programming and Performance	3	3986	August 14, 2007
Why is shared memory configuration size is limiting the occupancy CUDA Programming and Performance kernel , profiling	2	1001	June 4, 2023
NEWBIE:max size of shared memory of a block? CUDA Programming and Performance	3	3097	September 5, 2009
Shared memory allocation CUDA Programming and Performance	1	59	February 20, 2025
Dynamic shared memory calculated by ncu larger than Max_shared_memory_per_block Nsight Compute cuda	4	632	September 21, 2023
Shared memory CUDA Programming and Performance	2	6863	April 14, 2011

H100 Shared Memory Limit Discrepancy

Related topics