H100 Shared Memory Limit Discrepancy


The NVIDIA Hopper Tuning Guide states that H100s have per-thread-block and per-SM shared memory limits of 227 KB and 228 KB, respectively.

However, running deviceQuery from the cuda-samples repo yields device properties as follows:

Total amount of shared memory per block:       49152 bytes
Total shared memory per multiprocessor:        233472 bytes

How can this discrepancy be resolved?


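I think this discrepancy is covered here:

"Kernels relying on shared memory allocations over 48 KB per block are architecture-specific, and must use dynamic shared memory rather than statically sized shared memory arrays."

In other words, the 49152 bytes (48 KB) that deviceQuery reports is the default per-block limit that applies without an opt-in. The 227 KB figure is only reachable with dynamic shared memory after the kernel explicitly opts in through cudaFuncSetAttribute. The per-SM value already agrees with the guide, since 233472 bytes is exactly 228 KB.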

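As a rough illustration of the opt-in described in that quote, here is a minimal sketch. The kernel name `fill_shared` and its body are hypothetical and exist only for this example, and device 0 is assumed to be the H100. The runtime calls and the `cudaDeviceProp` fields are the standard ones. It prints the default and opt-in per-block limits, then launches one block that requests the full opt-in amount as dynamic shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: touches every element of a
// dynamically sized shared memory buffer and writes one value back out.
__global__ void fill_shared(float *out, int n)
{
    extern __shared__ float buf[];  // size is supplied at launch time
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        buf[i] = static_cast<float>(i);
    __syncthreads();
    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[n - 1];
}

int main()
{
    // Assumption: device 0 is the GPU of interest.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // sharedMemPerBlock is the default limit that deviceQuery prints;
    // sharedMemPerBlockOptin is the larger limit available after opt-in.
    printf("sharedMemPerBlock:          %zu bytes\n", prop.sharedMemPerBlock);
    printf("sharedMemPerBlockOptin:     %zu bytes\n", prop.sharedMemPerBlockOptin);
    printf("sharedMemPerMultiprocessor: %zu bytes\n", prop.sharedMemPerMultiprocessor);

    // Opt in: raise this kernel's dynamic shared memory cap above 48 KB.
    size_t bytes = prop.sharedMemPerBlockOptin;
    err = cudaFuncSetAttribute(fill_shared,
                               cudaFuncAttributeMaxDynamicSharedMemorySize,
                               static_cast<int>(bytes));
    if (err != cudaSuccess) {
        printf("cudaFuncSetAttribute failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *out = nullptr;
    err = cudaMalloc(&out, sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The third launch parameter is the dynamic shared memory size in bytes.
    int n = static_cast<int>(bytes / sizeof(float));
    fill_shared<<<1, 256, bytes>>>(out, n);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    printf("launch with %zu bytes of dynamic shared memory: %s\n",
           bytes, cudaGetErrorString(err));

    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Without the cudaFuncSetAttribute call, a launch that requests more than 48 KB of dynamic shared memory is expected to fail, which is consistent with the 49152-byte default shown by deviceQuery.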