Hi,
The NVIDIA Hopper Tuning Guide states that H100 GPUs have shared memory limits of 227 KB per thread block and 228 KB per SM.
However, running deviceQuery from the cuda-samples repo reports the following device properties:
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 233472 bytes
The per-multiprocessor value (233472 bytes) works out to exactly 228 KB, which matches the guide, but the per-block value (49152 bytes) is only 48 KB, well below the 227 KB figure. Hoping to resolve this discrepancy.
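In case it helps, here is a minimal sketch that queries the same fields directly through the CUDA runtime API; it also prints sharedMemPerBlockOptin, which I believe reports the opt-in per-block maximum:

// query_shmem.cu -- build with: nvcc query_shmem.cu -o query_shmem
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Default per-block limit (48 KB)
    std::printf("sharedMemPerBlock:          %zu bytes\n", prop.sharedMemPerBlock);
    // Opt-in per-block maximum (used after setting cudaFuncAttributeMaxDynamicSharedMemorySize on a kernel)
    std::printf("sharedMemPerBlockOptin:     %zu bytes\n", prop.sharedMemPerBlockOptin);
    // Total shared memory per SM
    std::printf("sharedMemPerMultiprocessor: %zu bytes\n", prop.sharedMemPerMultiprocessor);
    return 0;
}

If sharedMemPerBlockOptin comes back as 227 KB, my guess would be that the 227 KB per-block limit only applies after opting in per kernel, but I'd appreciate confirmation.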
Thanks!