Trouble Reaching Peak Bandwidth of the A100

The advertised number is a theoretical peak. It cannot be reached in actual measurement, for various reasons. 1.80/2.0 = 90% is a reasonable upper bound on what can be achieved in actual code/measurement. This is common to many GPUs and is not specific to the A100.

I think it would be correct to say that nobody in the history of CUDA has ever demonstrated a kernel that achieves peak theoretical bandwidth.

The theoretical memory bandwidth calculation is as follows:

5 HBM stacks x 1024 bits/stack/transfer x 1593 Mtransfers/sec x 2 (double-pumped) / 8 bits/byte = 2.039 TB/s.

There are 16GB/stack for a total of 80GB.
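
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The input figures are the ones quoted in this post; the function and variable names are just illustrative, and the 1.80 TB/s "measured" value is an assumption read off the 1.80/2.0 ratio mentioned earlier, not a new measurement.

```python
# Inputs quoted in the post; all names below are illustrative, not from any NVIDIA API.
HBM_STACKS = 5                      # active HBM stacks
BITS_PER_STACK_PER_TRANSFER = 1024  # interface width per stack
MEM_CLOCK_MHZ = 1593                # the post's 1593 Mtransfers/sec, before double-pumping
DDR_FACTOR = 2                      # double-pumped: two transfers per clock
GB_PER_STACK = 16


def peak_bandwidth_bytes_per_sec(stacks, bus_bits, clock_mhz, ddr_factor):
    """Theoretical peak bandwidth in bytes/sec (decimal units)."""
    bits_per_sec = stacks * bus_bits * clock_mhz * 1_000_000 * ddr_factor
    return bits_per_sec / 8  # 8 bits per byte


peak = peak_bandwidth_bytes_per_sec(
    HBM_STACKS, BITS_PER_STACK_PER_TRANSFER, MEM_CLOCK_MHZ, DDR_FACTOR
)
capacity_gb = HBM_STACKS * GB_PER_STACK

# Assumption: 1.80 TB/s stands in for a measured result, per the 1.80/2.0 ratio above.
measured = 1.80e12

print(f"theoretical peak: {peak / 1e12:.3f} TB/s")
print(f"total capacity:   {capacity_gb} GB")
print(f"measured / peak:  {measured / peak:.1%}")
```

Note that against the more precise 2.039 TB/s peak, rather than the rounded 2.0, the same measured figure works out to a bit under 90%.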
