For the A100 GPU, the L2 cache is partitioned into two parts with some interconnect between them. I was wondering what the bandwidth of this interconnect is, as I could not find it online.
Thanks!
It might be possible to get some idea of this using an L2 test along with the profiler (Nsight Compute). A memory chart can be obtained, and this chart is color-coded to represent percentage of peak. If you found a test that exercised the L2 partition connector up to an orange level, then doubled the reported value and divided by the kernel duration, you might be able to get an estimate. I don't know that a value is published anywhere.
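The arithmetic behind that estimate can be sketched as follows. This is a minimal illustration, assuming you have pulled a byte count for the L2 partition connector out of a Nsight Compute memory chart; the numbers used here are hypothetical, not measured values:

```python
def estimate_interconnect_bw(reported_bytes, kernel_seconds):
    """Rough estimate of L2 partition interconnect bandwidth.

    reported_bytes: bytes crossing the partition connector, as read
                    from the Nsight Compute memory chart (one direction).
    kernel_seconds: duration of the kernel that generated the traffic.
    """
    # Double the reported value (traffic flows both ways across the
    # connector), then divide by kernel duration to get bytes/second.
    return 2 * reported_bytes / kernel_seconds

# Hypothetical example: 1.2e9 bytes reported over a 2 ms kernel
bw = estimate_interconnect_bw(1.2e9, 2e-3)
print(f"{bw / 1e12:.2f} TB/s")  # prints "1.20 TB/s"
```

This only gives a lower bound on the achievable rate, of course, since the test kernel may not saturate the connector.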
IIRC from tests in a paper (whose reference I no longer remember, as I read it a long time ago), it was a sensible value, roughly on the order of the overall bandwidth of the other L2 half.
The potential problem with the partitions is not the bandwidth, and mostly not the (definitely increased) latency, but that values are duplicated into both halves, lowering the effective L2 size.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.