In L2 we have a lot of memory traffic. Do we add all of these throughputs together to get the Speed of Light L2 cache throughput here?
For example, if the L2 cache peak is 10 TB/s, is the L2 throughput (462.66 + 19.58 + 1.42e3 + 816.88 + 509.13 + 19.58) GB/s / 10e3 GB/s???
The lts__throughput and dram__throughput metrics do not calculate an aggregated bandwidth, because input/output bandwidth is only one of the sub-metrics of the unit throughput.
The L2 does not have symmetrical read/write bandwidth on many GPUs, so aggregating the values does not necessarily make sense.
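To make the asymmetric read/write point concrete, here is a minimal Python sketch. It assumes, based on our reading of the Nsight Compute documentation, that a unit throughput percentage is the maximum of its sub-metrics' percent-of-peak values, each measured against that sub-metric's own peak. The sub-metric names, rates, and peaks are hypothetical and do not describe any real GPU.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SubMetric:
    """One constituent counter of a unit throughput (hypothetical values)."""
    name: str
    achieved: float  # achieved rate, in this sub-metric's own unit per second
    peak: float      # peak sustained rate, in the same unit

    @property
    def pct_of_peak(self) -> float:
        return 100.0 * self.achieved / self.peak


def unit_throughput_pct(subs: list[SubMetric]) -> float:
    """Model the unit throughput as the largest sub-metric percent of peak."""
    return max(s.pct_of_peak for s in subs)


def naive_summed_pct(bytes_per_s: list[float], single_peak: float) -> float:
    """The approach from the question: sum byte rates against one peak."""
    return 100.0 * sum(bytes_per_s) / single_peak


# Hypothetical L2 sub-metrics; names and numbers are invented for illustration.
subs = [
    SubMetric("read bandwidth (GB/s)", achieved=1500.0, peak=6000.0),
    SubMetric("write bandwidth (GB/s)", achieved=900.0, peak=3000.0),
    SubMetric("tag lookups (G/s)", achieved=40.0, peak=200.0),
]

for s in subs:
    print(f"{s.name}: {s.pct_of_peak:.1f}%")
print(f"unit throughput: {unit_throughput_pct(subs):.1f}%")
print(f"naive sum vs 10 TB/s: {naive_summed_pct([1500.0, 900.0], 10_000.0):.1f}%")
```

With these made-up numbers, summing the byte rates against a single 10 TB/s peak reports 24%, while the write path alone is already at 30% of its own peak, so the sum understates how close the L2 is to a limit.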
The best approach is to use NCU to enumerate the lts__throughput and dram__throughput sub metrics.
The GPU Speed Of Light Throughput section, under the GPU Throughput Breakdown drop-down, shows the L2 and DRAM breakdown under Memory Throughput Breakdown. For L2, the throughput can be limited by numerous metrics, such as
None of these are in bytes/sec. The list above may vary per architecture.
To collect these via the NCU command line, you can specify --metrics breakdown:lts__throughput.avg.pct_of_peak_sustained_elapsed,breakdown:dram__throughput.avg.pct_of_peak_sustained_elapsed
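If you script your profiling runs, a small helper can assemble that command line. This is a sketch only: it assumes `ncu` is on your PATH, and `./my_app` is a hypothetical placeholder for your own CUDA application. It prints the command rather than running it.

```python
from __future__ import annotations

import shlex

# Metric names from the post; the "breakdown:" prefix asks NCU to expand
# each throughput metric into its sub-metrics.
BREAKDOWN_METRICS = (
    "breakdown:lts__throughput.avg.pct_of_peak_sustained_elapsed",
    "breakdown:dram__throughput.avg.pct_of_peak_sustained_elapsed",
)


def build_ncu_command(app_argv: list[str],
                      metrics: tuple[str, ...] = BREAKDOWN_METRICS) -> list[str]:
    """Return an ncu argv that profiles app_argv with the given metrics."""
    return ["ncu", "--metrics", ",".join(metrics), *app_argv]


if __name__ == "__main__":
    # "./my_app" is a hypothetical placeholder for your own application.
    print(shlex.join(build_ncu_command(["./my_app"])))
```

The returned list can also be passed directly to `subprocess.run`. Keep in mind that the expanded sub-metric names in the report may vary per architecture.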
For DRAM the throughput is the read + write bandwidth.
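Since DRAM throughput is the read + write bandwidth, the arithmetic is a single sum against one peak. A minimal sketch, assuming you already have byte counts and an elapsed time; the traffic numbers and the 2 TB/s peak below are hypothetical, and NCU determines the real peak for each device.

```python
def dram_throughput_pct(read_bytes: float, write_bytes: float,
                        elapsed_s: float, peak_bytes_per_s: float) -> float:
    """Percent of peak DRAM bandwidth, counting reads and writes together."""
    achieved_bytes_per_s = (read_bytes + write_bytes) / elapsed_s
    return 100.0 * achieved_bytes_per_s / peak_bytes_per_s


# Hypothetical kernel: 600 MB read + 200 MB written in 1 ms, on a GPU whose
# DRAM peak is assumed to be 2 TB/s.
pct = dram_throughput_pct(600e6, 200e6, 1e-3, 2e12)
print(f"DRAM throughput: {pct:.1f}% of peak")
```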