What is L2/L1/DRAM throughput here?



In L2, for example, there is a lot of memory traffic. Do we add all of these throughputs together to get the SpeedOfLight L2 cache throughput here?

For example, if the L2 cache peak is 10 TB/s, is it (462.66 + 19.58 + 1.42e3 + 816.88 + 509.13 + 19.58) GB/s / 10e3 GB/s?

The lts__throughput and dram__throughput metrics do not report an aggregated bandwidth. Input/output bandwidth is only one of the submetrics that make up a unit's throughput.

The L2 does not have symmetrical read/write bandwidth on many GPUs, so aggregating the values does not necessarily make sense.

The best approach is to use NCU to enumerate the lts__throughput and dram__throughput submetrics.

In the GPU Speed of Light Throughput section, the GPU Throughput Breakdown drop-down shows the L2 and DRAM breakdowns under Memory Throughput Breakdown. For L2, the throughput can be limited by numerous metrics, such as:

  • Data Sectors Fill Device
  • Data Sectors Fill Sysmem
  • Tag Sectors
  • L2 to XBAR (return bandwidth)
  • XBAR to L2 (request and write bandwidth)
  • Requests
  • Compression
  • Atomic Unit

None of these are in bytes/sec. The list above may vary per architecture.
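
As a rough illustration of why summing bandwidths is the wrong model, here is a minimal Python sketch. It assumes the unit throughput is taken as the highest of its breakdown percentages, which is my understanding of how the NCU breakdown is reported. The submetric names come from the list above; the percentage values are made up and do not come from any real profile.

```python
# Hypothetical breakdown percentages (% of each submetric's own peak).
# The values are invented for illustration, not taken from a real profile.
l2_breakdown_pct = {
    "Data Sectors Fill Device": 12.5,
    "Tag Sectors": 48.0,
    "L2 to XBAR": 35.0,
    "XBAR to L2": 22.0,
    "Requests": 30.0,
}

# Assumption: the unit throughput is the largest submetric percentage,
# i.e. the resource closest to its own peak, not a sum of bandwidths.
limiter = max(l2_breakdown_pct, key=l2_breakdown_pct.get)
l2_throughput_pct = l2_breakdown_pct[limiter]

print(f"L2 throughput ~ {l2_throughput_pct:.1f}% (limited by {limiter})")
```

Under that assumption, the submetric closest to its own peak is the one worth looking at first, regardless of how many GB/s the individual traffic paths add up to.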

To collect these via NCU, you can specify --metrics breakdown:lts__throughput.avg.pct_of_peak_sustained_elapsed,breakdown:dram__throughput.avg.pct_of_peak_sustained_elapsed
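
If you script your profiling runs, a small Python sketch for assembling that command line could look like the following. The metric names are the ones given above; `./my_app` is a hypothetical placeholder for the application being profiled, and the script only prints the command rather than running it.

```python
import shlex

# Metric names are the ones given above; "./my_app" is a hypothetical
# placeholder for the application being profiled.
metrics = ",".join([
    "breakdown:lts__throughput.avg.pct_of_peak_sustained_elapsed",
    "breakdown:dram__throughput.avg.pct_of_peak_sustained_elapsed",
])
cmd = ["ncu", "--metrics", metrics, "./my_app"]

# Print the command line; pass cmd to subprocess.run(cmd) to launch it.
print(shlex.join(cmd))
```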

For DRAM the throughput is the read + write bandwidth.
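
A minimal sketch of that DRAM case, in Python, with all numbers being hypothetical placeholders rather than measurements. It is a simplification that only models the read + write bandwidth described above and ignores any other submetrics the DRAM breakdown may list on a given architecture.

```python
# All numbers are hypothetical placeholders, not measurements.
dram_read_gbps = 600.0
dram_write_gbps = 200.0
dram_peak_gbps = 2000.0  # assumed peak DRAM bandwidth of the device

# Per the statement above, read and write bandwidth are added together.
dram_pct_of_peak = 100.0 * (dram_read_gbps + dram_write_gbps) / dram_peak_gbps

print(f"DRAM throughput ~ {dram_pct_of_peak:.1f}% of peak")
```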
