Confusion about submetrics


I’ve been trying to understand the Metric Entities section of the Nsight Compute documentation. Specifically this part:

Does this mean, for example, that dram__bytes.avg is the average number of DRAM bytes accessed over the entire kernel?

If so, how does dram__bytes.sum.per_second differ from dram__bytes.avg.per_second?

A GPU has multiple physical instances of each unit. For example, a GPU may have 32 Streaming Multiprocessors (SMs). For an SM metric, .avg is the average value across all SM instances, and .sum is the total across all SMs.
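To make the rollups concrete, here is a minimal sketch in plain Python arithmetic (not the Nsight Compute API). The suffix names mirror the metric rollups, but the `rollups` helper and the per-instance values are hypothetical.

```python
# Minimal sketch of Nsight Compute-style instance rollups.
# The per-instance values are hypothetical, not measured data.

def rollups(values):
    """Combine per-instance counter values into .sum/.avg/.max/.min."""
    total = sum(values)
    return {
        "sum": total,                # total across all instances
        "avg": total / len(values),  # mean value per instance
        "max": max(values),          # busiest instance
        "min": min(values),          # least busy instance
    }

# Hypothetical counter values for 4 unit instances (a real GPU has more).
per_instance = [1000.0, 1200.0, 800.0, 1000.0]
r = rollups(per_instance)
print(r)  # {'sum': 4000.0, 'avg': 1000.0, 'max': 1200.0, 'min': 800.0}
```

Note that .avg is simply .sum divided by the number of instances, so it describes one "typical" instance rather than the whole kernel.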

The GPU memory subsystem is also physically partitioned into multiple instances, so the same rollups apply to dram__bytes. .avg is useful for assessing load balancing by comparing it with .max and .min, and for determining unit throughput with .avg.pct_of_peak_sustained_elapsed.
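The sketch below shows both uses with hypothetical numbers. It assumes that pct_of_peak_sustained_elapsed is the achieved count divided by (per-instance peak sustained rate per cycle × elapsed cycles); the variable names are illustrative only.

```python
# Sketch: load balance and % of peak from per-instance values.
# All numbers are hypothetical, and the % of peak formula assumes the peak
# is expressed as a per-instance sustained rate per cycle.

per_instance = [1000.0, 1200.0, 800.0, 1000.0]  # hypothetical bytes per instance
peak_per_cycle = 32.0    # hypothetical per-instance peak sustained bytes/cycle
elapsed_cycles = 100.0   # hypothetical elapsed cycles for the kernel

avg = sum(per_instance) / len(per_instance)

# Load balance: compare .max and .min against .avg (1.0 means perfectly even).
max_over_avg = max(per_instance) / avg
min_over_avg = min(per_instance) / avg

# Throughput of an average instance relative to its peak over the elapsed time.
avg_pct_of_peak_sustained_elapsed = 100.0 * avg / (peak_per_cycle * elapsed_cycles)

print(max_over_avg, min_over_avg, avg_pct_of_peak_sustained_elapsed)  # 1.2 0.8 31.25
```

A max/avg ratio well above 1.0 suggests one partition is doing noticeably more work than the others, even if the average utilization looks low.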

To determine kernel memory throughput, use .sum.per_second. .avg.per_second is the rate of an average instance, i.e. .sum.per_second divided by the number of instances.
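As a rough illustration of the difference, this sketch derives both rates from a hypothetical byte count, kernel duration and partition count. The `per_second` helper is hypothetical and only performs the division; it does not query Nsight Compute.

```python
# Sketch: kernel-wide memory throughput vs. per-instance rate.
# The byte count, duration and instance count are hypothetical.

def per_second(value, duration_ns):
    """Convert a count accumulated over duration_ns into a per-second rate."""
    return value / (duration_ns * 1e-9)

total_bytes = 4_000_000   # hypothetical dram__bytes.sum for one kernel
duration_ns = 4_000       # hypothetical kernel duration in nanoseconds
num_instances = 16        # hypothetical number of DRAM partitions

# Whole-kernel throughput (what .sum.per_second reports).
sum_per_second = per_second(total_bytes, duration_ns)
# Rate of one average instance (what .avg.per_second reports).
avg_per_second = per_second(total_bytes / num_instances, duration_ns)

print(f"{sum_per_second:.3e} B/s total, {avg_per_second:.3e} B/s per instance")
```

Multiplying the per-instance rate by the instance count recovers the total, which is why only .sum.per_second should be compared against the device's overall memory bandwidth.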


