Is there a way to measure the number of HBM bytes read/written for a kernel? I am running my code on an H100.
Nsight Systems has metrics for DRAM throughput:
DRAM Read Bandwidth - dramc__read_throughput.avg.pct_of_peak_sustained_elapsed, dram__read_throughput.avg.pct_of_peak_sustained_elapsed
However, I was hoping to get a counter for the approximate number of bytes read from and written to HBM, rather than a percentage of peak throughput. Perhaps the throughput percentage could be integrated over time to come up with a number, but I am not sure this is the right thing to do, since I assume each throughput sample is itself an average over the sampling interval. Is there a better counter (something like total_bytes_read), or any other technique you can suggest, to measure the number of bytes read/written for a given kernel?
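To make the integration idea concrete, here is a minimal sketch of what I have in mind. It assumes each sample is already an average over its own interval, so it simply multiplies each percentage by the peak bandwidth and the interval length and sums the results. The function name, the `(duration_s, pct_of_peak)` sample layout, and the peak bandwidth value are all hypothetical; they are not taken from any Nsight export format, and the real peak figure would have to come from the device's specifications.

```python
def estimate_bytes(samples, peak_bytes_per_s):
    """Estimate total bytes moved from sampled throughput percentages.

    samples: iterable of (duration_s, pct_of_peak) pairs, where pct_of_peak
             is assumed to be the average over that sample's interval.
    peak_bytes_per_s: assumed peak DRAM bandwidth in bytes per second.
    """
    total = 0.0
    for duration_s, pct_of_peak in samples:
        # Average bandwidth during this interval, times the interval length.
        total += (pct_of_peak / 100.0) * peak_bytes_per_s * duration_s
    return total


# Hypothetical numbers purely for illustration (not measured values).
PEAK_BYTES_PER_S = 1.0e12  # placeholder peak bandwidth, not an H100 figure
samples = [(0.001, 50.0), (0.001, 100.0)]  # two 1 ms samples
approx_bytes = estimate_bytes(samples, PEAK_BYTES_PER_S)
```

The result is only as accurate as the sampling resolution and the assumed peak, and samples that straddle the start or end of the kernel would mix in traffic from outside it, which is why I would prefer a direct byte counter.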