How to measure the number of HBM bytes read/written for a kernel?

Is there a way to measure the number of HBM bytes read/written for a kernel? I am running my code on an H100.

Nsight Systems has GPU metrics for DRAM throughput:

DRAM Read Bandwidth - dramc__read_throughput.avg.pct_of_peak_sustained_elapsed, dram__read_throughput.avg.pct_of_peak_sustained_elapsed

However, I was hoping to get a counter for the approximate number of bytes read from and written to HBM, rather than a percentage of peak throughput. Perhaps the percentage could be integrated over time to arrive at a byte count, but I am not sure that is valid, since I assume any throughput metric is itself averaged over the sampling interval. Is there a better counter (something like total_bytes_read), or another technique you can suggest to measure the number of bytes read/written by a given kernel?
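If only the sampled percentage is available, the integration idea above can be sketched as follows. This is a rough estimate under stated assumptions: each sample is treated as the average %-of-peak over a fixed interval starting at its timestamp; the peak bandwidth is a value you supply (for example the spec-sheet figure for your H100 variant, although the "peak sustained" value the profiler normalizes against may differ); and any traffic from other work overlapping a sample is counted too. The function name approx_dram_bytes is hypothetical.

```python
from typing import Sequence, Tuple


def approx_dram_bytes(
    samples: Sequence[Tuple[float, float]],  # (sample_start_s, pct_of_peak) pairs, pct in 0..100
    interval_s: float,                       # assumed constant sampling period, in seconds
    kernel_start_s: float,                   # kernel start timestamp, in seconds
    kernel_end_s: float,                     # kernel end timestamp, in seconds
    peak_bytes_per_s: float,                 # assumed peak DRAM bandwidth, in bytes/s
) -> float:
    """Estimate bytes moved during a kernel by integrating sampled %-of-peak throughput.

    Each sample is assumed to represent the average throughput over
    [sample_start_s, sample_start_s + interval_s). Only the part of each
    sample that overlaps the kernel's time window is counted.
    """
    total = 0.0
    for t0, pct in samples:
        t1 = t0 + interval_s
        # Length of the overlap between this sample window and the kernel window.
        overlap = min(t1, kernel_end_s) - max(t0, kernel_start_s)
        if overlap > 0:
            total += (pct / 100.0) * peak_bytes_per_s * overlap
    return total
```

Because each sample is an average, a short kernel that lands inside one sample window gets that window's average attributed to it, so the error grows as kernels get shorter relative to the sampling period.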

@pkovalenko can you respond to this?

It looks like Nsight Compute has a dram__bytes_read.sum metric (and dram__bytes_write.sum for writes). Those look like the right counters to track.

Currently, neither Nsys nor NCU can display these values in bytes in periodic sampling mode. NCU shows this metric only in kernel/range profiling mode.