Hi,
I would like to know what equation is actually used to calculate a metric like dram__bytes_read.sum.pct_of_peak_sustained_elapsed.
I would imagine it’s the dram__bytes_read.sum metric divided by the max theoretical HBM bandwidth on the datasheet for the chip I’m using. But since bandwidth is per second, and NCU slows down kernel execution time, how would it know what bandwidth was achieved?
I looked through the kernel profiling guide and moused over metrics in ncu-ui, but the “raw” metrics I see are still too high level, such as the pct_of_peak metrics for shared memory utilization. Is there a way I can see what numbers are used to calculate the peak %?
Thanks!
Every metric that has pct_of_peak_sustained_elapsed is also available as peak_sustained_elapsed, which will give you the theoretical peak, not the fraction of it achieved (pct of) in your concrete measurement.
Nsight Compute locks clocks for deterministic multi-pass results. Peaks are adjusted for the actual clock rate. If preferred, you can disable clock control in ncu with --clock-control none and lock the clocks externally, e.g. using nvidia-smi, to a different clock rate. Leaving clocks unlocked can result in skewed values if metrics are collected across multiple replay passes.
Hm, I’m a little confused now: peak sustained elapsed is the peak sustained rate during unit elapsed cycles, but you’re saying it’s the theoretical peak? Isn’t the theoretical peak determined by the hardware, so it’d be the same value for each kernel?
What I was expecting for pct_of_peak dram bytes read was (bytes/kernel_duration)/theoretical_hbm_bandwidth, but it sounds like it’s actually (bytes/kernel_duration)/max_per_cycle((bytes/kernel_duration)) or something?
Yes, peak_sustained_elapsed is basically the theoretical peak, applied to your actual elapsed cycles. peak_sustained_active would be the (sustained) peak for your active cycles (where active <= elapsed). You can also collect/check peak_sustained to get the underlying peak value unrelated to your actual cycles. It can be a bit easier to follow when actually looking at some numbers:
ncu --metrics regex:sm__inst_executed\.avg\..*,sm__cycles_active.max,sm__cycles_elapsed.max ...
Section: Command line profiler metrics
------------------------------------------------------- ----------- ------------
Metric Name                                             Metric Unit Metric Value
------------------------------------------------------- ----------- ------------
sm__inst_executed.avg.pct_of_peak_sustained_active      %                  35.03
sm__inst_executed.avg.pct_of_peak_sustained_elapsed     %                  33.57
sm__inst_executed.avg.peak_sustained                    inst/cycle             4
sm__inst_executed.avg.peak_sustained_active             inst        2,297,660.79
sm__inst_executed.avg.peak_sustained_active.per_second  inst/ns             8.45
sm__inst_executed.avg.peak_sustained_elapsed            inst        2,398,115.26
sm__inst_executed.avg.peak_sustained_elapsed.per_second inst/ns             8.82
sm__inst_executed.avg.per_cycle_active                  inst/cycle          1.40
sm__inst_executed.avg.per_cycle_elapsed                 inst/cycle          1.34
sm__inst_executed.avg.per_second                        inst/ns             2.96
sm__cycles_active.max                                   cycle            597,149
sm__cycles_elapsed.max                                  cycle            599,534
------------------------------------------------------- ----------- ------------
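Note how 2,398,115.26 ≈ 599,534 * 4. The exact product is 2,398,136; the small gap is presumably because the .avg peak is based on the elapsed cycles averaged over all SMs, while sm__cycles_elapsed.max reports the maximum.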
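To make the relationship between these rows concrete, here is a minimal Python sketch that recomputes the derived values from the ncu output above. It assumes that peak_sustained_elapsed (and peak_sustained_active) is simply peak_sustained multiplied by a cycle count, and that the percentage is the achieved per-cycle rate divided by the per-cycle peak. That is my reading of the table, not a documented formula, and the variable names are made up for illustration.

```python
# Values copied from the ncu output above.
peak_sustained = 4.0                   # inst/cycle
peak_sustained_elapsed = 2_398_115.26  # inst
peak_sustained_active = 2_297_660.79   # inst
per_cycle_elapsed = 1.34               # inst/cycle
per_cycle_active = 1.40                # inst/cycle
cycles_elapsed_max = 599_534           # cycle
cycles_active_max = 597_149            # cycle

# Assumption: peak_sustained_* = peak_sustained * cycles, so dividing
# recovers the cycle count that ncu appears to have used.
implied_elapsed_cycles = peak_sustained_elapsed / peak_sustained
implied_active_cycles = peak_sustained_active / peak_sustained

# Assumption: pct_of_peak = achieved per-cycle rate / per-cycle peak.
pct_elapsed = 100.0 * per_cycle_elapsed / peak_sustained
pct_active = 100.0 * per_cycle_active / peak_sustained

print(f"implied elapsed cycles: {implied_elapsed_cycles:,.2f} "
      f"(reported max: {cycles_elapsed_max:,})")
print(f"implied active cycles:  {implied_active_cycles:,.2f} "
      f"(reported max: {cycles_active_max:,})")
print(f"pct of peak, elapsed:   {pct_elapsed:.2f} (reported: 33.57)")
print(f"pct of peak, active:    {pct_active:.2f} (reported: 35.03)")
```

The recomputed percentages differ slightly from the reported ones because the per-cycle rates in the table are rounded to two decimals. The implied cycle counts are lower than the .max values, which is consistent with the peak being built from per-SM averages.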
One more question,
TLDR: is .avg.pct_of_peak the same as .sum.pct_of_peak?
I am calculating pct_of_peak_sustained_elapsed manually so that it’s easier to do for a group of kernels I’d call an operator. To do so for a metric y of unit x, I’m using the following equation:
x__y.avg / (x__y.avg.peak_sustained * x__cycles_elapsed.max)
For example, where x = sm and y = inst_executed:
sm__inst_executed.avg / (sm__inst_executed.avg.peak_sustained * sm__cycles_elapsed.max)
As a sanity check, I’m comparing this to a weighted average of x__y.avg.pct_of_peak_sustained_elapsed, weighted by x__cycles_elapsed.max. However, I noticed that pct_of_peak seems to have the same value for .sum.pct_of_peak and .avg.pct_of_peak.
For calculating manually, if I use .sum in the numerator, then I either have to change x__y.avg.peak_sustained to x__y.sum.peak_sustained, or change x__cycles_elapsed.max to x__cycles_elapsed.sum. If I change both to sum, it seems to essentially double count across all the units. However, since avg.pct_of_peak and sum.pct_of_peak seem to have the same values, should I always just calculate it with avg, the way you did in your answer?
Thanks!!!
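The operator-level calculation described in this question can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-kernel numbers and the unit count are invented, KernelSample and the function names are hypothetical, and the .sum relationships (sum = avg * number of units, for both the counter and its peak) are assumed rather than confirmed in this thread.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelSample:
    inst_avg: float        # sm__inst_executed.avg
    peak_per_cycle: float  # sm__inst_executed.avg.peak_sustained
    cycles_elapsed: float  # sm__cycles_elapsed.max


def kernel_pct(k: KernelSample) -> float:
    """Per-kernel percent of peak, using the formula from the post."""
    return 100.0 * k.inst_avg / (k.peak_per_cycle * k.cycles_elapsed)


def operator_pct_pooled(kernels: list[KernelSample]) -> float:
    """Pool numerators and denominators over all kernels of the operator."""
    achieved = sum(k.inst_avg for k in kernels)
    peak = sum(k.peak_per_cycle * k.cycles_elapsed for k in kernels)
    return 100.0 * achieved / peak


def operator_pct_weighted(kernels: list[KernelSample]) -> float:
    """Average the per-kernel percentages, weighted by elapsed cycles."""
    total_cycles = sum(k.cycles_elapsed for k in kernels)
    return sum(kernel_pct(k) * k.cycles_elapsed for k in kernels) / total_cycles


def pct_from_sum(k: KernelSample, num_units: int, sum_cycles: bool) -> float:
    """Same ratio built from .sum values (assumed to be avg * num_units).

    With sum_cycles=True the cycle count is also scaled by num_units,
    which reproduces the double counting described in the post.
    """
    inst_sum = k.inst_avg * num_units
    peak_sum = k.peak_per_cycle * num_units
    cycles = k.cycles_elapsed * num_units if sum_cycles else k.cycles_elapsed
    return 100.0 * inst_sum / (peak_sum * cycles)


# Invented numbers for two kernels of one hypothetical operator.
kernels = [KernelSample(800_000, 4, 600_000), KernelSample(100_000, 4, 200_000)]
NUM_UNITS = 108  # hypothetical SM count

pooled = operator_pct_pooled(kernels)
weighted = operator_pct_weighted(kernels)
print(f"pooled: {pooled:.3f}  weighted: {weighted:.3f}")
print(f"avg-based: {kernel_pct(kernels[0]):.3f}  "
      f"sum-based: {pct_from_sum(kernels[0], NUM_UNITS, False):.3f}  "
      f"sum with summed cycles: {pct_from_sum(kernels[0], NUM_UNITS, True):.3f}")
```

Under these assumptions the pooled and cycle-weighted results agree whenever the per-cycle peak is the same for every kernel, and the .avg and .sum forms give the same percentage because the unit count cancels. Scaling the cycles as well divides the result by the unit count.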