How are pct_of_peak metrics calculated?

Hi,

I would like to know what equation is actually used to calculate a metric like:
dram__bytes_read.sum.pct_of_peak_sustained_elapsed
I would imagine it’s the dram__bytes_read.sum metric divided by the max theoretical HBM bandwidth from the datasheet for the chip I’m using. But since bandwidth is per second, and NCU slows down kernel execution, how would it know what bandwidth was achieved?
I looked through the kernel profiling guide and moused over metrics in ncu-ui but the “raw” metrics I see are still too high level such as the pct_of_peak metrics for shared memory utilization. Is there a way I can see what numbers are used to calculate the peak %?

Thanks!

Every metric that has pct_of_peak_sustained_elapsed is also available as peak_sustained_elapsed, which gives you the theoretical peak rather than the value achieved (the "pct of") in your concrete measurement.

Nsight Compute locks clocks for deterministic multi-pass results. Peaks are adjusted for the actual clock rate. If preferred, you can disable clock control in ncu with --clock-control none and lock the clocks externally to a different rate, e.g. using nvidia-smi. Leaving clocks unlocked can result in skewed values if metrics are collected across multiple replay passes.

Hm I’m a little confused now, peak sustained elapsed is the peak sustained rate during unit elapsed cycles, but you’re saying it’s the theoretical peak? Isn’t the theoretical peak determined by the hardware so it’d be the same value for each kernel?
What I was expecting for pct_of_peak dram bytes read was (bytes/kernel_duration)/theoretical_hbm_bandwidth but it sounds like it’s actually (bytes/kernel_duration)/max_per_cycle((bytes/kernel_duration)) or something?

Yes, peak_sustained_elapsed is basically the theoretical peak, applied to your actual elapsed cycles. peak_sustained_active would be the (sustained) peak for your active cycles (where active <= elapsed). You can also collect/check peak_sustained to get the underlying peak value unrelated to your actual cycles. It can be a bit easier to follow when actually looking at some numbers:

ncu --metrics regex:sm__inst_executed\.avg\..*,sm__cycles_active.max,sm__cycles_elapsed.max ...
    Section: Command line profiler metrics
    ------------------------------------------------------- ----------- ------------
    Metric Name                                             Metric Unit Metric Value
    ------------------------------------------------------- ----------- ------------
    sm__inst_executed.avg.pct_of_peak_sustained_active                %        35.03
    sm__inst_executed.avg.pct_of_peak_sustained_elapsed               %        33.57
    sm__inst_executed.avg.peak_sustained                     inst/cycle            4
    sm__inst_executed.avg.peak_sustained_active                    inst 2,297,660.79
    sm__inst_executed.avg.peak_sustained_active.per_second      inst/ns         8.45
    sm__inst_executed.avg.peak_sustained_elapsed                   inst 2,398,115.26
    sm__inst_executed.avg.peak_sustained_elapsed.per_second     inst/ns         8.82
    sm__inst_executed.avg.per_cycle_active                   inst/cycle         1.40
    sm__inst_executed.avg.per_cycle_elapsed                  inst/cycle         1.34
    sm__inst_executed.avg.per_second                            inst/ns         2.96

    sm__cycles_active.max                                         cycle      597,149
    sm__cycles_elapsed.max                                        cycle      599,534
    ------------------------------------------------------- ----------- ------------

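Note how 2,398,115.26 ≈ 599,534 * 4 = 2,398,136, i.e. peak_sustained multiplied by the elapsed cycles. The small gap is presumably because the table shows sm__cycles_elapsed.max, while the .avg metric is based on the per-SM average.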
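
To make the relationships in the table concrete, here is a small Python sketch that recomputes the percentages from the other rows. The variable names are my own and the numbers are simply copied from the output above, so treat it as an illustration of the arithmetic rather than a description of how ncu computes things internally.

```python
# Values copied from the ncu output above.
peak_sustained = 4.0                    # sm__inst_executed.avg.peak_sustained (inst/cycle)
per_cycle_active = 1.40                 # sm__inst_executed.avg.per_cycle_active
per_cycle_elapsed = 1.34                # sm__inst_executed.avg.per_cycle_elapsed
peak_sustained_elapsed = 2_398_115.26   # sm__inst_executed.avg.peak_sustained_elapsed (inst)
cycles_elapsed_max = 599_534            # sm__cycles_elapsed.max

# pct_of_peak is the achieved per-cycle rate divided by the per-cycle peak.
pct_active = 100.0 * per_cycle_active / peak_sustained
pct_elapsed = 100.0 * per_cycle_elapsed / peak_sustained

# peak_sustained_elapsed is the per-cycle peak scaled by the elapsed cycles.
peak_from_cycles = peak_sustained * cycles_elapsed_max

print(f"pct_of_peak_sustained_active  ~ {pct_active:.2f} % (reported 35.03)")
print(f"pct_of_peak_sustained_elapsed ~ {pct_elapsed:.2f} % (reported 33.57)")
print(f"peak * cycles_elapsed.max     = {peak_from_cycles:,.0f} inst "
      f"(reported {peak_sustained_elapsed:,.2f})")
```

The recomputed percentages only match to the precision of the rounded per_cycle values shown in the table.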


thank you!!

One more question,

TLDR: is .avg.pct_of_peak the same as .sum.pct_of_peak?

I am calculating pct_of_peak_sustained_elapsed manually so it’s easier to do for a group of kernels I’d call an operator. To do so for a metric y of unit x I’m using the following equation:

x_y.avg / (x_y.avg.peak_sustained*x_cycles_elapsed.max)

for example where x = sm and y = inst_executed:

sm__inst_executed.avg / (sm__inst_executed.avg.peak_sustained * sm__cycles_elapsed.max)

I’m comparing this to taking a weighted average of x_y.avg.pct_of_peak_sustained_elapsed weighted by x_cycles_elapsed.max as a sanity check. However I noticed that the pct_of_peak seems to be the same value for .sum.pct_of_peak and .avg.pct_of_peak.
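
As a sketch of that sanity check: the per-kernel numbers below are made up (not measured), and pct_of_peak is a hypothetical helper that just encodes the equation above. Under those assumptions the cycle-weighted average of the per-kernel percentages and the directly aggregated ratio come out identical, because the per-kernel cycle counts cancel.

```python
# Hypothetical per-kernel values for one "operator":
# (sm__inst_executed.avg, sm__cycles_elapsed.max)
kernels = [
    (800_000.0, 600_000),
    (150_000.0, 200_000),
    (1_200_000.0, 400_000),
]
PEAK = 4.0  # sm__inst_executed.avg.peak_sustained, in inst/cycle


def pct_of_peak(inst, cycles, peak=PEAK):
    """Manual pct_of_peak_sustained_elapsed: inst / (peak * cycles)."""
    return 100.0 * inst / (peak * cycles)


total_inst = sum(inst for inst, _ in kernels)
total_cycles = sum(cycles for _, cycles in kernels)

# Method 1: aggregate the raw counters first, then take the ratio.
aggregate = pct_of_peak(total_inst, total_cycles)

# Method 2: average the per-kernel percentages, weighted by elapsed cycles.
weighted = sum(pct_of_peak(i, c) * c for i, c in kernels) / total_cycles

print(f"aggregate: {aggregate:.4f} %  weighted: {weighted:.4f} %")
```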

For calculating manually, if I use .sum in the numerator then I either have to change x_y.avg.peak_sustained to x_y.sum.peak_sustained, or change x_cycles_elapsed.max to x_cycles_elapsed.sum. If I change both to sum, it seems to essentially double count for all the units. However, since the avg.pct_of_peak and sum.pct_of_peak seem to have the same values, should I always just calculate it with avg like the way you did in your answer?
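
To illustrate what I mean, here is a minimal sketch with made-up per-SM counts. It rests on an assumption I have not confirmed: that .sum.peak_sustained is simply .avg.peak_sustained multiplied by the number of unit instances. If that holds, the unit count cancels and .avg and .sum give the same percentage, while switching both the peak and the cycle count to .sum divides by the unit count twice.

```python
import math

# Hypothetical per-SM instruction counts for one kernel (4 SMs for brevity).
per_sm_inst = [700_000.0, 820_000.0, 790_000.0, 910_000.0]
n_units = len(per_sm_inst)
cycles_elapsed_max = 600_000         # hypothetical sm__cycles_elapsed.max

inst_sum = sum(per_sm_inst)          # stands in for sm__inst_executed.sum
inst_avg = inst_sum / n_units        # stands in for sm__inst_executed.avg

avg_peak = 4.0                       # sm__inst_executed.avg.peak_sustained
sum_peak = avg_peak * n_units        # ASSUMPTION: .sum.peak_sustained scales with unit count

pct_avg = 100.0 * inst_avg / (avg_peak * cycles_elapsed_max)
pct_sum = 100.0 * inst_sum / (sum_peak * cycles_elapsed_max)

# Using BOTH the summed peak and a summed cycle count scales the
# denominator by n_units twice, which is the double counting.
cycles_sum = cycles_elapsed_max * n_units   # rough stand-in for sm__cycles_elapsed.sum
pct_double = 100.0 * inst_sum / (sum_peak * cycles_sum)

print(f"avg: {pct_avg:.2f} %  sum: {pct_sum:.2f} %  double-counted: {pct_double:.2f} %")
```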

Thanks!!!