What is the meaning of sm__pipe_fp64_cycles_active[burst/sustained]

Vasilyev_v · February 3, 2022, 11:41am

Hello!

I’ve read the documentation about burst/sustained metrics, but unfortunately I didn’t understand what burst and sustained mean.
In particular, in relation to this output (AI100):

sm__pipe_fp64_cycles_active.avg.peak_burst                                                                          4
sm__pipe_fp64_cycles_active.avg.peak_sustained                                                                      4
...
smsp__inst_executed_pipe_fp64.avg.pct_of_peak_burst_active                           %                          19.95
smsp__inst_executed_pipe_fp64.avg.pct_of_peak_sustained_active                       %                          79.81

What is the meaning of sm__pipe_fp64_cycles_active.avg.peak_[burst/sustained]_active?
What is the meaning of smsp__inst_executed_pipe_fp64.avg.pct_of_peak_[burst/sustained]_active, and why is burst < sustained?

Perhaps the answer to my first question will help me understand the second one.
Thanks!

Greg · February 7, 2022, 6:55pm

This is documented in the Nsight Compute Kernel Profiling Guide in Section 3.2 Metrics Structure.

Two types of peak rates are available for every counter: burst and sustained. Burst rate is the maximum rate reportable in a single clock cycle. Sustained rate is the maximum rate achievable over an infinitely long measurement period, for “typical” operations. For many counters, burst equals sustained. Since the burst rate cannot be exceeded, percentages of burst rate will always be less than 100%. Percentages of sustained rate can occasionally exceed 100% in edge cases.
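As a rough illustration of why a percentage of burst comes out lower than the percentage of sustained for the same measurement, here is a minimal Python sketch. The two peak rates are the ones reported later in this thread for sm__sass_thread_inst_executed_op_dadd_pred_on.avg; the measured rate and the helper name pct_of_peak are hypothetical and chosen only for the example.

```python
def pct_of_peak(measured_rate: float, peak_rate: float) -> float:
    """Return measured_rate expressed as a percentage of peak_rate."""
    if peak_rate <= 0:
        raise ValueError("peak_rate must be positive")
    return 100.0 * measured_rate / peak_rate


# Peak rates (inst/cycle) as reported later in this thread.
PEAK_BURST = 128.0
PEAK_SUSTAINED = 32.0

# Hypothetical measured rate (inst/cycle), for illustration only.
measured = 24.0

pct_burst = pct_of_peak(measured, PEAK_BURST)
pct_sustained = pct_of_peak(measured, PEAK_SUSTAINED)

print(f"{pct_burst:.2f}% of burst, {pct_sustained:.2f}% of sustained")
# prints "18.75% of burst, 75.00% of sustained"
```

Because the same measured value is divided by a larger peak, the burst percentage is smaller whenever the burst peak exceeds the sustained peak. The 19.95% vs. 79.81% pair in the question differs by roughly a factor of 4, which would be consistent with a burst peak four times the sustained peak for that metric.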

For software development I would recommend using only the “sustained” metrics. Nsight Compute should not be collecting any “burst” metrics.

What is the meaning of sm__pipe_fp64_cycles_active.avg.peak_[burst/sustained]_active?

This is the primary throughput metric for the SM FP64 math pipes. The output of this metric is a ratio between 0 and 1.

sm__pipe_fp64_cycles_active.avg.peak_sustained_active = sm__pipe_fp64_cycles_active.avg / (sm__pipe_fp64_cycles_active.avg.peak_sustained x sm__cycles_active.avg)

sm__pipe_fp64_cycles_active.avg = # of cycles the SM FP64 units (1 or 4) are active, averaged across all SMs
.peak_sustained_active = the maximum value the counter could sustain over the active cycles (peak_sustained x sm__cycles_active.avg)
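The formula above can be sketched in Python as follows. This is only an illustration of the arithmetic, not how Nsight Compute computes the value internally; the function name and the sample counter values are hypothetical, while the peak of 4 per cycle is taken from the output quoted in the question.

```python
def ratio_of_peak_sustained_active(
    counter_avg: float,
    peak_sustained_per_cycle: float,
    cycles_active_avg: float,
) -> float:
    """counter_avg / (peak_sustained_per_cycle * cycles_active_avg)."""
    if peak_sustained_per_cycle <= 0 or cycles_active_avg <= 0:
        raise ValueError("peak and active cycles must be positive")
    return counter_avg / (peak_sustained_per_cycle * cycles_active_avg)


# sm__pipe_fp64_cycles_active.avg.peak_sustained from the question's output.
peak_sustained = 4.0

# Hypothetical measurements, for illustration only.
pipe_fp64_cycles_active_avg = 3000.0  # sm__pipe_fp64_cycles_active.avg
sm_cycles_active_avg = 1000.0         # sm__cycles_active.avg

ratio = ratio_of_peak_sustained_active(
    pipe_fp64_cycles_active_avg, peak_sustained, sm_cycles_active_avg
)
print(ratio)  # prints 0.75
```

With these made-up inputs the FP64 pipes would be at 75% of what they could sustain while the SM was active.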

What is the meaning of smsp__inst_executed_pipe_fp64.avg.pct_of_peak_[burst/sustained]_active, and why is burst < sustained?

This is the primary throughput metric for the SMSP FP64 math pipes as a % of active cycles. On GV100 and GA100 each SM sub-partition has an FP64 unit. On graphics-oriented parts there is 1 FP64 unit per SM, often at a significantly lower throughput.

Please do not use the “burst” metrics. The value is not set correctly for all metrics.

Nsight Compute tends to use “active” rather than “elapsed” because it focuses on the time when the GPU should be fully used. The user should look at sm__cycles_active.avg.pct_of_peak_sustained_active first to determine whether entire SMs are idle.

Nsight Systems GPU metrics collect counters over time and tend to use “elapsed” cycles, which results in each SM counter being reduced by the activity of the SM.
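To make the “active” vs. “elapsed” distinction concrete, here is a small Python sketch with made-up per-SM numbers (four SMs, one of them completely idle, and a hypothetical peak of 1 pipe-active cycle per cycle). It only illustrates how the choice of denominator changes the result; the variable names are not Nsight metric names.

```python
# Hypothetical per-SM values over one measurement window.
elapsed_cycles = 1000                        # same for every SM
sm_cycles_active = [1000, 1000, 500, 0]      # cycles each SM had work
pipe_cycles_active = [800, 800, 400, 0]      # cycles the pipe was busy
peak_per_cycle = 1.0                         # assumed peak rate


def average(values):
    """Plain arithmetic mean across SMs."""
    return sum(values) / len(values)


pipe_avg = average(pipe_cycles_active)    # 500.0
active_avg = average(sm_cycles_active)    # 625.0

# Normalized by active cycles: how busy the pipe is while SMs have work.
ratio_active = pipe_avg / (peak_per_cycle * active_avg)

# Normalized by elapsed cycles: idle SM time pulls the value down.
ratio_elapsed = pipe_avg / (peak_per_cycle * elapsed_cycles)

print(f"active: {ratio_active:.2f}, elapsed: {ratio_elapsed:.2f}")
# prints "active: 0.80, elapsed: 0.50"
```

In this toy case the pipe looks 80% utilized relative to active cycles but only 50% relative to elapsed cycles, because part of the elapsed time the SMs had nothing to run.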

Vasilyev_v · February 8, 2022, 1:54pm

@Greg , thank you very much for your detailed explanations!
A quick question :)
Suppose ncu collected a sequence of measurements of sm__sass_thread_inst_executed_op_dadd_pred_on.avg (AI100, four units per SM):

sm__sass_thread_inst_executed_op_dadd_pred_on.avg.peak_burst                inst/cycle                            128
sm__sass_thread_inst_executed_op_dadd_pred_on.avg.peak_sustained            inst/cycle                             32

Let’s simplify the distribution of the .avg metric over the measurement period. Can we then assume that the distribution of values over cycles looks like this?

 sm__sass_thread_inst_executed_op_dadd_pred_on.avg    |   cycle
----------------------------------------------------------------------
128                                                   |   N
0                                                     |   N+1
0                                                     |   N+2
0                                                     |   N+3
128                                                   |   N+4
0                                                     |   N+5
0                                                     |   N+6
0                                                     |   N+7
....                                                  |   ...
----------------------------------------------------------------

Thanks in advance!

Greg · February 9, 2022, 10:42pm

If the hardware had SM sub-partition counters for the number of predicated-true threads issuing a DADD, then you would see an increment of 0-32 on each cycle in which the SM warp scheduler issued an FP64 (DADD, DMUL, DFMA) instruction. For GV100 and GA100 the issue rate is once every 4 cycles. For graphics-focused parts the FP64 unit is shared between all 4 SM warp schedulers and the issue rate is less than 8 threads/cycle.

Each time a warp scheduler issues a DADD instruction, the SMSP counter would increment by 0-32, based upon the active mask and guard predicate mask.
Each warp scheduler is independent, so the increments would not be aligned across the 4 schedulers.
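The per-cycle behavior described here can be sketched with a toy Python model. It assumes four warp schedulers per SM, one FP64 issue every 4 cycles per scheduler, and all 32 threads active on every issue; the function name and the start offsets are hypothetical, and the model ignores everything else the hardware does.

```python
def sm_increments(offsets, active_threads=32, issue_interval=4, num_cycles=8):
    """Sum of per-scheduler counter increments for each cycle of one SM.

    offsets[i] is the first cycle on which scheduler i issues; after that
    it issues again every issue_interval cycles.
    """
    per_cycle = [0] * num_cycles
    for offset in offsets:
        for cycle in range(offset, num_cycles, issue_interval):
            per_cycle[cycle] += active_threads
    return per_cycle


# All four schedulers happen to issue on the same cycle.
aligned = sm_increments([0, 0, 0, 0])
# Schedulers issue on different cycles (not aligned).
staggered = sm_increments([0, 1, 2, 3])

print(aligned)    # prints [128, 0, 0, 0, 128, 0, 0, 0]
print(staggered)  # prints [32, 32, 32, 32, 32, 32, 32, 32]
print(sum(aligned) / len(aligned), sum(staggered) / len(staggered))
# prints "32.0 32.0"
```

Under these assumptions the aligned case reproduces the 128, 0, 0, 0 pattern from the question, while any other alignment spreads the increments out. Both average 32 threads per cycle, which is at least consistent with the reported peak_sustained of 32 and peak_burst of 128.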

Given that the counter referenced is a SASS metric, there would be hundreds of cycles between each DADD instruction issue due to the complexity of the assembly code patch that counts the number of predicated-true threads for that instruction.

