How to compute the ratio of FP64 GPU cycles to the total number of active cycles?

Hello!
I'd like to estimate the ratio of the number of GPU (A100) cycles in which FP64 instructions {DADD, DMUL, DFMA} are executed to the total number of cycles the GPU is active. In other words, I want the percentage of active GPU cycles in which FP64 instructions are processed. The value ranges from 0% (bad) to 100% (optimal, meaning the GPU executed an FP64 instruction on every cycle).

To measure this, I run the Nsight Compute CLI:

nv-nsight-cu-cli --metrics regex:sm__.*fp64.*cycles_active.*,sm__cycles_elapsed,sm__cycles_active --target-processes all <MY APP>

From the output, I can see the number of cycles the SMs were active and the number of elapsed cycles:

    sm__cycles_active.avg                                                            cycle                     1725374.38
    sm__cycles_active.max                                                            cycle                        1729062
    sm__cycles_active.min                                                            cycle                        1721870
    sm__cycles_active.sum                                                            cycle                      186340433
    sm__cycles_elapsed.avg                                                           cycle                     1731864.65
    sm__cycles_elapsed.max                                                           cycle                        1733056
    sm__cycles_elapsed.min                                                           cycle                        1730283
    sm__cycles_elapsed.sum                                                           cycle                      187041382

I think the .avg rollup can be used to estimate the average number of cycles the GPU is active (averaged over all SMs). The average number of active cycles is then 1725374.38.
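
The .sum and .avg rollups are consistent with that reading: dividing one by the other recovers the number of SM instances that were aggregated. A quick arithmetic check, using values copied from the output above (the variable names are only for illustration):

```python
# Values copied from the sm__cycles_active rollups in the output above.
cycles_active_sum = 186340433
cycles_active_avg = 1725374.38

# .avg is .sum divided by the number of SM instances, so the ratio of the
# two recovers the SM count (an A100 has 108 SMs).
num_sms = round(cycles_active_sum / cycles_active_avg)
print(num_sms)  # 108
```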

However, I'm having trouble finding the metric that counts the cycles in which the GPU executes FP64 operations. The most likely candidate is sm__pipe_fp64_cycles_active.avg, with a value of 4602208.59. But I don't understand why this value is greater than the total number of active cycles, sm__cycles_active.avg.

Here is an excerpt of the output:

Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    sm__pipe_fp64_cycles_active.avg                                                  cycle                     4602208.59
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_active                             %                          66.68
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_elapsed                            %                          66.43
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_frame                              %                          66.43
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_region                             %                          66.43
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_active                         %                          66.68
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_elapsed                        %                          66.43
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_frame                          %                          66.43
    sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_region                         %                          66.43
    sm__pipe_fp64_cycles_active.avg.peak_burst                                                                          4
    sm__pipe_fp64_cycles_active.avg.peak_burst_active                                cycle                     6901497.52
    sm__pipe_fp64_cycles_active.avg.peak_burst_elapsed                               cycle                     6927458.59
    sm__pipe_fp64_cycles_active.avg.peak_burst_frame                                 cycle                     6927476.11
    sm__pipe_fp64_cycles_active.avg.peak_burst_region                                cycle                     6927476.11
    sm__pipe_fp64_cycles_active.avg.peak_sustained                                                                      4
    sm__pipe_fp64_cycles_active.avg.peak_sustained_active                            cycle                     6901497.52
    sm__pipe_fp64_cycles_active.avg.peak_sustained_elapsed                           cycle                     6927458.59
    sm__pipe_fp64_cycles_active.avg.peak_sustained_frame                             cycle                     6927476.11
    sm__pipe_fp64_cycles_active.avg.peak_sustained_region                            cycle                     6927476.11
    sm__pipe_fp64_cycles_active.avg.per_cycle_active                                                                 2.67
    sm__pipe_fp64_cycles_active.avg.per_cycle_elapsed                                                                2.66
    sm__pipe_fp64_cycles_active.avg.per_cycle_in_frame                                                               2.66
    sm__pipe_fp64_cycles_active.avg.per_cycle_in_region                                                              2.66
    sm__pipe_fp64_cycles_active.avg.per_second                               cycle/nsecond                           2.91
    sm__pipe_fp64_cycles_active.max                                                  cycle                        5572440
    sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_active                             %                          80.74
    sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_elapsed                            %                          80.44
    sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_frame                              %                          80.44
    sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_region                             %                          80.44
    sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_active                         %                          80.74
    sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_elapsed                        %                          80.44
    sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_frame                          %                          80.44
    sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_region                         %                          80.44
    sm__pipe_fp64_cycles_active.max.peak_burst                                                                          4
    sm__pipe_fp64_cycles_active.max.peak_burst_active                                cycle                     6901497.52
    sm__pipe_fp64_cycles_active.max.peak_burst_elapsed                               cycle                     6927458.59
    sm__pipe_fp64_cycles_active.max.peak_burst_frame                                 cycle                     6927476.11
    sm__pipe_fp64_cycles_active.max.peak_burst_region                                cycle                     6927476.11
    sm__pipe_fp64_cycles_active.max.peak_sustained                                                                      4
    sm__pipe_fp64_cycles_active.max.peak_sustained_active                            cycle                     6901497.52
    sm__pipe_fp64_cycles_active.max.peak_sustained_elapsed                           cycle                     6927458.59
    sm__pipe_fp64_cycles_active.max.peak_sustained_frame                             cycle                     6927476.11
    sm__pipe_fp64_cycles_active.max.peak_sustained_region                            cycle                     6927476.11
    sm__pipe_fp64_cycles_active.max.per_cycle_active                                                                 3.23
    sm__pipe_fp64_cycles_active.max.per_cycle_elapsed                                                                3.22
    sm__pipe_fp64_cycles_active.max.per_cycle_in_frame                                                               3.22
    sm__pipe_fp64_cycles_active.max.per_cycle_in_region                                                              3.22
    sm__pipe_fp64_cycles_active.max.per_second                               cycle/nsecond                           3.52
    sm__pipe_fp64_cycles_active.min                                                  cycle                        3759904
    sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_active                             %                          54.48
    sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_elapsed                            %                          54.28
    sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_frame                              %                          54.28
    sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_region                             %                          54.28
    sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_active                         %                          54.48
    sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_elapsed                        %                          54.28
    sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_frame                          %                          54.28
    sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_region                         %                          54.28
    sm__pipe_fp64_cycles_active.min.peak_burst                                                                          4
    sm__pipe_fp64_cycles_active.min.peak_burst_active                                cycle                     6901497.52
    sm__pipe_fp64_cycles_active.min.peak_burst_elapsed                               cycle                     6927458.59
    sm__pipe_fp64_cycles_active.min.peak_burst_frame                                 cycle                     6927476.11
    sm__pipe_fp64_cycles_active.min.peak_burst_region                                cycle                     6927476.11
    sm__pipe_fp64_cycles_active.min.peak_sustained                                                                      4
    sm__pipe_fp64_cycles_active.min.peak_sustained_active                            cycle                     6901497.52
    sm__pipe_fp64_cycles_active.min.peak_sustained_elapsed                           cycle                     6927458.59
    sm__pipe_fp64_cycles_active.min.peak_sustained_frame                             cycle                     6927476.11
    sm__pipe_fp64_cycles_active.min.peak_sustained_region                            cycle                     6927476.11
    sm__pipe_fp64_cycles_active.min.per_cycle_active                                                                 2.18
    sm__pipe_fp64_cycles_active.min.per_cycle_elapsed                                                                2.17
    sm__pipe_fp64_cycles_active.min.per_cycle_in_frame                                                               2.17
    sm__pipe_fp64_cycles_active.min.per_cycle_in_region                                                              2.17
    sm__pipe_fp64_cycles_active.min.per_second                               cycle/nsecond                           2.38
    sm__pipe_fp64_cycles_active.sum                                                  cycle                      497038528
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_active                             %                          66.68
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_elapsed                            %                          66.43
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_frame                              %                          66.43
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_region                             %                          66.43
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_active                         %                          66.68
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_elapsed                        %                          66.43
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_frame                          %                          66.43
    sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_region                         %                          66.43
    sm__pipe_fp64_cycles_active.sum.peak_burst                                                                        432
    sm__pipe_fp64_cycles_active.sum.peak_burst_active                                cycle                      745361732
    sm__pipe_fp64_cycles_active.sum.peak_burst_elapsed                               cycle                      748165528
    sm__pipe_fp64_cycles_active.sum.peak_burst_frame                                 cycle                   748167419.48
    sm__pipe_fp64_cycles_active.sum.peak_burst_region                                cycle                   748167419.48
    sm__pipe_fp64_cycles_active.sum.peak_sustained                                                                    432
    sm__pipe_fp64_cycles_active.sum.peak_sustained_active                            cycle                      745361732
    sm__pipe_fp64_cycles_active.sum.peak_sustained_elapsed                           cycle                      748165528
    sm__pipe_fp64_cycles_active.sum.peak_sustained_frame                             cycle                   748167419.48
    sm__pipe_fp64_cycles_active.sum.peak_sustained_region                            cycle                   748167419.48
    sm__pipe_fp64_cycles_active.sum.per_cycle_active                                                               288.08
    sm__pipe_fp64_cycles_active.sum.per_cycle_elapsed                                                              287.00
    sm__pipe_fp64_cycles_active.sum.per_cycle_in_frame                                                             287.00
    sm__pipe_fp64_cycles_active.sum.per_cycle_in_region                                                            287.00
    sm__pipe_fp64_cycles_active.sum.per_second                               cycle/nsecond                         314.15
    sm__cycles_active.avg                                                            cycle                     1725374.38
    sm__cycles_active.max                                                            cycle                        1729062
    sm__cycles_active.min                                                            cycle                        1721870
    sm__cycles_active.sum                                                            cycle                      186340433
    sm__cycles_elapsed.avg                                                           cycle                     1731864.65
    sm__cycles_elapsed.max                                                           cycle                        1733056
    sm__cycles_elapsed.min                                                           cycle                        1730283
    sm__cycles_elapsed.sum                                                           cycle                      187041382
    ---------------------------------------------------------------------- --------------- ------------------------------
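
To do this arithmetic on the full report instead of by hand, the rows above can be parsed into a dictionary. The sketch below assumes the layout seen in this excerpt (metric name, an optional unit, and a value, separated by whitespace). That layout is inferred from this output rather than from a documented format, and `parse_ncu_metrics` is a hypothetical helper. If your Nsight Compute version supports `--csv`, parsing its CSV output is probably more robust.

```python
from typing import Dict, Optional, Tuple


def parse_ncu_metrics(text: str) -> Dict[str, Tuple[Optional[str], float]]:
    """Parse 'metric [unit] value' rows from an ncu text report.

    Returns {metric_name: (unit or None, value)}. Section headers,
    separator rows and anything else that is not a metric row are skipped.
    """
    metrics: Dict[str, Tuple[Optional[str], float]] = {}
    for line in text.splitlines():
        tokens = line.split()
        # A metric row has 2 tokens (name, value) or 3 (name, unit, value);
        # metric names contain a double underscore, e.g. sm__cycles_active.
        if len(tokens) not in (2, 3) or "__" not in tokens[0]:
            continue
        # Assumption: some versions/locales may print thousands separators,
        # so commas are stripped before converting to float.
        try:
            value = float(tokens[-1].replace(",", ""))
        except ValueError:
            continue
        unit = tokens[1] if len(tokens) == 3 else None
        metrics[tokens[0]] = (unit, value)
    return metrics


report = """
    sm__cycles_active.avg                              cycle      1725374.38
    sm__pipe_fp64_cycles_active.avg                    cycle      4602208.59
    sm__pipe_fp64_cycles_active.avg.peak_sustained                         4
"""
m = parse_ncu_metrics(report)
ratio = m["sm__pipe_fp64_cycles_active.avg"][1] / m["sm__cycles_active.avg"][1]
print(f"{ratio:.2f}")  # 2.67
```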

In your output, sm__pipe_fp64_cycles_active.avg.peak_sustained = 4. An A100 has an FP64 unit in each SM sub-partition (SMSP), and each SM has four SMSPs, so sm__pipe_fp64_cycles_active.avg can increase by anywhere from 0 to sm__pipe_fp64_cycles_active.avg.peak_sustained (that is, 4) per cycle. This is why its value can exceed sm__cycles_active.avg.
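
Dividing the per-SM FP64 pipe counts by the average number of active cycles reproduces the per_cycle_active values in the report, and all of them fall inside the 0 to 4 range allowed by peak_sustained. A short arithmetic check with values copied from the output (variable names are illustrative):

```python
# Values copied from the report above.
cycles_active_avg = 1725374.38  # sm__cycles_active.avg
peak_sustained = 4              # sm__pipe_fp64_cycles_active.avg.peak_sustained
fp64_pipe_active = {"avg": 4602208.59, "max": 5572440, "min": 3759904}

# Per-SM FP64 pipe count per active cycle, for each rollup.
per_cycle = {
    rollup: count / cycles_active_avg for rollup, count in fp64_pipe_active.items()
}
for rollup, rate in per_cycle.items():
    # With one FP64 pipe per SMSP and four SMSPs per SM, the per-SM count
    # can grow by at most 4 per cycle.
    assert 0 <= rate <= peak_sustained
    print(f"{rollup}: {rate:.2f}")
# avg: 2.67
# max: 3.23
# min: 2.18
```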

If you want the percentage of active cycles in which the FP64 units are active, collect sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_active.
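
The numbers in the report suggest how that percentage is formed: the FP64 pipe count divided by the largest value it could have reached, which is peak_sustained multiplied by the number of cycles. The per-SM .avg rollup (peak 4) and the GPU-wide .sum rollup (peak 432, i.e. 4 × 108 SMs) give the same result. A sketch with values copied from the output; `fp64_pct_of_peak` is a hypothetical helper, and the formula is reconstructed from the reported values rather than taken from the Nsight Compute documentation:

```python
def fp64_pct_of_peak(pipe_cycles: float, peak_per_cycle: float, cycles: float) -> float:
    """Percent of the maximum possible FP64 pipe activity over `cycles`."""
    return 100.0 * pipe_cycles / (peak_per_cycle * cycles)


cycles_active_avg = 1725374.38   # sm__cycles_active.avg
cycles_elapsed_avg = 1731864.65  # sm__cycles_elapsed.avg

# Per-SM view: .avg counter against a peak of 4 per cycle.
pct_active = fp64_pct_of_peak(4602208.59, 4, cycles_active_avg)
# GPU-wide view: .sum counter against a peak of 432 (4 x 108 SMs) per cycle.
pct_active_sum = fp64_pct_of_peak(497038528, 432, cycles_active_avg)
# Same ratio, but over elapsed rather than active cycles.
pct_elapsed = fp64_pct_of_peak(4602208.59, 4, cycles_elapsed_avg)

print(f"{pct_active:.2f} {pct_active_sum:.2f} {pct_elapsed:.2f}")  # 66.68 66.68 66.43
```

Read this way, 66.68% means that, averaged over active cycles, about 2.67 of the 4 FP64 pipes in each SM were busy per cycle.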