Hello!
I’d like to estimate the ratio of the number of GPU (A100) cycles in which FP64 instructions {DADD, DMUL, DFMA} are executed to the total number of cycles in which the GPU is active. In other words, the percentage of GPU cycles spent processing FP64 instructions. Value range: 0% (bad) to 100% (optimal, that is, the GPU executed an FP64 instruction on every cycle).
For this, I use the CLI:
nv-nsight-cu-cli --metrics regex:sm__.*fp64.*cycles_active.*,sm__cycles_elapsed,sm__cycles_active,sm__cycles_elapsed --target-processes all <MY APP>
From the output, I see the total number of cycles the GPU was active and the total number of cycles elapsed:
sm__cycles_active.avg cycle 1725374.38
sm__cycles_active.max cycle 1729062
sm__cycles_active.min cycle 1721870
sm__cycles_active.sum cycle 186340433
sm__cycles_elapsed.avg cycle 1731864.65
sm__cycles_elapsed.max cycle 1733056
sm__cycles_elapsed.min cycle 1730283
sm__cycles_elapsed.sum cycle 187041382
I think one can use the .avg metric to estimate the average number of cycles the GPU is active (averaged over all SMs). The average number of active cycles is then 1725374.38.
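As a quick sanity check on the "averaged over all SMs" reading, here is a minimal Python sketch using only the values printed above. It assumes that .avg is simply .sum divided by the number of SM instances; that assumption is mine, not something the profiler output states.

```python
# Values copied from the profiler output above.
cycles_active_sum = 186340433
cycles_active_avg = 1725374.38
cycles_elapsed_sum = 187041382
cycles_elapsed_avg = 1731864.65

# Assumption: .avg = .sum / (number of SM instances).
# If that holds, both ratios should land on the same whole number.
sm_from_active = cycles_active_sum / cycles_active_avg
sm_from_elapsed = cycles_elapsed_sum / cycles_elapsed_avg

print(f"SM instances implied by sm__cycles_active:  {sm_from_active:.4f}")
print(f"SM instances implied by sm__cycles_elapsed: {sm_from_elapsed:.4f}")
```

Both ratios come out at about 108, which is consistent with .avg being a per-SM average.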
However, I am having trouble figuring out which metric gives the number of cycles in which the GPU executes FP64 operations. The likely candidate is sm__pipe_fp64_cycles_active.avg, with the value 4602208.59. But I don’t understand why this value is greater than the total number of GPU cycles, sm__cycles_active.avg (1725374.38).
Below is an excerpt of the output:
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
sm__pipe_fp64_cycles_active.avg cycle 4602208.59
sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_active % 66.68
sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_elapsed % 66.43
sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_frame % 66.43
sm__pipe_fp64_cycles_active.avg.pct_of_peak_burst_region % 66.43
sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_active % 66.68
sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_elapsed % 66.43
sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_frame % 66.43
sm__pipe_fp64_cycles_active.avg.pct_of_peak_sustained_region % 66.43
sm__pipe_fp64_cycles_active.avg.peak_burst 4
sm__pipe_fp64_cycles_active.avg.peak_burst_active cycle 6901497.52
sm__pipe_fp64_cycles_active.avg.peak_burst_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.avg.peak_burst_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.avg.peak_burst_region cycle 6927476.11
sm__pipe_fp64_cycles_active.avg.peak_sustained 4
sm__pipe_fp64_cycles_active.avg.peak_sustained_active cycle 6901497.52
sm__pipe_fp64_cycles_active.avg.peak_sustained_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.avg.peak_sustained_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.avg.peak_sustained_region cycle 6927476.11
sm__pipe_fp64_cycles_active.avg.per_cycle_active 2.67
sm__pipe_fp64_cycles_active.avg.per_cycle_elapsed 2.66
sm__pipe_fp64_cycles_active.avg.per_cycle_in_frame 2.66
sm__pipe_fp64_cycles_active.avg.per_cycle_in_region 2.66
sm__pipe_fp64_cycles_active.avg.per_second cycle/nsecond 2.91
sm__pipe_fp64_cycles_active.max cycle 5572440
sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_active % 80.74
sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_elapsed % 80.44
sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_frame % 80.44
sm__pipe_fp64_cycles_active.max.pct_of_peak_burst_region % 80.44
sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_active % 80.74
sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_elapsed % 80.44
sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_frame % 80.44
sm__pipe_fp64_cycles_active.max.pct_of_peak_sustained_region % 80.44
sm__pipe_fp64_cycles_active.max.peak_burst 4
sm__pipe_fp64_cycles_active.max.peak_burst_active cycle 6901497.52
sm__pipe_fp64_cycles_active.max.peak_burst_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.max.peak_burst_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.max.peak_burst_region cycle 6927476.11
sm__pipe_fp64_cycles_active.max.peak_sustained 4
sm__pipe_fp64_cycles_active.max.peak_sustained_active cycle 6901497.52
sm__pipe_fp64_cycles_active.max.peak_sustained_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.max.peak_sustained_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.max.peak_sustained_region cycle 6927476.11
sm__pipe_fp64_cycles_active.max.per_cycle_active 3.23
sm__pipe_fp64_cycles_active.max.per_cycle_elapsed 3.22
sm__pipe_fp64_cycles_active.max.per_cycle_in_frame 3.22
sm__pipe_fp64_cycles_active.max.per_cycle_in_region 3.22
sm__pipe_fp64_cycles_active.max.per_second cycle/nsecond 3.52
sm__pipe_fp64_cycles_active.min cycle 3759904
sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_active % 54.48
sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_elapsed % 54.28
sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_frame % 54.28
sm__pipe_fp64_cycles_active.min.pct_of_peak_burst_region % 54.28
sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_active % 54.48
sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_elapsed % 54.28
sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_frame % 54.28
sm__pipe_fp64_cycles_active.min.pct_of_peak_sustained_region % 54.28
sm__pipe_fp64_cycles_active.min.peak_burst 4
sm__pipe_fp64_cycles_active.min.peak_burst_active cycle 6901497.52
sm__pipe_fp64_cycles_active.min.peak_burst_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.min.peak_burst_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.min.peak_burst_region cycle 6927476.11
sm__pipe_fp64_cycles_active.min.peak_sustained 4
sm__pipe_fp64_cycles_active.min.peak_sustained_active cycle 6901497.52
sm__pipe_fp64_cycles_active.min.peak_sustained_elapsed cycle 6927458.59
sm__pipe_fp64_cycles_active.min.peak_sustained_frame cycle 6927476.11
sm__pipe_fp64_cycles_active.min.peak_sustained_region cycle 6927476.11
sm__pipe_fp64_cycles_active.min.per_cycle_active 2.18
sm__pipe_fp64_cycles_active.min.per_cycle_elapsed 2.17
sm__pipe_fp64_cycles_active.min.per_cycle_in_frame 2.17
sm__pipe_fp64_cycles_active.min.per_cycle_in_region 2.17
sm__pipe_fp64_cycles_active.min.per_second cycle/nsecond 2.38
sm__pipe_fp64_cycles_active.sum cycle 497038528
sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_active % 66.68
sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_elapsed % 66.43
sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_frame % 66.43
sm__pipe_fp64_cycles_active.sum.pct_of_peak_burst_region % 66.43
sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_active % 66.68
sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_elapsed % 66.43
sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_frame % 66.43
sm__pipe_fp64_cycles_active.sum.pct_of_peak_sustained_region % 66.43
sm__pipe_fp64_cycles_active.sum.peak_burst 432
sm__pipe_fp64_cycles_active.sum.peak_burst_active cycle 745361732
sm__pipe_fp64_cycles_active.sum.peak_burst_elapsed cycle 748165528
sm__pipe_fp64_cycles_active.sum.peak_burst_frame cycle 748167419.48
sm__pipe_fp64_cycles_active.sum.peak_burst_region cycle 748167419.48
sm__pipe_fp64_cycles_active.sum.peak_sustained 432
sm__pipe_fp64_cycles_active.sum.peak_sustained_active cycle 745361732
sm__pipe_fp64_cycles_active.sum.peak_sustained_elapsed cycle 748165528
sm__pipe_fp64_cycles_active.sum.peak_sustained_frame cycle 748167419.48
sm__pipe_fp64_cycles_active.sum.peak_sustained_region cycle 748167419.48
sm__pipe_fp64_cycles_active.sum.per_cycle_active 288.08
sm__pipe_fp64_cycles_active.sum.per_cycle_elapsed 287.00
sm__pipe_fp64_cycles_active.sum.per_cycle_in_frame 287.00
sm__pipe_fp64_cycles_active.sum.per_cycle_in_region 287.00
sm__pipe_fp64_cycles_active.sum.per_second cycle/nsecond 314.15
sm__cycles_active.avg cycle 1725374.38
sm__cycles_active.max cycle 1729062
sm__cycles_active.min cycle 1721870
sm__cycles_active.sum cycle 186340433
sm__cycles_elapsed.avg cycle 1731864.65
sm__cycles_elapsed.max cycle 1733056
sm__cycles_elapsed.min cycle 1730283
sm__cycles_elapsed.sum cycle 187041382
---------------------------------------------------------------------- --------------- ------------------------------
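The relationships between the values in this excerpt can be cross-checked with a short Python sketch. It only uses numbers from the output above. The interpretation that peak_sustained = 4 means the counter can advance by up to 4 per SM cycle is an assumption read off the numbers, not something the output states.

```python
# Values copied from the excerpt above.
fp64_cycles_active_avg = 4602208.59   # sm__pipe_fp64_cycles_active.avg
sm_cycles_active_avg = 1725374.38     # sm__cycles_active.avg
peak_sustained = 4                    # sm__pipe_fp64_cycles_active.avg.peak_sustained
peak_sustained_active = 6901497.52    # ...avg.peak_sustained_active

# Assumption: peak_sustained_active = peak_sustained * sm__cycles_active.avg.
implied_peak_active = peak_sustained * sm_cycles_active_avg

# Counter increments per active SM cycle (compare with .per_cycle_active).
per_cycle_active = fp64_cycles_active_avg / sm_cycles_active_avg

# Fraction of the assumed peak (compare with .pct_of_peak_sustained_active).
pct_of_peak_active = 100.0 * fp64_cycles_active_avg / implied_peak_active

print(f"implied peak_sustained_active: {implied_peak_active:.2f}")
print(f"per_cycle_active:              {per_cycle_active:.2f}")
print(f"pct_of_peak_sustained_active:  {pct_of_peak_active:.2f} %")
```

Under that assumption, the sketch reproduces the reported per_cycle_active (2.67) and pct_of_peak_sustained_active (66.68 %). This suggests the raw counter is not bounded by sm__cycles_active, and that the pct_of_peak_sustained_active submetric may be closer to the 0% to 100% ratio asked about.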