About the flops in ncu report

You can find the expression used to calculate the “Peak Work” value shown in the single-precision roofline chart in the section file “SpeedOfLight_RooflineChart.section”:

Peak Work = derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2 * sm__cycles_elapsed.avg.per_second

Where:
derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2 = sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained * 2
(This is a derived metric defined in the same section file. Since an FMA instruction performs two floating-point operations, one multiplication and one addition, the FMA instruction count is multiplied by two.)

Units:

  • Peak Work : FLOP/second
  • sm__cycles_elapsed.avg.per_second : cycles/second
  • sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained : instructions/cycle
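As a rough illustration of how the units above combine, here is a minimal Python sketch of the same arithmetic. The function name and the two input values are hypothetical placeholders, not numbers from a real report; substitute the values ncu reports for your GPU.

```python
def peak_work_flop_per_second(ffma_inst_per_cycle: float,
                              cycles_per_second: float) -> float:
    """Peak Work as described above: (FFMA inst/cycle * 2) * cycles/second."""
    # Each FMA counts as 2 FLOP (one multiply, one add); this mirrors the
    # derived "_x2" metric from the section file.
    flop_per_cycle = ffma_inst_per_cycle * 2
    # FLOP/cycle * cycles/second = FLOP/second
    return flop_per_cycle * cycles_per_second


# Hypothetical example values (assumptions, not measured data):
ffma_peak_sustained = 8192.0  # sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained
sm_clock_hz = 1.5e9           # sm__cycles_elapsed.avg.per_second

peak = peak_work_flop_per_second(ffma_peak_sustained, sm_clock_hz)
print(f"Peak Work: {peak / 1e12:.3f} TFLOP/s")  # prints "Peak Work: 24.576 TFLOP/s"
```

Note that sm__cycles_elapsed.avg.per_second is a measured clock rate, so the resulting Peak Work can change between runs if the SM clock changes.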