FP efficiency and utilization

Hi,
In the metrics, I see two FP-related stats:

flop_sp_efficiency
smsp__sass_thread_inst_executed_ops_fadd_fmul_ffma_pred_on.avg.pct_of_peak_sustained_elapsed

single_precision_fu_utilization
smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active

I understand that the efficiency is calculated from the number of FP operations (fadd + fmul + 2*ffma) over time, and that the utilization is the percentage of cycles in which the FP unit is active. However, going by its definition, the utilization metric is restricted to the FMA pipe. I am not sure whether that metric covers only FMA operations or fadd, fmul, and ffma together.

Moreover, I don’t know whether these two metrics move in the same direction. If higher utilization means higher efficiency, why do two metrics exist? Having two metrics may imply that it is possible to have high efficiency but low utilization. Since efficiency here includes the number of FP instructions, I would expect that the more FP instructions the core executes, the higher both efficiency and utilization should be.

Any comments on that? If these two metrics are orthogonal, I would like to know more.

Hi,

The following metric indicates the thread-level utilization of FADD, FMUL, and FFMA instructions within the FMA pipe. The FMA pipe also executes other instructions, notably integer multiply, that are not counted by this metric:
flop_sp_efficiency
smsp__sass_thread_inst_executed_ops_fadd_fmul_ffma_pred_on.avg.pct_of_peak_sustained_elapsed

The following indicates the warp-level utilization of the FMA pipe:
single_precision_fu_utilization
smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active
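
To make the thread-level versus warp-level distinction concrete, here is a rough toy model in Python. It is only a sketch and not how the profiler computes these metrics: the function `toy_metrics`, the assumption that the pipe accepts one full 32-thread warp instruction per cycle, and the use of a single cycle count as the denominator for both values are all simplifications invented for illustration. The weighting of FFMA as two operations follows the fadd + fmul + 2*ffma formula from the question.

```python
WARP_SIZE = 32  # threads per warp

# FP operations credited per active thread (toy assumption based on the
# fadd + fmul + 2*ffma formula). Integer multiply occupies the FMA pipe
# but contributes no FP operations.
OPS_PER_THREAD = {"fadd": 1, "fmul": 1, "ffma": 2, "imul": 0}


def toy_metrics(issued, cycles):
    """Return (efficiency_pct, utilization_pct) for a hypothetical FMA pipe.

    issued: list of (kind, active_threads), one warp instruction per busy cycle.
    cycles: total cycles in the measured window.
    """
    if len(issued) > cycles:
        raise ValueError("toy model issues at most one warp instruction per cycle")

    # Thread-level view: count FP operations from active threads only.
    thread_ops = sum(OPS_PER_THREAD[kind] * active for kind, active in issued)
    peak_ops = cycles * WARP_SIZE * OPS_PER_THREAD["ffma"]
    efficiency = 100.0 * thread_ops / peak_ops

    # Warp-level view: a cycle is busy no matter how many threads are active
    # or which FMA-pipe instruction is running.
    utilization = 100.0 * len(issued) / cycles
    return efficiency, utilization


full_warps = toy_metrics([("ffma", 32)] * 100, 100)  # every thread active
divergent = toy_metrics([("ffma", 1)] * 100, 100)    # one active thread per warp
integer_only = toy_metrics([("imul", 32)] * 100, 100)  # pipe busy with integer work

for name, (eff, util) in [("full warps", full_warps),
                          ("divergent", divergent),
                          ("integer only", integer_only)]:
    print(f"{name:12s} efficiency={eff:7.3f}%  utilization={util:7.3f}%")
```

In this toy model the pipe is fully busy in all three cases, but the efficiency drops when few threads per warp are active or when the pipe is executing integer multiplies. So within the model, utilization can be high while efficiency is low, but not the other way around.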


A diagram is available at https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

Thanks for the info. I now understand the difference.

I also found this good topic