This is about the performance-bound chart shown by nvprof: https://pasteboard.co/ITYXP2C.png
I can match some of the categories in the chart to metrics, but others are unclear. What I have so far:
Control-flow operations => cf_fu_utilization
Arithmetic operations => sum of single_precision_fu_utilization, half_precision_fu_utilization, double_precision_fu_utilization
The metric behind the memory operations category is not clear, though. Is it ldst_fu_utilization?
Also, does nvprof count only FP operations (add/mul/fma) as arithmetic operations?
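
To make the question concrete, here is a minimal sketch of the mapping I am assuming. The category names, the `summarize` helper and the example values are all hypothetical and only for illustration. The metric names are the ones listed above. Whether the memory category really corresponds to ldst_fu_utilization is exactly what I am unsure about.

```python
# Assumed mapping from chart categories to nvprof metric names.
# This is my guess, not something confirmed by the nvprof documentation.
CATEGORY_METRICS = {
    "control_flow": ["cf_fu_utilization"],
    "arithmetic": [
        "single_precision_fu_utilization",
        "half_precision_fu_utilization",
        "double_precision_fu_utilization",
    ],
    # Unconfirmed: the chart's memory bar may or may not be this metric.
    "memory": ["ldst_fu_utilization"],
}


def summarize(levels):
    """Sum the reported utilization levels per assumed category.

    `levels` maps a metric name to the utilization level nvprof reports
    for it (a 0 to 10 scale). Metrics missing from `levels` count as 0.
    """
    return {
        category: sum(levels.get(name, 0) for name in names)
        for category, names in CATEGORY_METRICS.items()
    }


# Hypothetical values, not taken from a real profiling run.
example_levels = {
    "cf_fu_utilization": 1,
    "single_precision_fu_utilization": 4,
    "half_precision_fu_utilization": 0,
    "double_precision_fu_utilization": 2,
    "ldst_fu_utilization": 3,
}

if __name__ == "__main__":
    for category, total in summarize(example_levels).items():
        print(category, total)
```

If the arithmetic bar in the chart also includes integer or special-function work, the "arithmetic" list here would need more metrics, which is the second part of my question.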