Regarding the following metrics:
inst_fp_32 : Number of single-precision floating-point instructions executed by non-predicated threads (arithmetic, compare, etc.)
flop_count_sp : Number of single-precision floating-point operations executed by non-predicated threads (add, multiply, and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The count does not include special operations.
What exactly is the difference between an FP instruction and an FP operation? The separation sounds as if you could do an FP addition with a non-FP instruction. Is that right?
For example, in my analysis I see roughly the following values:
inst_fp_32 = 400M
flop_count_sp = 800M
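
To make the two definitions concrete, here is a rough sketch in Python of how I read them. It is only a model of the quoted descriptions, not the profiler's actual implementation: the function names and the breakdown into add/multiply/multiply-accumulate/other counts are my own assumptions. Under this reading, a multiply-accumulate is one instruction but two operations, so a kernel whose FP instructions are all multiply-accumulates would show exactly the 1:2 ratio above.

```python
# Hypothetical model of the two metrics, based only on their quoted descriptions.
# All counts are per-thread instruction counts summed over non-predicated threads.

def inst_fp_32(n_add, n_mul, n_fma, n_other=0):
    """Every FP instruction counts once, whatever it does.
    n_other stands for compares and similar FP instructions."""
    return n_add + n_mul + n_fma + n_other

def flop_count_sp(n_add, n_mul, n_fma):
    """Adds and multiplies count once; a multiply-accumulate counts twice.
    Compares and special operations are not counted at all."""
    return n_add + n_mul + 2 * n_fma

# Assumed instruction mix: 400M multiply-accumulates and nothing else.
n_fma = 400_000_000
print(inst_fp_32(0, 0, n_fma))     # 400000000
print(flop_count_sp(0, 0, n_fma))  # 800000000
```

If this model is right, the two metrics count the same FP instructions in different units, and my numbers would simply mean that nearly all of them are multiply-accumulates.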