Regarding the following metrics:

inst_fp_32 : Number of single-precision floating-point instructions executed by non-predicated threads (arithmetic, compare, etc.)

flop_count_sp : Number of single-precision floating-point operations executed by non-predicated threads (add, multiply, and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The count does not include special operations.

What exactly is the difference between an FP instruction and an FP operation? The distinction makes it sound as if an FP addition could be performed by a non-FP instruction. Is that right?

For example, in my analysis I see roughly the following values:

inst_fp_32 = 400M

flop_count_sp = 800M
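
Going only by the two descriptions quoted above, a 2:1 ratio like this would be consistent with every counted instruction being a multiply-accumulate, since each one is a single instruction but contributes 2 to the operation count. Here is a minimal Python sketch of that accounting. The helper `count_sp` and the instruction mix are my own assumptions for illustration, not profiler output:

```python
def count_sp(adds, muls, fmas, others=0):
    """Return (inst_fp_32, flop_count_sp) as modeled from the quoted descriptions.

    Hypothetical helper. 'others' stands for FP32 instructions such as
    compares, which are assumed to count as instructions but not as
    add / multiply / multiply-accumulate operations.
    """
    inst_fp_32 = adds + muls + fmas + others   # one per instruction
    flop_count_sp = adds + muls + 2 * fmas     # multiply-accumulate counts twice
    return inst_fp_32, flop_count_sp


# Assumed mix: nothing but multiply-accumulates.
inst, flops = count_sp(adds=0, muls=0, fmas=400_000_000)
print(inst, flops)  # 400000000 800000000
```

Under this model, any other mix (plain adds, multiplies, or compares) would pull the ratio below 2, so I would like to confirm whether this reading of the two metrics is correct.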