Reading the nvprof documentation, I see the following metrics:
flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, multiply, and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The count does not include special operations. (Scope: Multi-context)
flop_count_sp_add: Number of single-precision floating-point add operations executed by non-predicated threads. (Scope: Multi-context)
flop_count_sp_fma: Number of single-precision floating-point multiply-accumulate operations executed by non-predicated threads. Each multiply-accumulate operation contributes 1 to the count. (Scope: Multi-context)
flop_count_sp_mul: Number of single-precision floating-point multiply operations executed by non-predicated threads. (Scope: Multi-context)
flop_count_sp_special: Number of single-precision floating-point special operations executed by non-predicated threads.
For one kernel, I get the following counts:
Floating Point Operations(Single Precision) 1150884804
Floating Point Operations(Single Precision Add) 150665905
Floating Point Operations(Single Precision FMA) 428604161
Floating Point Operation(Single Precision Mul) 143010575
Floating Point Operations(Single Precision Special) 23763835
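As a sanity check, the short sketch below applies the flop_count_sp definition quoted above (add + mul + 2 × FMA, with special operations excluded) to the counts from this kernel. The variable and function names are my own; only the numbers come from the nvprof output.

```python
# Counts copied from the nvprof output above (one kernel).
flop_count_sp = 1150884804
flop_count_sp_add = 150665905
flop_count_sp_fma = 428604161
flop_count_sp_mul = 143010575
flop_count_sp_special = 23763835  # documented as NOT included in flop_count_sp


def expected_flop_count_sp(add: int, mul: int, fma: int) -> int:
    """Total implied by the documented definition: each FMA counts as 2 operations."""
    return add + mul + 2 * fma


reconstructed = expected_flop_count_sp(
    flop_count_sp_add, flop_count_sp_mul, flop_count_sp_fma
)
difference = flop_count_sp - reconstructed

print(f"add + mul + 2*fma      = {reconstructed}")
print(f"reported flop_count_sp = {flop_count_sp}")
print(f"difference             = {difference}")
```

With these numbers, add + mul + 2 × FMA comes to 1150884802, which is 2 less than the reported 1150884804. Adding the special-operation count would overshoot by about 23.8 million, which is consistent with the description saying special operations are not included.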
I don’t understand the first metric, flop_count_sp (Single Precision). What exactly does it count? What is included in it that is not accounted for in the other four?
Any thoughts?