I want to measure the number of GFLOP/s a kernel achieves. If I do one multiplication and one addition in one line of code (i.e. a += b * c), does that count as 2 flops or 1 flop? I’ve read somewhere that multiply-adds can be fused into one operation.
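A multiply-add is commonly counted as two floating-point operations, even when it is fused into a single machine instruction (FMA). The peak GFLOP/s values in the Nvidia specs are calculated the same way, so counting 2 flops per multiply-add keeps your measured number directly comparable to the spec-sheet peak.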
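To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. The function names (count_flops, achieved_gflops) and the example numbers are made up for illustration; the kernel time is assumed to come from your own measurement (e.g. CUDA events), and the operation counts are whatever your kernel actually executes.

```python
def count_flops(n_fma: int, n_other: int = 0) -> int:
    """Total floating-point operations, counting each multiply-add as 2.

    n_fma   -- number of multiply-adds (a += b * c) executed by the kernel
    n_other -- any additional single adds/multiplies, counted as 1 each
    """
    return 2 * n_fma + n_other


def achieved_gflops(n_fma: int, elapsed_s: float, n_other: int = 0) -> float:
    """Throughput in GFLOP/s, where 1 GFLOP = 1e9 floating-point operations."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return count_flops(n_fma, n_other) / elapsed_s / 1e9


# Hypothetical example: 2**20 threads, each doing 1024 multiply-adds,
# with a measured kernel time of 2 ms (assumed value, not a real benchmark).
n_fma = (1 << 20) * 1024
print(f"{achieved_gflops(n_fma, elapsed_s=2e-3):.1f} GFLOP/s")
```

If you would rather report fused instructions per second, drop the factor of 2, but then halve the spec-sheet peak as well before comparing against it.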