I see the following profiler results for a kernel:
Single precision ADD = 0
Single precision FMA = 1,056,964,608
Single precision MUL = 2,642,411,520
Single precision SPECIAL = 1,585,446,912
However, the FLOP efficiency (Peak Single) is reported as n/a.
How can that be explained, given that the counters above are nonzero?
Moreover, I see that
Floating point operations (Single precision), i.e. flop_count_sp, is 4,756,340,736, while FP instructions (Single), i.e. inst_fp_32, is 6,870,269,952.
I expected that one FP instruction may be composed of more than one floating-point operation, so the operation count should be at least as large as the instruction count. However, here the total number of instructions is larger than the number of operations.
Are these results consistent, or do I have to rerun the profiler? Any thoughts?
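For reference, here is a quick arithmetic check of the numbers above, written as a small Python sketch. It assumes that flop_count_sp counts each FMA as two operations and leaves SPECIAL out (that is my reading of the metric description, not something I have confirmed for this GPU). The variable names are just labels I made up for the counters. The second check is only an observation about these particular numbers, not a documented definition of inst_fp_32.

```python
# Counter values copied from the profiler output above.
add_sp = 0
fma_sp = 1_056_964_608
mul_sp = 2_642_411_520
special_sp = 1_585_446_912
flop_count_sp = 4_756_340_736
inst_fp_32 = 6_870_269_952

# Assumption: flop_count_sp = ADD + MUL + 2 * FMA, with SPECIAL excluded.
expected_flops = add_sp + mul_sp + 2 * fma_sp
print("flop_count_sp matches ADD + MUL + 2*FMA:", expected_flops == flop_count_sp)

# Plain sum of the four per-type counters, compared with inst_fp_32.
counter_sum = add_sp + fma_sp + mul_sp + special_sp
gap = inst_fp_32 - counter_sum
print("inst_fp_32 minus sum of counters:", gap)
print("gap equals the SPECIAL counter:", gap == special_sp)
```

Under that assumption the operation count is reproduced exactly, and the gap between inst_fp_32 and the sum of the four counters is the same size as the SPECIAL counter. So the mismatch seems to come from how special operations are counted in each metric, which is the part I would like someone to confirm.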