I am having trouble reconciling the GFLOPS I calculate on paper with what nvprof reports.
By definition, FLOPS is the number of floating-point operations divided by the kernel execution time.
When I run nvprof --metrics flop_count_sp ./program
I see that the kernel execution time is 6.6 ms and the number of FP operations is 2,134,544,004.
So, the GFLOPS is
2134544004/(6.6*0.001)/1000000000 = 323.4 GFLOPS
When I run nvprof --metrics flop_sp_efficiency ./program
I see that the kernel execution time is 5.7 ms and the efficiency is 16.8%.
So, the GFLOPS for the M2000, which has a peak of 1768 GFLOPS, will be
16.8*1768/100 = 297 GFLOPS
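To double-check the arithmetic of the two estimates above, here is a small Python sketch. The function names are my own (hypothetical), and the inputs are just the numbers quoted from the two nvprof runs; the 1768 GFLOPS peak is the figure I assumed for the M2000.

```python
def gflops_from_count(flop_count, kernel_time_ms):
    """GFLOPS = FP operations / kernel time in seconds / 1e9."""
    return flop_count / (kernel_time_ms * 1e-3) / 1e9


def gflops_from_efficiency(efficiency_percent, peak_gflops):
    """GFLOPS = reported efficiency (as a percentage) of the peak rate."""
    return efficiency_percent / 100.0 * peak_gflops


# Values taken from the flop_count_sp run (6.6 ms kernel time).
from_count = gflops_from_count(2_134_544_004, 6.6)

# Values taken from the flop_sp_efficiency run; 1768 is the assumed peak.
from_eff = gflops_from_efficiency(16.8, 1768.0)

print(f"from flop_count_sp:      {from_count:.1f} GFLOPS")  # 323.4
print(f"from flop_sp_efficiency: {from_eff:.1f} GFLOPS")    # 297.0
```

Note that the second estimate does not use the 5.7 ms duration at all; it depends only on the reported efficiency and on the peak value I assumed.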
So, which result is valid, 297 or 323 GFLOPS, and which kernel duration should I use to calculate GFLOPS?
I know that the difference in time is due to the overhead of collecting the metrics, but I want to know which one is more reliable.