I am having trouble reconciling the GFLOPS I calculate by hand with the figure I get from nvprof's efficiency metric.

By definition, FLOPS (floating-point operations per second) is the number of FP operations divided by the kernel execution time.

When I run **nvprof --metrics flop_count_sp ./program**

I see that the kernel execution time is 6.6 ms and the number of FP operations is 2,134,544,004.

So, the GFLOPS is

2134544004/(6.6*0.001)/1000000000 = 323.4 GFLOPS
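The hand calculation above can be double-checked with a small Python sketch. The function name `gflops` is only illustrative (it is not part of nvprof); the inputs are the FLOP count and kernel time quoted above.

```python
def gflops(flop_count, kernel_time_ms):
    """Throughput in GFLOPS from a FLOP count and a kernel time in milliseconds."""
    seconds = kernel_time_ms * 1e-3      # ms -> s
    return flop_count / seconds / 1e9    # FLOP/s -> GFLOP/s

# Values taken from the flop_count_sp run above
print(round(gflops(2_134_544_004, 6.6), 1))  # 323.4
```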

When I run **nvprof --metrics flop_sp_efficiency ./program**

I see that the kernel execution time is 5.7 ms and the efficiency is 16.8%.

So, the GFLOPS for the M2000, which has a peak of 1768 GFLOPS, will be

16.8*1768/100 = 297 GFLOPS
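The same conversion, treating the reported efficiency as a percentage of peak, can be sketched in Python. The function name `gflops_from_efficiency` is hypothetical, and the 1768 GFLOPS peak is the figure assumed above for the M2000.

```python
def gflops_from_efficiency(efficiency_percent, peak_gflops):
    """Achieved GFLOPS given an efficiency in percent of the peak throughput."""
    return efficiency_percent / 100.0 * peak_gflops

# Values taken from the flop_sp_efficiency run above
achieved = gflops_from_efficiency(16.8, 1768)
print(round(achieved, 1))  # 297.0
```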

So, which kernel duration is valid for calculating GFLOPS: the 5.7 ms run that gives 297, or the 6.6 ms run that gives 323?

I know that the difference in time is due to the overhead of collecting the metrics, but I want to know which one is more reliable.