Differences in FLOPS calculation

There is a discrepancy between the GFLOPS I calculate on paper and what nvprof reports.
By definition, FLOPS is the number of floating-point operations divided by the kernel execution time.

When I run nvprof --metrics flop_count_sp ./program
I see that the kernel execution time is 6.6 ms and the number of FP operations is 2,134,544,004.
So the GFLOPS is
2134544004/(6.6*0.001)/1000000000 = 323.4 GFLOPS
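The same arithmetic as a minimal sketch, assuming the nvprof duration is in milliseconds (hence the 1e-3 factor). The function name gflops_from_count is hypothetical and only for illustration:

```python
def gflops_from_count(flop_count: int, kernel_time_ms: float) -> float:
    """Achieved GFLOPS = FP operation count / kernel time in seconds / 1e9."""
    kernel_time_s = kernel_time_ms * 1e-3  # convert the nvprof duration from ms to s
    return flop_count / kernel_time_s / 1e9

# Values reported by `nvprof --metrics flop_count_sp` above.
print(f"{gflops_from_count(2_134_544_004, 6.6):.1f} GFLOPS")  # prints "323.4 GFLOPS"
```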

When I run nvprof --metrics flop_sp_efficiency ./program
I see that the kernel execution time is 5.7 ms and the efficiency is 16.8%.
So for the M2000, which has a peak of 1768 GFLOPS, the GFLOPS will be
16.8*1768/100 = 297 GFLOPS
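A minimal sketch of the efficiency-based calculation, assuming flop_sp_efficiency is reported as a percentage (as the value 16.8 suggests) and using the 1768 GFLOPS peak quoted for the M2000. The percentage is divided by 100 exactly once. The function name is hypothetical:

```python
def gflops_from_efficiency(efficiency_percent: float, peak_gflops: float) -> float:
    """Achieved GFLOPS = (efficiency / 100) * peak GFLOPS."""
    return efficiency_percent / 100 * peak_gflops

# 16.8 (%) reported by `nvprof --metrics flop_sp_efficiency`, 1768 GFLOPS peak.
print(f"{gflops_from_efficiency(16.8, 1768):.0f} GFLOPS")  # prints "297 GFLOPS"
```

Note that this route never uses the kernel duration directly; the time is already folded into the efficiency figure nvprof reports.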

So which kernel duration is valid for calculating GFLOPS, and which result should I trust: 297 or 323?

I know that the difference in time is due to the overhead of collecting the metrics, but I want to know which one is more reliable.

Hi @mahmood.nt,

So, which kernel duration is valid for calculating GFLOPs?

  1. Are the profiled metrics consistent over N runs?
  2. Have you tried adding more iterations of your computation within the kernel? That makes the execution time much longer, which reduces the impact of any inaccuracy in the measured duration. The kernel launch overhead is also amortised over all iterations.
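Both suggestions can be sketched on the host side, assuming you collect one kernel duration per profiled run and know the FP operation count of a single iteration. The helper names summarize_runs and gflops_with_iterations are hypothetical, not part of nvprof:

```python
from statistics import mean, stdev


def summarize_runs(durations_ms: list[float], flop_count: int) -> tuple[float, float]:
    """Return (GFLOPS at the mean duration, relative spread stdev/mean) over N runs.

    A small relative spread suggests the measured duration is stable across runs.
    """
    avg_ms = mean(durations_ms)
    spread = stdev(durations_ms) / avg_ms if len(durations_ms) > 1 else 0.0
    return flop_count / (avg_ms * 1e-3) / 1e9, spread


def gflops_with_iterations(flops_per_iteration: int, iterations: int,
                           total_time_ms: float) -> float:
    """GFLOPS when the kernel repeats its computation `iterations` times.

    The launch overhead is paid once per launch, so its share of total_time_ms
    shrinks as `iterations` grows.
    """
    return flops_per_iteration * iterations / (total_time_ms * 1e-3) / 1e9
```

Here durations_ms would be filled from repeating the same nvprof run N times; comparing the spread with the 6.6 ms vs 5.7 ms gap shows whether that gap is larger than normal run-to-run variation.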

-SKA