Differences in FLOPS calculation

mahmood.nt · November 15, 2019, 1:33pm

I am having trouble reconciling the GFLOPS I calculate by hand with what nvprof reports.
By definition, FLOPS (floating-point operations per second) is the number of FP operations divided by the kernel execution time.

When I run nvprof --metrics flop_count_sp ./program
I see that the kernel execution time is 6.6 ms and the number of single-precision FP operations is 2,134,544,004.
So the throughput is
2134544004/(6.6*0.001)/1000000000 = 323.4 GFLOPS

When I run nvprof --metrics flop_sp_efficiency ./program
I see that the kernel execution time is 5.7 ms and the efficiency is 16.8%.
So the GFLOPS for the M2000, which has a peak of 1768 GFLOPS, will be
16.8*1768/100 = 297 GFLOPS

So which result is valid, 297 or 323 GFLOPS, and which kernel duration should be used for the calculation?

I know that the difference in time is due to the overhead of collecting the metrics, but I want to know which one is more reliable.
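
For reference, the two estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: the function names are made up for illustration, and the inputs are the values quoted above (the FP operation count, the 6.6 ms duration, the 16.8% efficiency, and the 1768 GFLOPS peak).

```python
def gflops_from_count(flop_count, time_ms):
    """GFLOPS from an FP operation count and a kernel duration in ms."""
    return flop_count / (time_ms * 1e-3) / 1e9


def gflops_from_efficiency(efficiency_percent, peak_gflops):
    """GFLOPS from an efficiency given in percent of the peak rate."""
    return efficiency_percent / 100.0 * peak_gflops


# Values quoted in the post (not re-measured here).
from_count = gflops_from_count(2_134_544_004, 6.6)
from_eff = gflops_from_efficiency(16.8, 1768.0)

# Relative gap between the two estimates, using the lower one as the base.
gap_percent = (from_count - from_eff) / from_eff * 100.0

print(f"from flop_count_sp:      {from_count:.1f} GFLOPS")
print(f"from flop_sp_efficiency: {from_eff:.1f} GFLOPS")
print(f"relative gap:            {gap_percent:.1f} %")
```

The two numbers differ by roughly 9%, so the question is about a gap of that size, not a rounding difference.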

sivakumaranandan · December 26, 2019, 6:00pm

Hi @mahmood.nt,

So, which kernel duration is valid for calculating GFLOPs?

Are the profiled metrics consistent over N runs?
Have you tried adding iterations to your computation within the kernel? This makes the execution time much longer, which reduces the impact of inaccuracies in the measured duration. The kernel launch overhead is also amortised over more work.
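
A rough way to see why more iterations help is a toy model in which every measurement carries a fixed timing error on top of the real kernel time. The sketch below is an assumption-laden illustration, not a model of nvprof itself: `measured_gflops` is a hypothetical helper, and the 5.7 ms kernel time and 0.9 ms fixed error are picked only to mirror the gap in the question, not measured.

```python
FLOPS_PER_ITER = 2_134_544_004  # FP operation count quoted in the question
KERNEL_MS = 5.7                 # assumed per-iteration kernel time (illustrative)
FIXED_ERROR_MS = 0.9            # assumed fixed timing error per measurement (illustrative)


def measured_gflops(flops_per_iter, iters, kernel_ms, fixed_error_ms):
    """GFLOPS as it would be computed from a duration that includes a fixed error."""
    total_ms = iters * kernel_ms + fixed_error_ms
    return iters * flops_per_iter / (total_ms * 1e-3) / 1e9


# Throughput with a perfect timer, for comparison.
ideal = measured_gflops(FLOPS_PER_ITER, 1, KERNEL_MS, 0.0)

errors = []
for iters in (1, 10, 100):
    estimate = measured_gflops(FLOPS_PER_ITER, iters, KERNEL_MS, FIXED_ERROR_MS)
    rel_error = (estimate - ideal) / ideal * 100.0
    errors.append(rel_error)
    print(f"iters={iters:4d}  estimate={estimate:7.1f} GFLOPS  error={rel_error:6.2f} %")
```

In this model the relative error shrinks roughly in proportion to the number of iterations, because the fixed error is spread over a longer run.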

-SKA
