Estimating performance in FLOPS: what's the correct way to do it?

RoofTopG · February 20, 2008, 11:36am

What is the correct method to analytically estimate kernel performance in FLOPS on a CUDA GPU? My goal is not to measure the maximum performance of the GPU, but to get a correct FLOPS estimate for my algorithm and compare it with the CPU implementation.

Is there a reference document on this topic?

Thanks in advance!

serge · February 20, 2008, 11:51am

IMO, the term “FLOPS” by itself is not a correct way to measure algorithm performance, because of several problems:

How many FLOPs should operations like sin, cos, exp, etc. count as?
What should be done about the fact that some operations on the CPU take a different number of clock cycles depending on their arguments?
What should be done about global memory latency and read-after-write dependencies, which seriously affect performance?

Thus, IMO the best way to compare is to measure the run time of both the CPU and the GPU implementations.
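A minimal sketch of timing both implementations, assuming a simple SAXPY-style workload. The kernel `saxpyKernel`, the CPU function `saxpyCpu`, the problem size and the launch configuration are hypothetical choices for illustration, not code from this thread. The CPU side is timed with `std::chrono`, the GPU kernel with CUDA events.

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical example workload: y[i] = a * x[i] + y[i] (SAXPY).
__global__ void saxpyKernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Reference CPU implementation of the same computation.
void saxpyCpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;
    const float a = 2.0f;
    // Inputs chosen so every result is exactly 4.0f on both CPU and GPU.
    std::vector<float> x(n, 1.0f), yCpu(n, 2.0f), yGpu(n, 2.0f);
    const size_t bytes = n * sizeof(float);

    // CPU: wall-clock time of the reference implementation.
    auto t0 = std::chrono::steady_clock::now();
    saxpyCpu(n, a, x.data(), yCpu.data());
    auto t1 = std::chrono::steady_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *dX = nullptr, *dY = nullptr;
    CUDA_CHECK(cudaMalloc(&dX, bytes));
    CUDA_CHECK(cudaMalloc(&dY, bytes));
    CUDA_CHECK(cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dY, yGpu.data(), bytes, cudaMemcpyHostToDevice));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Warm-up launch so one-time start-up costs are not timed; dY is
    // re-uploaded afterwards so the timed launch starts from the same data.
    saxpyKernel<<<grid, block>>>(n, a, dX, dY);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    CUDA_CHECK(cudaMemcpy(dY, yGpu.data(), bytes, cudaMemcpyHostToDevice));

    // GPU: kernel-only time measured with CUDA events.
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    CUDA_CHECK(cudaEventRecord(start));
    saxpyKernel<<<grid, block>>>(n, a, dX, dY);
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaGetLastError());
    float gpuMs = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&gpuMs, start, stop));

    // Check that both implementations produced the same result
    // (exact comparison is only valid because the inputs give exact results).
    CUDA_CHECK(cudaMemcpy(yGpu.data(), dY, bytes, cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (yGpu[i] != yCpu[i]) ++mismatches;
    assert(mismatches == 0);

    std::printf("CPU: %.3f ms, GPU kernel: %.3f ms\n", cpuMs, gpuMs);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(dX));
    CUDA_CHECK(cudaFree(dY));
    return 0;
}
```

Note that the GPU number here is kernel time only. If the data starts and ends in host memory, a fair comparison probably should include the `cudaMemcpy` transfers too, and repeating each measurement several times gives more stable numbers than a single run.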

BTW, I’ve done some experiments with my kernel timings (posted on The Official NVIDIA Forums); maybe they will give you some ideas for your investigations.

DenisR · February 20, 2008, 2:11pm

Analyzing FLOPS (counting each operation as 1 FLOP) is useful for determining whether you are using the full capabilities of the device. But you should then also look at how many GB/s of memory bandwidth you achieve, to see whether you have reached either of the two limits.
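A sketch of that two-limit check for the same hypothetical SAXPY kernel, counting each multiply and add as 1 FLOP and counting global-memory bytes loaded and stored. The measured time and the two peak values below are placeholders, not real device numbers; take the peaks from your card's specifications and the time from a measurement. The `attainableGflops` helper applies the roofline idea (the bound is the smaller of the arithmetic peak and FLOPs-per-byte times peak bandwidth), which is one common way to formalize "reached either of the limits".

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>

// Work done by one launch of a hypothetical SAXPY kernel, y[i] = a * x[i] + y[i].
// FLOPs: one multiply and one add per element (each operation counted as 1 FLOP).
// Bytes: global-memory traffic of loading x[i] and y[i] and storing y[i].
struct KernelCounts {
    double flops;
    double bytes;
};

KernelCounts saxpyCounts(long long n) {
    return {2.0 * n, 3.0 * sizeof(float) * n};
}

struct Utilization {
    double gflops;        // achieved GFLOP/s
    double gbPerSec;      // achieved GB/s
    double flopFraction;  // fraction of peak arithmetic throughput
    double bwFraction;    // fraction of peak memory bandwidth
};

// Convert counts plus a measured kernel time into achieved rates and
// compare them with the device's peak rates.
Utilization utilization(KernelCounts c, double seconds,
                        double peakGflops, double peakGBs) {
    Utilization u;
    u.gflops = c.flops / seconds / 1e9;
    u.gbPerSec = c.bytes / seconds / 1e9;
    u.flopFraction = u.gflops / peakGflops;
    u.bwFraction = u.gbPerSec / peakGBs;
    return u;
}

// Roofline bound: the highest GFLOP/s a kernel with this FLOP/byte ratio
// can reach, limited either by arithmetic peak or by memory bandwidth.
double attainableGflops(KernelCounts c, double peakGflops, double peakGBs) {
    double intensity = c.flops / c.bytes;  // FLOPs per byte
    return std::min(peakGflops, intensity * peakGBs);
}

int main() {
    const long long n = 1LL << 24;
    const double seconds = 4.0e-3;    // hypothetical measured kernel time
    const double peakGflops = 500.0;  // hypothetical arithmetic peak (GFLOP/s)
    const double peakGBs = 100.0;     // hypothetical peak memory bandwidth (GB/s)

    KernelCounts c = saxpyCounts(n);
    assert(c.flops / c.bytes < 1.0);  // SAXPY does far less than 1 FLOP per byte

    Utilization u = utilization(c, seconds, peakGflops, peakGBs);
    std::printf("achieved: %.2f GFLOP/s (%.1f%% of peak), %.2f GB/s (%.1f%% of peak)\n",
                u.gflops, 100.0 * u.flopFraction, u.gbPerSec, 100.0 * u.bwFraction);
    std::printf("roofline bound for this kernel: %.2f GFLOP/s\n",
                attainableGflops(c, peakGflops, peakGBs));
    return 0;
}
```

With these placeholder numbers the kernel reaches about 50% of peak bandwidth but under 2% of peak arithmetic throughput, so for a kernel like this the GB/s figure, not the FLOPS figure, tells you how close you are to the device limit.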
