I understand that processor manufacturers, including Nvidia, like to quote theoretical FLOPS values for their products, derived from the specifications of the GPU. What I am wondering is how close to that theoretical FLOPS value is actually achievable in practice. For example, using a program that performs very repetitive FMA operations across all available cores, how close to the maximum FLOPS count would be realistic? If the generated assembly ended up being just a load, an FMA, and a store instruction for each thread, would it be around 1/3 of the listed FLOPS?
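Roughly the kind of kernel I have in mind (just a sketch, the names are made up):

```
// Sketch of the kind of kernel I mean: each thread does essentially one
// load, one FMA, and one store, so most of the work is memory traffic.
__global__ void one_fma(const float* __restrict__ x, float* __restrict__ y,
                        float a, float b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = fmaf(a, x[i], b);   // 1 FMA = 2 FLOP per 8 bytes moved
}
```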
If the floating-point units are the bottleneck (i.e., high computational intensity), a reasonable first-order estimate for well-optimized compiled code would be about 75% of theoretical peak. An example would be a BLAS3 GEMM-style matrix multiply.
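As a rough illustration (this is not a GEMM, just a toy sketch of raising the FLOP-per-byte ratio; the kernel name and parameters are made up), something like the following amortizes each load and store over many register-resident FMAs, which is what pushes a kernel toward the compute-bound regime:

```
// Hypothetical compute-bound sketch: many register-resident FMAs per element,
// so the floating-point units rather than memory become the bottleneck.
__global__ void fma_heavy(const float* __restrict__ x, float* __restrict__ y,
                          float a, float b, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < iters; ++k)
            v = fmaf(a, v, b);     // dependent chain; a tuned microbenchmark
                                   // would use several independent chains
        y[i] = v;                  // ~2*iters FLOP per 8 bytes moved
    }
}
```

A real GEMM gets its high computational intensity from data reuse (tiles staged in shared memory and registers) rather than from an artificial loop like this.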
However, in your chosen example memory throughput is the bottleneck (i.e., very low computational intensity). An example would be BLAS1 AXPY-style vector processing. In those circumstances the achieved FLOPS may be just 1/20 of the theoretical peak, or even less, depending on peak memory bandwidth, memory access patterns, and data type.
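A back-of-the-envelope roofline estimate shows why. For the load/FMA/store kernel in your question there are 2 FLOP per 8 bytes of DRAM traffic, so the attainable rate is bounded by bandwidth times FLOP-per-byte. The peak numbers below are placeholders, not the specs of any particular GPU:

```
#include <cstdio>

int main()
{
    const double peak_flops = 30e12;  // placeholder FP32 peak, FLOP/s
    const double peak_bw    = 900e9;  // placeholder DRAM bandwidth, bytes/s

    // y[i] = a*x[i] + b: 2 FLOP per element, 8 bytes moved (4 read + 4 write)
    const double intensity  = 2.0 / 8.0;            // FLOP per byte
    const double attainable = peak_bw * intensity;  // bandwidth-limited FLOP/s

    std::printf("attainable: %.0f GFLOP/s = %.1f%% of peak\n",
                attainable / 1e9, 100.0 * attainable / peak_flops);
    return 0;
}
```

With those placeholder numbers the memory-bound ceiling is around 225 GFLOP/s, well under 1% of the assumed 30 TFLOPS peak, which is why "around 1/3 of the listed FLOPS" is far too optimistic for that kind of kernel.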
These general “efficiency” effects are quite similar between modern CPUs and GPUs. It might be instructive to look at published data for HPL (High Performance Linpack) for the first scenario and HPCG (High Performance Conjugate Gradients) for the second scenario.