How can we count FLOPs in a CUDA kernel? Does it have to be done from the PTX code?
A second question: how do I find the actual GFLOPS achieved by the code, as opposed to the theoretical upper limit?
Generally, the FLOPs you count to measure performance are “algorithmic” FLOPs (operations inherent in the algorithm, not in the implementation), divided by elapsed time to give a FLOPS rate. Most of the time these FLOPs do match what actually executes, since the majority of “utility” instructions (address/index computation, etc.) are integer operations.
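To make that concrete, here is a minimal sketch (in Python, just for the arithmetic; the matrix sizes and elapsed time are hypothetical values, and in practice the time would come from timing the kernel with CUDA events): for an M x K by K x N matrix multiply, the algorithmic count is 2*M*N*K operations, one multiply and one add per inner-product term.

```python
# Sketch: "algorithmic" GFLOPS for an M x K by K x N matrix multiply.
# elapsed_seconds would normally come from timing the kernel
# (e.g. with CUDA events); here it is a made-up value.

def matmul_gflops(M, N, K, elapsed_seconds):
    # 2*M*N*K: one multiply and one add for each of the M*N*K
    # inner-product terms, regardless of how the kernel implements it.
    flops = 2.0 * M * N * K
    return flops / elapsed_seconds / 1e9

# Hypothetical run: a 4096^3 multiply that took 25 ms.
print(matmul_gflops(4096, 4096, 4096, 0.025))  # roughly 5497.6 GFLOPS
```

Note that the count stays 2*M*N*K even if the kernel performs extra work (padding, recomputation); that is what “algorithmic, not implementation” means here.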
The concept of “algorithmic FLOPs” only really applies to the most elemental algorithms, things like multiplying matrices and computing FFTs. Most real-world algorithms don’t have an irreducible minimum of “true work” that must be carried out; they’re flexible.
black_ij, this question has been discussed to death; search around. You can’t calculate FLOPs “magically”; you just have to understand your code and know how many operations it will execute. You can also look at the instruction counter in the visual profiler, though this counts all instructions (including integer ones). Honestly, that figure is often more useful anyway.
Finally, the concept of FLOPS is sort of irrelevant. First, because most algorithms are limited by memory bandwidth, not arithmetic. Second, because what matters is how the app performs on a GPU versus a CPU. E.g., say you get “10 GFLOPS” from your CUDA code. Ok, then what? Well, if a CPU can only do 0.1 GFLOPS even after being optimized to death, then you’ve actually done really well.
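On the bandwidth point: a common sanity check is a roofline-style estimate comparing a kernel’s arithmetic intensity (FLOPs per byte of memory traffic) against the GPU’s compute-to-bandwidth ratio. Below that ratio, the kernel is bandwidth-bound no matter how many FLOPS it nominally performs. A sketch follows (the peak numbers are hypothetical placeholders, not the specs of any particular card):

```python
# Roofline-style sanity check (sketch). Peak figures are hypothetical
# placeholders, not the specs of any real GPU.
PEAK_GFLOPS = 1000.0   # hypothetical peak compute rate, GFLOP/s
PEAK_GBYTES = 150.0    # hypothetical peak memory bandwidth, GB/s

def attainable_gflops(flops_per_elem, bytes_per_elem):
    # Arithmetic intensity: FLOPs per byte of DRAM traffic.
    intensity = flops_per_elem / bytes_per_elem
    # Roofline: performance is capped by compute or by bandwidth,
    # whichever limit is hit first.
    return min(PEAK_GFLOPS, PEAK_GBYTES * intensity)

# Example: SAXPY (y = a*x + y) does 2 FLOPs per element while moving
# 12 bytes (read x, read y, write y; 4 bytes each), so its intensity
# is 1/6 and it is hopelessly bandwidth-bound on these numbers.
print(attainable_gflops(2.0, 12.0))  # 25.0 GFLOPS, far below peak
```

If the attainable figure sits on the bandwidth slope like this, measuring achieved GB/s tells you more about the kernel than measuring GFLOPS.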