Flop counting

Does anybody know how many flops ex2, rsqrt, rcp, etc. should be counted as?
I’m counting one flop for each of these instructions, but I’m curious whether that’s fair enough…

This is a common problem in performance reporting. In our n-body paper in GPU Gems 3, we chose to go with 1 flop for any of the above, 2 flops for MAD, and 1 flop for ADD and MUL. We have seen others do the same. However, there is wide variability in the literature: other n-body papers quite often report the calculation that we count as 20 flops as 38 flops. So I guess the answer is that there is no standard.
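To make the convention concrete, here is a rough sketch of the counting rule in Python. The weight table follows the rule above (MAD = 2, everything else = 1). The name `count_flops` and the instruction mix are made up for illustration; the mix is not the actual breakdown of our n-body kernel.

```python
# Flop weights under the convention described above:
# MAD counts as 2; ADD, MUL and the special functions count as 1 each.
FLOP_WEIGHTS = {"add": 1, "mul": 1, "mad": 2, "rsqrt": 1, "rcp": 1, "ex2": 1}

def count_flops(instruction_counts):
    """Sum the weighted flops for a dict of {instruction name: count}."""
    return sum(FLOP_WEIGHTS[name] * n for name, n in instruction_counts.items())

# Hypothetical instruction mix, for illustration only.
example_mix = {"mad": 3, "add": 2, "mul": 1, "rsqrt": 1}
total = count_flops(example_mix)  # 3*2 + 2 + 1 + 1
```

Changing a single weight (say, counting rsqrt as several flops) changes the reported total for the same code, which is the kind of difference that makes flop counts from different papers hard to compare.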

Note that when we reported the peak GFLOP/s of G80 we used MUL and MAD – not special function instructions like rsqrt and ex2.

Now, slightly off topic:

In general GFLOP/s should be used as a measure of hardware efficiency for a given application – i.e. comparing an application’s achieved GFLOP/s vs. the theoretical peak GFLOP/s of the hardware it is run on – and not as a metric for comparing performance on different hardware.
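As a minimal sketch of that efficiency calculation: the function names and all the numbers below are assumptions for illustration, not measurements, and the peak value is a placeholder rather than the figure for any real GPU.

```python
def achieved_gflops(flops_per_item, items, seconds):
    """GFLOP/s achieved when `items` work items of `flops_per_item` flops take `seconds`."""
    return flops_per_item * items / seconds / 1e9

def efficiency(achieved, peak):
    """Fraction of the theoretical peak that the application reaches."""
    return achieved / peak

# Hypothetical run: 1e9 interactions at 20 flops each, finished in 0.5 s.
achieved = achieved_gflops(flops_per_item=20, items=1_000_000_000, seconds=0.5)

# Placeholder peak; substitute the theoretical peak of the hardware you ran on.
hypothetical_peak = 320.0
fraction_of_peak = efficiency(achieved, hypothetical_peak)
```

Note that the result depends on the flop-counting convention you picked, so it is only meaningful alongside a statement of how the flops were counted.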

To compare performance of an application on different hardware, you should use a metric that is intrinsic to the application being run, not the hardware it is run on. For example, for n-body simulations a useful metric is the number of body-body interactions per second (such as gravity force computations between the bodies). For a computer graphics application, a useful metric is milliseconds per frame or FPS. For FFT it might be “512x512 image FFTs per second”. Etc…
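Here is a small sketch of why the application-level metric is the safer one to report. It assumes an all-pairs n-body step (n*n interactions per step), and the run parameters are invented for illustration. The same timed run gives one interactions-per-second figure but two different GFLOP/s figures, depending on whether an interaction is counted as 20 or 38 flops.

```python
def interactions_per_second(n_bodies, steps, seconds):
    """Body-body interactions per second, assuming all-pairs (n_bodies**2 per step)."""
    return n_bodies * n_bodies * steps / seconds

def gflops(rate, flops_per_interaction):
    """Convert an interaction rate to GFLOP/s under a given flop-counting convention."""
    return rate * flops_per_interaction / 1e9

# Hypothetical run: 1000 bodies, 10 time steps, 0.5 s total.
rate = interactions_per_second(n_bodies=1000, steps=10, seconds=0.5)

# Same run, two conventions from the literature.
gflops_20 = gflops(rate, 20)
gflops_38 = gflops(rate, 38)
```

The interaction rate does not move when the counting convention changes, so it can be compared directly across papers and across hardware.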

Sorry for the digression. :)

Mark