Does anybody know how many flops ex2, rsqrt, rcp, etc. should be counted as?
I’m counting one flop for each of these instructions, but I’m curious whether that’s fair…
This is a common problem in performance reporting. In our n-body paper in GPU Gems 3, we chose to count 1 flop for any of the above, 2 flops for MAD, and 1 flop each for ADD and MUL. We have seen others do the same. However, the literature varies widely: other n-body papers quite often report the calculation that we count as 20 flops as 38 flops. So I guess the answer is that there is no standard.
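As an illustration of that convention, here is a rough Python sketch (not the code from the GPU Gems 3 chapter) that charges 1 flop per add, subtract, multiply, square root and reciprocal, and 2 flops per multiply-add (MAD), while evaluating one softened gravitational body-body interaction. `FlopCounter` and `body_body_interaction` are hypothetical names, and the way the operations are grouped is an assumption; a compiler may emit a different instruction mix for the same math. Under these assumptions the tally comes out to 20 flops per interaction.

```python
import math


class FlopCounter:
    """Counts flops with assumed weights: 1 per add/sub, mul, sqrt or
    reciprocal, and 2 per multiply-add (MAD)."""

    def __init__(self):
        self.flops = 0

    def add(self, a, b):
        self.flops += 1
        return a + b

    def sub(self, a, b):
        self.flops += 1
        return a - b

    def mul(self, a, b):
        self.flops += 1
        return a * b

    def mad(self, a, b, c):
        # Multiply-add a*b + c, counted as 2 flops.
        self.flops += 2
        return a * b + c

    def sqrt(self, a):
        self.flops += 1
        return math.sqrt(a)

    def rcp(self, a):
        self.flops += 1
        return 1.0 / a


def body_body_interaction(c, bi, bj, ai, eps2):
    """Returns ai plus the acceleration that body bj exerts on body bi.

    bi and bj are (x, y, z, mass) tuples, ai is an (x, y, z) tuple and
    eps2 is the softening term. All arithmetic goes through counter c.
    """
    # Separation vector r = bj - bi: 3 flops.
    rx, ry, rz = c.sub(bj[0], bi[0]), c.sub(bj[1], bi[1]), c.sub(bj[2], bi[2])
    # dot(r, r) + eps2: 1 mul + 2 MADs + 1 add = 6 flops.
    dist_sqr = c.add(c.mad(rz, rz, c.mad(ry, ry, c.mul(rx, rx))), eps2)
    # dist_sqr^3: 2 flops.
    dist_sixth = c.mul(c.mul(dist_sqr, dist_sqr), dist_sqr)
    # 1 / sqrt(dist_sqr^3) as a sqrt followed by a reciprocal: 2 flops.
    inv_dist_cube = c.rcp(c.sqrt(dist_sixth))
    # Scale by the mass of bj: 1 flop.
    s = c.mul(bj[3], inv_dist_cube)
    # Accumulate ai + r * s with 3 MADs: 6 flops.
    return (c.mad(rx, s, ai[0]), c.mad(ry, s, ai[1]), c.mad(rz, s, ai[2]))


counter = FlopCounter()
acc = body_body_interaction(
    counter, (0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 0.0
)
print(counter.flops)  # 20
print(acc)  # (1.0, 0.0, 0.0)
```

Changing the weights inside `FlopCounter` changes the reported total without changing the arithmetic at all, which is exactly the kind of variability described above.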
Note that when we reported the peak GFLOP/s of G80 we used MUL and MAD – not special function instructions like rsqrt and ex2.
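For reference, a peak figure of that kind is simply processors × clock × flops issued per clock, counting only MAD (2 flops) and MUL (1 flop). The sketch below assumes GeForce 8800 GTX numbers (128 scalar processors at a 1.35 GHz shader clock, each issuing one MAD and one MUL per clock); `peak_gflops` is a hypothetical helper, and the hardware figures are assumptions to replace with your own GPU's specifications.

```python
def peak_gflops(processors, clock_ghz, flops_per_clock):
    """Theoretical peak in GFLOP/s: processors x clock (GHz) x flops per clock."""
    return processors * clock_ghz * flops_per_clock


# Assumed GeForce 8800 GTX figures: 128 scalar processors at a 1.35 GHz
# shader clock, each issuing a MAD (2 flops) plus a MUL (1 flop) per clock.
# Special-function instructions such as rsqrt and ex2 are not included.
g80_peak = peak_gflops(128, 1.35, 2 + 1)
print(f"{g80_peak:.1f} GFLOP/s")  # 518.4 GFLOP/s
```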
Now, slightly off topic:
In general, GFLOP/s should be used as a measure of hardware efficiency for a given application (i.e. comparing an application’s achieved GFLOP/s with the theoretical peak GFLOP/s of the hardware it runs on), not as a metric for comparing performance across different hardware.
To compare the performance of an application on different hardware, you should use a metric that is intrinsic to the application being run, not to the hardware it runs on. For example, for n-body simulations a useful metric is the number of body-body interactions per second (e.g. gravitational force computations between pairs of bodies). For a computer graphics application, a useful metric is milliseconds per frame or FPS. For FFT, it might be “512x512 image FFTs per second”. Etc…
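To see why the interaction rate is the safer number to compare, here is a small Python sketch of a hypothetical all-pairs n-body run (16384 bodies, 10 steps in 2.0 seconds, on a GPU with an assumed 518.4 GFLOP/s peak; all of these values are made up for illustration). It reports interactions per second once, then GFLOP/s and fraction of peak under both the 20- and 38-flop conventions mentioned earlier. The helper names are hypothetical, and counting N × N pairs per step is an assumption about how the kernel is structured.

```python
def interactions_per_second(n_bodies, steps, seconds):
    """Body-body interactions per second for an all-pairs simulation.

    Assumes every body interacts with every body (N * N pairs) each step.
    """
    return n_bodies * n_bodies * steps / seconds


def gflops(interactions_per_sec, flops_per_interaction):
    """Achieved GFLOP/s; depends on the flop-counting convention chosen."""
    return interactions_per_sec * flops_per_interaction / 1e9


def hardware_efficiency(achieved_gflops, peak_gflops):
    """Fraction of the hardware's theoretical peak that the run achieved."""
    return achieved_gflops / peak_gflops


# Hypothetical run and assumed peak; substitute your own measurements.
N_BODIES, STEPS, SECONDS, PEAK_GFLOPS = 16384, 10, 2.0, 518.4

rate = interactions_per_second(N_BODIES, STEPS, SECONDS)
# The interaction rate does not depend on any flop-counting convention.
print(f"{rate / 1e9:.2f} billion interactions/s")
for flops_per_interaction in (20, 38):
    achieved = gflops(rate, flops_per_interaction)
    eff = hardware_efficiency(achieved, PEAK_GFLOPS)
    print(f"{flops_per_interaction} flops/interaction: "
          f"{achieved:.1f} GFLOP/s, {eff:.1%} of peak")
```

The same run produces two different GFLOP/s (and efficiency) figures depending only on the bookkeeping, while the interaction rate stays fixed, so the efficiency numbers are only meaningful when the flop convention is stated alongside them.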
Sorry for the digression. :)
Mark