I’d like to compute the GFLOPS of my kernel. Do I have to count all the operations (+ - * / > < …) or only (+ - * /)?

I think it’s only +, -, * and /.

Because FLOPS stands for Floating-point Operations Per Second.

And I think < and > are not really special operations on FP numbers.


Just one additional point. For a scientific kernel you usually count only the operations useful for the task. So addressing calculations, loop-counter additions, etc. (yes, a loop counter is normally an integer, but some folks do weird things) are not counted toward the performance of the real task.
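
As a rough sketch of that counting rule, here is how it might look for a SAXPY-style kernel (y[i] = a * x[i] + y[i]). The kernel choice, the problem size and the elapsed time are made-up values for illustration only; plug in your own operation count and measured time.

```python
def useful_flops_saxpy(n):
    """Useful floating-point operations for y[i] = a * x[i] + y[i].

    One multiply and one add per element. Index arithmetic and the
    loop counter are deliberately not counted.
    """
    return 2 * n


def gflops(flop_count, seconds):
    """Convert an operation count and an elapsed time into GFLOP/s."""
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    n = 1 << 24        # hypothetical number of elements
    elapsed = 0.004    # hypothetical measured kernel time, in seconds
    print(f"{gflops(useful_flops_saxpy(n), elapsed):.2f} GFLOP/s")
```

Whether you also count comparisons is a convention; whichever you pick, state it next to the number you report.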

To be fair, one should also include any host-GPU I/O time in the timings…

It’s important to be somewhat conservative with timings, as there are plenty of people out there who think GPU performance numbers are inflated or unrealistic. Better to show them that they are NOT inflated, and be conservative, than to give these sorts of people any ammunition to use against GPU computing…


When calculating the flop rate to use as a measure of hardware efficiency (i.e. when comparing to 345 GFLOP/s), doesn’t it make more sense not to include the I/O time?

Right, depends on what you want. If you are doing your own benchmarking to decide whether there is any benefit to further optimization of the code, then ignoring the I/O time is appropriate. If you are quoting to others the overall performance of your code, including the I/O time makes sense to convey the total speed of the GPU algorithm.
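
To make the two numbers concrete, here is a small sketch that reports both from the same run. The 345 GFLOP/s peak is the figure mentioned above; the operation count and the two timings are placeholder values, and the function name is just something I made up for this post.

```python
PEAK_GFLOPS = 345.0  # hardware peak quoted earlier in this thread


def flop_rates(flop_count, kernel_seconds, transfer_seconds):
    """Return (kernel-only GFLOP/s, end-to-end GFLOP/s, fraction of peak).

    kernel_seconds   -- time spent in the kernel alone
    transfer_seconds -- host-GPU I/O time (copies in both directions)
    """
    kernel_only = flop_count / kernel_seconds / 1e9
    end_to_end = flop_count / (kernel_seconds + transfer_seconds) / 1e9
    # Hardware efficiency uses the kernel-only rate, since the peak
    # figure says nothing about host-GPU transfers.
    efficiency = kernel_only / PEAK_GFLOPS
    return kernel_only, end_to_end, efficiency


if __name__ == "__main__":
    # Placeholder measurements, not real results.
    k, e, eff = flop_rates(flop_count=1e9,
                           kernel_seconds=0.01,
                           transfer_seconds=0.03)
    print(f"kernel only: {k:.1f} GFLOP/s ({eff:.0%} of peak)")
    print(f"end to end:  {e:.1f} GFLOP/s")
```

Quote the first number when tuning the kernel and the second when telling others how fast the whole GPU algorithm is.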