Floating-Point Operations per Second (FLOPS) Calculation


nvprof provides metrics for the floating-point operation count of a kernel.
How do I calculate floating-point operations per second across multiple kernel calls?

Do we need to use the time reported by nvprof, or is there another method?

Total floating-point operations divided by total kernel duration.

Many Nsight Compute metrics can also be configured to report a per-second value directly.

Suppose we have n kernels and each kernel is called several times. How do we calculate the FLOPS?

I can get Flop_count_per_kernel from nvprof --metrics,
and I can get No_of_calls and Time from nvprof as well.

Flops = (No_of_calls * Flop_count_per_kernel) / Time
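As a sketch of this arithmetic for a single kernel, assuming `Time` is the total time nvprof's summary reports for all calls of that kernel (not the per-call `Avg` column), and using made-up numbers rather than real profiler output; the function name `kernel_flops` is hypothetical:

```python
def kernel_flops(no_of_calls: int, flop_count_per_kernel: float, total_time_s: float) -> float:
    """FLOP/s for one kernel: FLOPs summed over all calls divided by the total time of those calls."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return no_of_calls * flop_count_per_kernel / total_time_s


# Hypothetical example: 100 calls, 2e9 FLOPs per call, 0.5 s total across all 100 calls.
rate = kernel_flops(100, 2e9, 0.5)
print(f"{rate / 1e9:.1f} GFLOP/s")  # 400.0 GFLOP/s
```

If only the per-call average duration is at hand, the denominator should be `No_of_calls * Avg` so that the numerator and denominator cover the same set of calls.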

Is that calculation correct?
If not, please explain the correct way to do it.


For a metric like a flop count, nvprof displays the minimum, maximum, and average values across the n invocations of each kernel.

I would take the average value for a given kernel and multiply it by the number of times that kernel is run. Then add these products over all the kernels in question and divide that sum by the total duration of all those kernels. All of this data is available from nvprof; you just have to combine the results for the separate kernels yourself.
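A minimal sketch of that aggregation, assuming the per-kernel numbers have already been copied out of nvprof by hand: the average of a flop-count metric (for example `flop_count_sp` or `flop_count_dp`), the call count, and the total time of all calls. The `KernelStats` class, the `aggregate_flops` function, and the kernel names and values are hypothetical placeholders, not real profiler output:

```python
from dataclasses import dataclass


@dataclass
class KernelStats:
    """Per-kernel numbers read off nvprof output (values used below are hypothetical)."""
    name: str
    calls: int             # number of invocations of this kernel
    avg_flop_count: float  # average flop-count metric value, FLOPs per invocation
    total_time_s: float    # total duration of all invocations, in seconds


def aggregate_flops(kernels: list[KernelStats]) -> float:
    """Sum of (average FLOPs * calls) over all kernels, divided by the summed kernel duration."""
    total_flops = sum(k.avg_flop_count * k.calls for k in kernels)
    total_time = sum(k.total_time_s for k in kernels)
    if total_time <= 0:
        raise ValueError("total kernel time must be positive")
    return total_flops / total_time


kernels = [
    KernelStats("kernelA", calls=100, avg_flop_count=2e9, total_time_s=0.5),
    KernelStats("kernelB", calls=10, avg_flop_count=5e8, total_time_s=0.1),
]
print(f"{aggregate_flops(kernels) / 1e9:.1f} GFLOP/s")  # 341.7 GFLOP/s
```

Summing FLOPs and time separately, instead of averaging each kernel's FLOP/s, weights every kernel by how long it actually ran, so a short kernel with a high rate does not dominate the result.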

That should give you a fairly defensible number that you can call the average floating-point operations per second of your device code (or of those kernels in your device code).

Okay, thanks for your reply.