Flop/s model for vector addition?

Hello everyone,

I’m trying to measure the performance (GFLOP/s) of my vector addition. I have already found this:

```cpp
float  msecPerVectAdd = ms / nIter;                                      // average time of one launch (ms)
double gigaFlops = (numElements * 1.0e-9) / (msecPerVectAdd / 1000.0);  // 1 add per element; ms -> s
```

ms = total elapsed time of all nIter launches (in ms)
nIter = number of times the kernel is launched, so that the timed run is long enough to measure accurately
numElements = number of elements in each vector

But I would like to be sure it is correct.

Your help is appreciated.

Yes, your methodology should give sensible results. By that I mean it should be an accurate measure of the number of floating-point operations per second achieved by the code, since a vector add performs exactly one floating-point addition per element.
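The same calculation can be written as a small standalone helper. This is a minimal sketch, assuming one floating-point add per element (c[i] = a[i] + b[i]) and that the total time covers all nIter launches; the function name gflopsVectorAdd is hypothetical, not part of any CUDA API.

```cpp
#include <cassert>

// Hypothetical helper mirroring the formula in the question.
// Assumes: one floating-point add per element, and totalMs = elapsed time of all nIter launches.
double gflopsVectorAdd(double totalMs, int nIter, long long numElements) {
    assert(nIter > 0 && totalMs > 0.0);
    double secPerLaunch = (totalMs / nIter) / 1000.0;          // ms -> s for one launch
    double flopsPerLaunch = static_cast<double>(numElements);  // 1 add per element
    return flopsPerLaunch * 1.0e-9 / secPerLaunch;             // FLOP/s -> GFLOP/s
}
```

For example, 2^24 elements with 100 launches taking 200 ms in total (made-up inputs, not a measurement) gives about 8.39 GFLOP/s. Doing the arithmetic in double avoids the small precision loss of the `1.0e-9f` float literal.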

However, vector addition is unlikely to be compute bound. Each element needs one floating-point add but three memory accesses (two loads and one store, i.e. 12 bytes for float), so you are effectively measuring memory bandwidth rather than anything related to the actual compute performance of the GPU you are running on.
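To look at the bandwidth side of the same timing, the per-launch time can be converted into effective bandwidth. A sketch, assuming float vectors where each element of A and B is read once and each element of C is written once (caching and other effects ignored); effectiveBandwidthGBs and vectorAddIntensity are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: effective bandwidth (GB/s) of C = A + B,
// assuming 2 loads + 1 store per element and totalMs covering all nIter launches.
double effectiveBandwidthGBs(double totalMs, int nIter, long long numElements,
                             std::size_t bytesPerValue = sizeof(float)) {
    assert(nIter > 0 && totalMs > 0.0);
    double secPerLaunch = (totalMs / nIter) / 1000.0;                               // ms -> s
    double bytesPerLaunch = 3.0 * static_cast<double>(numElements) * bytesPerValue; // 2 loads + 1 store
    return bytesPerLaunch * 1.0e-9 / secPerLaunch;                                  // B/s -> GB/s
}

// Arithmetic intensity of the vector add: 1 flop per 3 values moved.
double vectorAddIntensity(std::size_t bytesPerValue = sizeof(float)) {
    return 1.0 / (3.0 * bytesPerValue);  // 1/12 flop/byte when float is 4 bytes
}
```

With the same made-up inputs (2^24 floats, 100 launches, 200 ms total) this reports about 100.7 GB/s, which is the number worth comparing against the GPU's peak memory bandwidth.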

You may want to understand the analysis methodology described here:


It is essentially one aspect of “roofline analysis”, which determines the limiting factor in your code. For this type of code, the performance “roofline” is set by the memory bandwidth of the GPU, not by its compute performance.
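The basic roofline bound is attainable GFLOP/s = min(peak compute, arithmetic intensity × peak bandwidth). A minimal sketch of that bound; rooflineGflops and isMemoryBound are hypothetical names, and the peak values must be looked up for your own GPU (nothing here queries the hardware).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: simple roofline bound in GFLOP/s.
// peakGflops and peakBandwidthGBs are device peak figures supplied by the caller.
double rooflineGflops(double intensityFlopPerByte, double peakGflops, double peakBandwidthGBs) {
    assert(intensityFlopPerByte > 0.0 && peakGflops > 0.0 && peakBandwidthGBs > 0.0);
    return std::min(peakGflops, intensityFlopPerByte * peakBandwidthGBs);
}

// True if the kernel sits under the sloped (bandwidth-limited) part of the roof.
bool isMemoryBound(double intensityFlopPerByte, double peakGflops, double peakBandwidthGBs) {
    return intensityFlopPerByte * peakBandwidthGBs < peakGflops;
}
```

With illustrative placeholder peaks of 10,000 GFLOP/s and 900 GB/s (not any specific GPU), a float vector add at 1/12 flop/byte is capped at 75 GFLOP/s and is memory bound, far below the compute peak.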