Hello everyone,
I’m trying to measure the performance (GFLOP/s) of my vector addition. I have already found this:
float msecPerVectAdd = ms / nIter;                                      // average time of one vector addition, in ms
double gigaFlops = (numElements * 1.0e-9) / (msecPerVectAdd / 1000.0);  // one add (one FLOP) per element
NB:
ms = total execution time of all iterations (in ms)
nIter = number of iterations, used to get longer and more stable runs
numElements = the number of elements in each vector
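For reference, here is a minimal sketch of how these quantities fit together with CUDA event timing. The kernel name vectorAdd, the launch configuration, and the specific values of nIter and numElements are just placeholders for illustration, not my real setup:

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder vector-addition kernel: one add (one FLOP) per element.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int numElements = 1 << 24;   // assumed vector length
    const int nIter = 100;             // repeat to get a longer, more stable run
    size_t bytes = numElements * sizeof(float);

    float *a, *b, *c;
    cudaMalloc((void**)&a, bytes);
    cudaMalloc((void**)&b, bytes);
    cudaMalloc((void**)&c, bytes);
    cudaMemset(a, 0, bytes);           // contents don't matter for timing, but keep them defined
    cudaMemset(b, 0, bytes);

    dim3 block(256);
    dim3 grid((numElements + block.x - 1) / block.x);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int it = 0; it < nIter; ++it)
        vectorAdd<<<grid, block>>>(a, b, c, numElements);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;                   // total execution time of all iterations, in ms
    cudaEventElapsedTime(&ms, start, stop);

    float msecPerVectAdd = ms / nIter; // average time of one vector addition, in ms
    double gigaFlops = (numElements * 1.0e-9) / (msecPerVectAdd / 1000.0);
    printf("%.3f ms per launch, %.2f GFLOP/s\n", msecPerVectAdd, gigaFlops);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}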
But I still want to be sure about it.
Your help is appreciated.
Dorra
Yes, your methodology should give sensible results. By that I mean it should be an accurate measure of the number of floating-point operations per second achieved by the code.
However, vector addition is not likely to be compute-bound, so you are effectively measuring memory bandwidth rather than the actual compute performance of the GPU you are running on.
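For example, the same timing data can be converted into an effective bandwidth number, which is the more meaningful metric for this kernel. A sketch, reusing the ms, nIter and numElements quantities from your post and assuming float data with two reads and one write per element:

// Sketch: effective memory bandwidth of the vector-add kernel, in GB/s.
// Assumes float data and two reads plus one write per element.
double effectiveBandwidthGBs(float ms, int nIter, int numElements)
{
    float msecPerVectAdd = ms / nIter;                       // average time of one vector addition, in ms
    double bytesMoved = 3.0 * numElements * sizeof(float);   // read a, read b, write c
    return (bytesMoved * 1.0e-9) / (msecPerVectAdd / 1000.0);
}

Comparing that number against the published peak memory bandwidth of your GPU tells you how close the kernel comes to the memory-bandwidth limit.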
You may want to understand the analysis methodology described here:
cuda - Nvidia Jetson Tx1 against jetson NANO (Benchmarking) - Stack Overflow
That is roughly one aspect of “roofline analysis”, which determines the limiting factor in your code. The performance “roofline” for this type of code is set by the memory bandwidth of the GPU, not by its compute performance.
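As a rough illustration of that roofline bound (the 900 GB/s figure below is just an assumed peak memory bandwidth, not a measurement from your GPU):

// Sketch: roofline-style upper bound on GFLOP/s for vector addition,
// using an ASSUMED peak memory bandwidth; substitute your own GPU's figure.
double peakBandwidthGBs = 900.0;                        // assumed, not measured
double flopsPerByte = 1.0 / (3.0 * sizeof(float));      // 1 add per 12 bytes moved
double maxGigaFlops = peakBandwidthGBs * flopsPerByte;  // ~75 GFLOP/s at 900 GB/s

Whatever GFLOP/s number you measure will sit at or below that ceiling, regardless of how high the GPU’s compute peak is.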