Hello,
In my code, I was surprised to find that the theoretical number of FLOPs of my algorithm did not match the count reported by nvvp. After a long time sleuthing through the code, I pinpointed one of the lines that nvvp is miscounting (metric: flops_dp).
I am curious to know what is going on behind the scenes for this to happen…
These are the relevant lines of code:
#define CON 3.1415967

__device__ double get_const() {
    return CON;
}

__device__ double func(double *Z) {
    double r1, r2, r3, r4;
    r1 = Z[0];
    r2 = Z[1];
    r3 = Z[2];
    r4 = Z[3];
    return (get_const() - 1.) * (r4 - (r2*r2 + r3*r3) / 2. / r1);
}
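A minimal harness for reproducing the count might look like the sketch below. Only CON, get_const and func come from the code above; the kernel name probe, the sample inputs and the single-thread launch are assumptions made for illustration, chosen so that the per-kernel count corresponds to one evaluation of the return statement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CON 3.1415967

__device__ double get_const() {
    return CON;
}

__device__ double func(double *Z) {
    double r1 = Z[0], r2 = Z[1], r3 = Z[2], r4 = Z[3];
    return (get_const() - 1.) * (r4 - (r2*r2 + r3*r3) / 2. / r1);
}

// Hypothetical kernel (not in the original code): a single thread calls
// func once and stores the result so the compiler cannot discard the work.
__global__ void probe(double *Z, double *out) {
    *out = func(Z);
}

int main() {
    const double hZ[4] = {1.0, 2.0, 3.0, 4.0};  // arbitrary sample inputs
    double *dZ = nullptr, *dOut = nullptr, hOut = 0.0;

    cudaMalloc(&dZ, sizeof(hZ));
    cudaMalloc(&dOut, sizeof(double));
    cudaMemcpy(dZ, hZ, sizeof(hZ), cudaMemcpyHostToDevice);

    probe<<<1, 1>>>(dZ, dOut);  // one block, one thread

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&hOut, dOut, sizeof(double), cudaMemcpyDeviceToHost);
    printf("func = %f\n", hOut);

    cudaFree(dZ);
    cudaFree(dOut);
    return 0;
}
```

Compiled with nvcc and run under nvvp, this should isolate the double-precision FLOP count for probe; swapping in the shortened return statement then allows a direct comparison.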
The return statement accounts for flops_dp = 21 double-precision floating-point operations! :-O
I would expect only 8 or 7 operations: 8 if we count every operator (-, *, -, *, +, *, /, /), or 7 if the compiler optimizes out the first subtraction (get_const() - 1.).
If I remove the final division, making the return statement
return (get_const() - 1.) * (r4 - (r2*r2 + r3*r3) / 2.);
I get only 6 operations, which is reasonable and is what I’d expect if the first subtraction is optimized out! Does anyone have an idea what is going on?