You need to read the documentation. CPUs can use 80-bit temporary registers (the x87 FPU) to do floating-point work, whereas the GPU (9000 series and lower) uses 32-bit floats.
It’s a problem of using floats vs doubles here. On my machine:
#include <cstdio>

int main() {
    float  fa = 1080 / (float)600;
    double da = 1080 / (double)600;
    printf("%.10f\n%.10f\n", fa, da);
    return 0;
}
prints:
1.7999999523
1.8000000000
What you should also know is that such differences are expected in numerical computations. One thing is whether the numbers are stored as floats or as doubles; another is how the computations are implemented. IEEE 754 defines the bit representation of numbers and correct rounding for the basic arithmetic operations, but not the exact results of library functions, and not every implementation follows it strictly for every operation. Small differences like this are exactly why floating-point numbers should always be compared with a certain epsilon rather than tested for exact equality.
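To make that concrete, here is a minimal sketch of such an epsilon comparison (the nearlyEqual helper and its tolerance values are my own illustration, not anything from the CUDA toolkit; tune them for your application):

#include <cmath>
#include <cstdio>

// True when a and b agree within an absolute or a relative tolerance.
// The tolerance values here are illustrative defaults, not universal constants.
bool nearlyEqual(double a, double b,
                 double absEps = 1e-12, double relEps = 1e-6) {
    double diff = std::fabs(a - b);
    if (diff <= absEps)                      // catches values very close to zero
        return true;
    double largest = std::fmax(std::fabs(a), std::fabs(b));
    return diff <= largest * relEps;         // relative tolerance otherwise
}

int main() {
    float  fa = 1080 / (float)600;    // 1.7999999523...
    double da = 1080 / (double)600;   // 1.8000000000...
    // A direct == comparison reports a mismatch; the epsilon test does not.
    printf("%s\n", nearlyEqual(fa, da) ? "equal within tolerance" : "different");
    return 0;
}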
Additionally, CUDA implements single-precision division in a non-standard (non-IEEE-compliant) way, using a fast reciprocal.
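You can mimic the effect on the host: computing a/b as a * (1/b), with both steps rounded to float, can land one ulp away from a correctly-rounded division. This is only a host-side sketch of the mechanism, not CUDA's actual division code:

#include <cstdio>

int main() {
    // volatile keeps the compiler from constant-folding these at build time
    volatile float num = 1080.0f;
    volatile float den = 600.0f;

    float direct   = num / den;      // correctly-rounded division
    float recip    = 1.0f / den;     // reciprocal rounded to float
    float viaRecip = num * recip;    // then a rounded multiply

    printf("%.10f direct\n%.10f via reciprocal\n", direct, viaRecip);
    // On a typical x86-64 build this is likely to print:
    //   1.7999999523 direct
    //   1.8000000715 via reciprocal
    return 0;
}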
Floating-point computations carried out by programs compiled with different compilers (not to mention running on radically different hardware) can come out slightly different, and one has to live with it.
I’ve also faced such a problem some time ago. The reasons for this issue are explained in the previous posts. A possible solution is to use doubles for the division (of course you need G200-based hardware for that), but this approach still won’t fix the case where the result of the division simply can’t be represented exactly in a float because of its 4-byte size.
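For instance, even if the division itself is done in double precision, storing the result back into a float throws the extra precision away again; a quick host-side sketch:

#include <cstdio>

int main() {
    // Divide in double precision, then store the result in a float:
    float stored = (float)(1080 / (double)600);
    // The double quotient is (almost exactly) 1.8, but the nearest float
    // is still 1.7999999523..., so the final store loses the precision.
    printf("%.10f\n", stored);
    return 0;
}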
By default, the compiler generates code for compute capability 1.0 hardware, which does not support double-precision floating point. Did you pass the option -arch=sm_13?