Why does a simple division give different results on CPU and GPU?

Hello All,

A simple division operation gives different results on the CPU and the GPU…

double op = 1080/(double)600;

The value of op is 1.80000000 on the CPU
and 1.799999952 on the GPU.

A calculator gives exactly 1.8 for 1080/600,
so why is the GPU result different?
How can we make the GPU and CPU results the same?

You need to read the documentation. x86 CPUs can use 80-bit temporary registers (the x87 FPU) for floating-point work, whereas the GPU (9000 series and lower) uses 32-bit floats.

It’s a problem of using floats vs doubles here. On my machine:

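#include <cstdio>

int main() {
	// Same division, once in single and once in double precision
	float fa = 1080/(float)600;
	double da = 1080/(double)600;

	// The float is promoted to double when passed to printf
	printf("%.10f \n%.10f \n", fa, da);

	return 0;
}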

What you should also know is that such differences are expected in numerical computations. One factor is whether the numbers are stored as floats or as doubles; another is how the computations are implemented. IEEE 754 defines the bit representation of numbers and requires correctly rounded results for the basic operations (add, subtract, multiply, divide, square root), but it does not fix the exact results of most library functions such as sin or exp, and not every piece of hardware follows the standard fully. Such small differences are exactly why floating point numbers should always be compared with a certain epsilon.
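
As a rough illustration of comparing with an epsilon, here is a minimal sketch in C++. The helper name nearly_equal and the default tolerances are my own assumptions, not something from CUDA or the standard library, and the right tolerance depends on your computation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b agree to within a tolerance.
// rel_tol scales with the magnitude of the inputs; abs_tol covers values
// close to zero. Both defaults are assumed values, chosen loosely enough
// for single-precision results.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-6, double abs_tol = 1e-12) {
    double diff = std::fabs(a - b);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}
```

With this, nearly_equal(1080/(float)600, 1080/(double)600) is true even though the two values are not equal under ==.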

Additionally, CUDA implements single-precision division on this hardware in a non-standard way using a fast reciprocal, so the result is not guaranteed to be correctly rounded.

Floating point computations carried out by programs compiled with different compilers (not to mention running on radically different hardware) will come out slightly different and one has to live with it.


I also faced this problem some time ago. The reasons are explained in the previous posts. A possible solution is to use doubles for the division (of course you need G200-based hardware for that), but this approach won't fix the case where the result of the division simply can't be represented exactly in a float because of its 4-byte limit.
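
To see the 4-byte limit in isolation, here is a small sketch in plain C++ (no GPU needed). The helper names through_float and show_rounding are made up for illustration. The idea is to round a double to float and widen it back, so the error introduced by the narrower format shows up directly.

```cpp
#include <cstdio>

// Hypothetical helper: round x to the nearest float, then widen back to
// double so the single-precision rounding error can be inspected.
double through_float(double x) {
    return static_cast<double>(static_cast<float>(x));
}

// Print a value as stored in a double and after a round trip through float.
void show_rounding(double x) {
    std::printf("double: %.17g\nfloat:  %.17g\n", x, through_float(x));
}
```

Calling show_rounding(1080/(double)600) prints two different values, because 1.8 has no exact binary representation and the float keeps fewer bits of it. A value such as 1.5, which is exactly representable, survives the round trip unchanged.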

Hope this helps.

By default, the compiler generates code for compute capability 1.0 hardware, which does not support double-precision floating point. Did you pass the option -arch=sm_13?

You need not only the 1.3 architecture option but also 1.3 GPU hardware.

Thanks for the valuable hints. So we need not only the arch option but also matching hardware.