Precision issue! Wrong result for a multiplication


I was running some tests and ran into a problem. I wrote a test kernel that performs a multiplication, but the result is wrong. I know it comes from using float, because if I use double precision the result is correct.

Here is my test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define XBLOCK 256
#define YBLOCK 1
#define XGRID 84
#define YGRID 1

/** Kernel function **/
__global__ void KernelTest()
{
	// Thread index
	int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK + (threadIdx.x * YBLOCK + threadIdx.y);

	float A, result;

	A = 11877.303711f;
	result = A*A;

	// Print from a single thread only; every thread computes the same value
	if (idx == 0)
		printf("A=%f - result=%f\n",A, result );
	/* The displayed result is 141070336.0 instead of 141070343.443334 in double */
}

int main(){
	/** Launch of the kernel **/
	dim3 blockSize(XBLOCK,1);
	dim3 gridSize(XGRID,1);
	KernelTest<<<gridSize, blockSize>>>();

	// Wait for the kernel to finish so the device-side printf output is flushed
	cudaDeviceSynchronize();
	return 0;
}

I would like to know whether I can get the correct result with float. I tried __fmul_ru, __fmul_rd and __fmul_rn, but the result is still wrong.

I am afraid there is no solution, but I am still hoping there is one.

EDIT: One idea I had is to compute the result like this:

float product1 = __fmul_rd( A,A );
float product2 = __fmul_ru( A,A );
float result = 0.5f*( product1 + product2 );

printf("A=%f - product1=%f - product2=%f - result=%f\n",A,product1,product2, result );


But the result is still wrong. I get this:

A=11877.303711 - product1=141070336.000000 - product2=141070352.000000 - result=141070336.000000

One last thing: what do you think about using double instead of float?

Could you give me some advice?



Maybe this sounds stupid, but could you try something like %26.20f in the printf call? Rounding errors appear only when a small number is added to a large one over and over. Adding or multiplying two numbers once should not make a difference.

Thank you very much for trying to help, but it changes nothing!

It is not wrong. It is correct within the precision limits of 32-bit floats. These numbers are precise only to about 7 decimal digits. The mantissa of a 32-bit float has 23 stored bits, which translates to log10(2^23) = 6.92 significant decimal digits (counting the implicit leading bit, 24 bits or about 7.22 digits).

Any further digits that printf displays have no meaning for your computation; treat them as essentially random numbers ;)

People really need to understand these limitations.
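
To make the 7-digit limit concrete, here is a minimal host-side C sketch (plain C, so it should also build as CUDA host code). The helper names next_float_up and midpoint are made up for illustration and do not come from this thread. The bit trick assumes IEEE 754 binary32 floats and a positive, finite input; the standard nextafterf from <math.h> does the same job more generally.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float above x.
   Assumption: x is positive and finite, and float is IEEE 754 binary32,
   so adding 1 to the bit pattern steps to the neighbouring value. */
static float next_float_up(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Average of two floats, with the result rounded to float on return. */
static float midpoint(float lo, float hi)
{
    return 0.5f * (lo + hi);
}
```

With these, next_float_up(141070336.0f) is 141070352.0f: neighbouring floats at this magnitude are 16 apart, so 141070343.443334 has no float representation. midpoint(141070336.0f, 141070352.0f) gives 141070336.0f again, because the true midpoint 141070344 is not representable either, which is why averaging the __fmul_rd and __fmul_ru results does not help.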



The problem with your first source code is that there is actually no computation happening inside your kernel… what you see in the output is a result that was computed by the compiler, not by the GPU.

Even if you did not use a constant inside the kernel, you would still have another problem: 11877.303711 cannot be represented exactly as a float. If you try it out, you will see that the value stored in the float variable is 11877.3037109375.
The next problem is that even if you somehow computed the result with higher precision, you are ultimately assigning it to a float, and 141070343.443334 cannot be represented as a float either, so it gets rounded to 141070336.0.
So for this particular number and computation, the result is perfectly right and the best you can get with float.
For other numbers and/or computations, the result might differ between computing in float and computing in double and then assigning the result to a float.
The only way to get higher precision without actually using double is to simulate double precision using single precision… Google is your friend in that case :)
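
The two points above (the value actually stored for A, and the final rounding to float) can be checked on the host with a small C sketch. The function names are hypothetical, chosen for this illustration only.

```c
/* Square a float using float arithmetic only. */
static float square_in_float(float a)
{
    return a * a;
}

/* Square in double, then round the result back to float on return. */
static float square_in_double_then_round(float a)
{
    double wide = (double)a * (double)a;
    return (float)wide;
}
```

For A = 11877.303711f, (double)A is 11877.3037109375, and both functions return 141070336.0f: the double product is more accurate, but storing it in a float rounds it to the same nearest representable value.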


@ cbuchner1 : Thanks for the explanation. It seems logical now.

@ RoBiK : I now do the computation in double. I know that is not efficient in terms of computation time, but I have other problems I would like to fix before improving that. I also searched Google for “simulation of double precision using single precision” but found nothing. Do you have any examples or references, please?

Thanks again,

Google for “double single” computation or a CUDA port of the dsfun library (from Fortran).

A search on these forums for dsfun might also turn up some results.
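
For a flavour of what “double single” arithmetic looks like, here is a minimal C sketch of an error-free float product based on Dekker's splitting technique. It is not taken from dsfun or from any CUDA port; split_float and two_prod are illustrative names. It assumes IEEE 754 floats evaluated in float precision, round-to-nearest, no -ffast-math style optimisations, and operands far from overflow and underflow.

```c
/* Split a into hi + lo so that each half has at most 12 significant bits.
   4097 = 2^12 + 1 is the usual splitting constant for a 24-bit significand. */
static void split_float(float a, float *hi, float *lo)
{
    float c = 4097.0f * a;
    *hi = c - (c - a);
    *lo = a - *hi;
}

/* Compute p = a*b rounded to float, and e such that a*b == p + e exactly
   (under the assumptions stated in the text above). */
static void two_prod(float a, float b, float *p, float *e)
{
    float ah, al, bh, bl;
    *p = a * b;
    split_float(a, &ah, &al);
    split_float(b, &bh, &bl);
    *e = ((ah * bh - *p) + ah * bl + al * bh) + al * bl;
}
```

With A = 11877.303711f, two_prod(A, A, &p, &e) yields p = 141070336.0f and e of about 7.4418, so the pair (p, e) carries the product that a single float cannot hold. A full double-single library also needs addition and renormalisation of such pairs. Where a fused multiply-add is available, the error term can instead be computed in one step as fmaf(a, b, -p).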


Thanks for the post. I started programming in CUDA with almost zero knowledge of C. I practically learned C at the same time as CUDA.