I was doing some tests and ran into a problem. I wrote a test kernel that does a multiplication, but the result is wrong. I know that it comes from the use of float, because if I use double precision the result is correct.
Here is my test:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define XBLOCK 256
#define YBLOCK 1
#define XGRID 84
#define YGRID 1
/** Kernel function **/
__global__ void KernelTest()
{
    // Thread index
    int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK + (threadIdx.x * YBLOCK + threadIdx.y);
    float A, result;
    A = 11877.303711f;
    result = A * A;
    if (idx == 0) {
        printf("A=%f - result=%f\n", A, result);
        /* The displayed result is 141070336.0 instead of 141070343.443334 in double */
    }
}

int main()
{
    cudaDeviceReset();
    /** Launch of the kernel **/
    dim3 blockSize(XBLOCK, 1);
    dim3 gridSize(XGRID, 1);
    KernelTest<<<gridSize, blockSize>>>();
    cudaDeviceSynchronize();  // make sure the device printf output is flushed before exit
    return 0;
}
I would like to know if I can get the correct result with float. I tried using __fmul_ru, __fmul_rd and __fmul_rn, but the result is still wrong.
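Roughly what I mean by trying the intrinsics is this (a minimal sketch; the kernel name is made up and the includes are the same as in the code above):
__global__ void RoundingTest()
{
    float A = 11877.303711f;
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        /* Each intrinsic rounds the product toward a different neighbouring
           representable float; none of them can hit the exact value. */
        printf("rn=%f  ru=%f  rd=%f\n",
               __fmul_rn(A, A), __fmul_ru(A, A), __fmul_rd(A, A));
    }
}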
I am afraid there is no solution, but I am still hoping there is one.
EDIT: One idea I had is to compute the result like this
Maybe this sounds stupid, but could you try using something like %26.20f in the printf format string? Rounding errors appear only when you have a large number to which a small number is added over and over. Adding or multiplying two numbers once should not make a difference.
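Something along these lines (a minimal host-side sketch, reusing the numbers from the question) shows where the printed digits stop carrying information:
#include <stdio.h>

int main()
{
    float  af = 11877.303711f;
    double ad = 11877.303711;
    /* With 20 decimal places you can see where the digits stop being
       meaningful: after ~7 significant figures for float, ~16 for double. */
    printf("float : %26.20f\n", (double)(af * af));
    printf("double: %26.20f\n", ad * ad);
    return 0;
}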
It is not wrong. It is correct within the precision limitations of 32-bit floats. These float numbers are only precise to about 7 decimal digits. The mantissa of a 32-bit float has 23 bits, which translates to log10(2^23) = 6.92 significant decimal digits.
Any more digits that you display with printf have no meaning; they are essentially random numbers.
People really need to understand these limitations.
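To make the limitation concrete, here is a small host-side sketch using the standard <float.h>/<math.h> facilities (values taken from the question):
#include <stdio.h>
#include <math.h>
#include <float.h>

int main()
{
    float r = 141070336.0f;
    /* FLT_DIG and the 23-bit mantissa mentioned above are what limit the precision. */
    printf("FLT_DIG = %d, FLT_EPSILON = %g\n", FLT_DIG, FLT_EPSILON);
    /* The next representable float above the kernel's result: every value
       in between has to snap to one of the two neighbours. */
    printf("next float above %.1f is %.1f\n", r, nextafterf(r, INFINITY));
    return 0;
}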
The problem with your first source code is that there is actually no computation happening inside your kernel; what you see in the output is a result that was computed by the compiler, not by the GPU.
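A minimal way to see real GPU arithmetic is to pass the value in from the host, so the compiler cannot pre-compute the product (sketch only; the kernel name is made up):
__global__ void KernelTestArg(float A)
{
    float result = A * A;   /* now an actual GPU multiplication */
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("A=%f - result=%f\n", A, result);
}

/* launched e.g. as:  KernelTestArg<<<gridSize, blockSize>>>(11877.303711f); */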
Even if you did not use a constant inside the kernel, you would still have another problem: 11877.303711 cannot be represented exactly as a float. If you try it out, you will see that the value stored in the float variable is 11877.3037.
The next problem is that even if you somehow computed the result with higher precision, you are ultimately assigning it to a float, but 141070343.443334 cannot be represented as a float and gets truncated to 141070336.0.
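Both effects are easy to check on the host with a couple of casts (a small sketch, nothing CUDA-specific):
#include <stdio.h>

int main()
{
    float a = 11877.303711f;            /* nearest float to 11877.303711      */
    float r = (float)141070343.443334;  /* nearest float to the double result */
    /* Printing through double shows what is really stored in each float. */
    printf("a stored as %.10f\n", (double)a);
    printf("r stored as %.4f\n",  (double)r);
    return 0;
}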
So for this concrete number and computation the result is perfectly right and the best you can get with float.
For other numbers and/or computations the result might differ between computing in float and computing in double and assigning the result to a float.
The only way to get higher precision without actually using double is to simulate double precision using single precision (so-called double-single or float-float arithmetic)… Google is your friend in that case :)
@cbuchner1: Thanks for the explanation. It seems logical now.
@RoBiK: For now, I do the computation in double. I know that it is not efficient (in terms of computation time), but I have other problems I would like to fix before improving the computation time. I also searched Google for “simulation of double precision using single precision” but found nothing. Do you have some examples or references, please?
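For what it's worth, here is a minimal sketch of the float-float ("double-single") idea RoBiK mentioned, built on the standard fmaf-based error-free product; the kernel name and layout are just illustrative, not a library API:
#include <stdio.h>

__global__ void FloatFloatSquare(float a)
{
    float hi = a * a;             /* rounded product                      */
    float lo = fmaf(a, a, -hi);   /* exact rounding error of that product */
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        /* The pair (hi, lo) represents the product of the two floats
           exactly; double is used here only to print the combined value. */
        printf("hi=%f  lo=%f  hi+lo=%.6f\n", hi, lo, (double)hi + (double)lo);
    }
}

int main()
{
    FloatFloatSquare<<<1, 32>>>(11877.303711f);
    cudaDeviceSynchronize();
    return 0;
}
Even with this trick the printed sum will not be 141070343.443334, because 11877.303711 already loses digits the moment it is stored in the float argument, exactly as RoBiK explained.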