Hello, I have an OpenCL kernel that implements a dot product between two float arrays.
The first array has size*n elements and the second has n elements. This is the code:
__kernel
void evaluate_product(__global const float *pFirstArray, const int n,
                      __global const float *pSecondArray, __global float *pOutput)
{
    int gid  = get_global_id(0);
    int size = get_global_size(0);
    if (gid >= 0 && gid < size) {
        float output = 0;
        // Each work item accumulates the dot product of row gid of the
        // size x n array (stored column by column) with the n-element array.
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];
        pOutput[gid] = output;
    }
}
If I execute the same operations on the CPU, I get different results, mainly from the 6th or 7th decimal digit onwards. Why this strange behaviour? The Khronos OpenCL specification (v1.2) says that x+y and x*y are correctly rounded and IEEE 754 compliant. Any ideas?
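The CPU comparison amounts to the equivalent loop below (a simplified sketch, not the exact host code):

/* Simplified sketch of the CPU-side reference (reconstructed, not the
 * exact host code): the same indexing and float accumulation as the
 * kernel, done sequentially for every output element. */
void evaluate_product_cpu(const float *pFirstArray, int n, int size,
                          const float *pSecondArray, float *pOutput)
{
    for (int gid = 0; gid < size; gid++) {
        float output = 0.0f;
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];
        pOutput[gid] = output;
    }
}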
CPUs tend to do floating-point calculations in 80-bit 'extended' mode and keep intermediate results in that format, so subsequent calculations use the full 80-bit value.
On the GPU, single precision is 32 bits and double precision is 64 bits. When you chain many floating-point operations you are therefore likely to see small differences, even if you use double precision.
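Each individual x+y and x*y is indeed correctly rounded, but a dot product is a long chain of additions, and every intermediate sum is rounded to the width of the accumulator. A small plain-C illustration of the effect (the test data is invented purely for demonstration):

/* Minimal sketch (not from this thread) of how the accumulator's width
 * changes the last digits of a dot product. Data is made up. */
#include <stdio.h>

int main(void)
{
    enum { N = 1000000 };
    static float a[N], b[N];
    for (int k = 0; k < N; k++) {
        a[k] = 1.0f / (k + 1);        /* arbitrary test values */
        b[k] = 0.5f + (k % 7) * 0.1f;
    }

    float  sum_f = 0.0f;   /* 32-bit accumulator, like the GPU kernel      */
    double sum_d = 0.0;    /* wider accumulator, like the x87 extended path */
    for (int k = 0; k < N; k++) {
        sum_f += a[k] * b[k];
        sum_d += (double)a[k] * b[k];
    }

    /* The two sums typically agree only to about 6-7 significant digits. */
    printf("float accumulator : %.10f\n", sum_f);
    printf("double accumulator: %.10f\n", sum_d);
    return 0;
}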
Hi,
thanks to everybody for your answers. Now I get the same results using double (both on CPU and GPU); the values differ only after the 14th decimal digit.
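For reference, the double-precision variant of the kernel presumably looks something like this (a sketch based on the original kernel, not the exact code; cl_khr_fp64 has to be supported and enabled on the device):

// Assumed double-precision variant of the kernel above, not the exact code.
// Enable the fp64 extension on devices that expose doubles this way.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel
void evaluate_product_d(__global const double *pFirstArray, const int n,
                        __global const double *pSecondArray, __global double *pOutput)
{
    int gid  = get_global_id(0);
    int size = get_global_size(0);
    if (gid < size) {
        double output = 0.0;
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];
        pOutput[gid] = output;
    }
}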
Now I have another question. In my kernel I need to threshold the computed double output value, and for that I use this line of code:
if (output <= 0) pOutput[gid] = 1;   (pOutput is initialized with all 0 values)
But I obtain 1 in positions whose computed double value is 0.02192208167984 or 0.00040051234362.
How can that be possible?
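For context, the threshold presumably sits at the end of the same kernel, roughly like this (a reconstruction from the description above, not the actual source):

// Reconstructed sketch: the accumulation loop from the kernel above,
// followed by the thresholding step described in the post.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel
void evaluate_and_threshold(__global const double *pFirstArray, const int n,
                            __global const double *pSecondArray, __global double *pOutput)
{
    int gid  = get_global_id(0);
    int size = get_global_size(0);
    if (gid < size) {
        double output = 0.0;
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];

        // pOutput is assumed to be filled with zeros by the host beforehand.
        if (output <= 0)
            pOutput[gid] = 1;
    }
}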