Hallo, I have an OpenCL kernel that implements a dot product between two float arrays.
The first is an array of size*n elements and the second is an array of n elements. This is the code
_kernel
void evaluate_product(__global const float *pFirstArray, const int n,
__global const float *pSecondArray, __global float *pOutput)
{
int gid = get_global_id(0); int size = get_global_size(0);
if (gid>=0 && gid<size) {
float output = 0;
for (int k=0; k<n; k++)
output += pFirstArray[gid + k*size]pSecondArray[k];
pOutput[gid] = output;
}
}
If I execute the same operations on CPU, I have different results, above all after 6 or 7 decimal digit. Why this strange behaviour? In kronos OpenCL specification (v 1.2) they say the x+y and xy are correctly rounded as well as IEEE 754 compliant. Any ideas?