Hello, I have an OpenCL kernel that computes a dot product between two float arrays.

The first is an array of size*n elements (where size is the global work size) and the second is an array of n elements. This is the code:

```
__kernel
void evaluate_product(__global const float *pFirstArray, const int n,
                      __global const float *pSecondArray, __global float *pOutput)
{
    int gid = get_global_id(0);
    int size = get_global_size(0);

    if (gid >= 0 && gid < size) {
        float output = 0;

        // Each work-item accumulates one dot product of length n.
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];

        pOutput[gid] = output;
    }
}
```
If I run the same operations on the CPU, I get different results, mostly from the 6th or 7th decimal digit onward. Why does this happen? The Khronos OpenCL specification (v1.2) says that x+y and x*y are correctly rounded and IEEE 754 compliant. Any ideas?
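
For context, a single-precision float carries about 7 significant decimal digits, so a mismatch at the 6th or 7th digit is at the limit of the format. Even when each individual operation is correctly rounded, the two sides can still round differently, for example if one compiler fuses the multiply and add into one fused multiply-add, or if the CPU keeps intermediates at higher precision. Below is a minimal sketch of a CPU reference using the same indexing as the kernel, plus a tolerance-based comparison. The names `evaluate_product_cpu` and `nearly_equal`, and the idea of comparing with a relative tolerance, are illustrative assumptions and not part of my original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference with the same layout as the kernel:
   term k of output element gid is read from first[gid + k*size]. */
void evaluate_product_cpu(const float *first, int n, const float *second,
                          float *out, int size)
{
    assert(n >= 0 && size >= 0);
    for (int gid = 0; gid < size; gid++) {
        float acc = 0.0f;
        for (int k = 0; k < n; k++)
            acc += first[(size_t)gid + (size_t)k * (size_t)size] * second[k];
        out[gid] = acc;
    }
}

/* Absolute value without depending on the math library. */
static float abs_float(float x)
{
    return x < 0.0f ? -x : x;
}

/* Hypothetical helper: returns 1 if a and b agree within rel_tol,
   measured relative to the larger magnitude (but at least 1.0). */
int nearly_equal(float a, float b, float rel_tol)
{
    float scale = abs_float(a) > abs_float(b) ? abs_float(a) : abs_float(b);
    if (scale < 1.0f)
        scale = 1.0f;
    return abs_float(a - b) <= rel_tol * scale;
}
```

With something like this, each GPU result could be checked against the CPU result using `nearly_equal(gpu[i], cpu[i], 1e-5f)` instead of testing for bit-exact equality; the tolerance value is a guess and would probably need to grow with n, since rounding errors accumulate over the sum.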