Floating point operations difference between CPU and GPU

enzo30980 · November 15, 2012, 12:49pm

Hello, I have an OpenCL kernel that implements a dot product between two float arrays.
The first is an array of size*n elements and the second is an array of n elements. This is the code:

__kernel
void evaluate_product(__global const float *pFirstArray, const int n,
                      __global const float *pSecondArray, __global float *pOutput)
{
    int gid = get_global_id(0);
    int size = get_global_size(0);
    if (gid >= 0 && gid < size) {
        float output = 0;
        // element k for this work-item is stored at gid + k*size
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];
        pOutput[gid] = output;
    }
}

If I execute the same operations on the CPU, I get different results, mostly after the 6th or 7th decimal digit. Why this strange behaviour? The Khronos OpenCL specification (v1.2) says that x+y and x*y are correctly rounded and IEEE 754 compliant. Any ideas?

cricri1 · November 15, 2012, 1:23pm

On the CPU you are using double precision (about 15 significant digits).
On the GPU you are using single precision (about 7 significant digits), and the values are converted to double when the CPU prints them.

Use double, not float.
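
To illustrate the float-versus-double point, here is a minimal host-side C sketch (not the original poster's code) that computes one output element of the kernel's dot product twice, once with a float accumulator and once with a double accumulator. The function names dot_float and dot_double are hypothetical; the indexing first[gid + k*size] is taken from the kernel above.

```c
#include <assert.h>
#include <stddef.h>

/* One output element, accumulated in single precision (as in the kernel). */
float dot_float(const float *first, const float *second,
                size_t gid, size_t size, size_t n)
{
    assert(gid < size);
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++)
        acc += first[gid + k * size] * second[k];
    return acc;
}

/* Same inputs, but each product and partial sum is kept in double precision. */
double dot_double(const float *first, const float *second,
                  size_t gid, size_t size, size_t n)
{
    assert(gid < size);
    double acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += (double)first[gid + k * size] * (double)second[k];
    return acc;
}
```

For example, with size = 1, first = {1.0f, 1e-8f, 1e-8f, ...} and second filled with 1.0f, a platform that evaluates float arithmetic in single precision leaves the float accumulator at exactly 1 (each 1e-8f is below half the spacing of floats near 1), while the double accumulator picks up every small term.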

Tiomat · November 15, 2012, 1:30pm

CPUs (x86 with the x87 FPU) tend to do floating point calculations in 80-bit ‘extended’ mode and keep the results in this intermediate format. As such, subsequent calculations use the 80-bit value.
On the GPU, single precision is 32 bits and double is 64 bits. As such, when doing lots of floating point calculations you are likely to get small differences even if you choose to use double precision.
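
The effect of a wider intermediate format can be sketched in C as follows. This is only an illustration under assumptions: sum_double and sum_extended are hypothetical names, and long double is used as a stand-in for the 80-bit format, which holds on x86 with GCC or Clang but not everywhere (on other platforms long double may be 64 or 128 bits).

```c
#include <stddef.h>

/* Accumulate in double: every partial sum is rounded to 64 bits. */
double sum_double(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Accumulate in long double and round to double only once, at the end.
   Assumption: long double is wider than double on the target platform. */
double sum_extended(const double *x, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return (double)acc;
}
```

Where long double really is wider, the two functions can return values that differ in the last few digits for the same input, which is the kind of small discrepancy described above.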

njuffa · November 15, 2012, 1:32pm

I would suggest reading this (and any of the references cited):

https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

enzo30980 · November 16, 2012, 2:04pm

Hi,
Thanks to everybody for your answers. Now I get the same results using double (on both CPU and GPU). Values differ only after the 14th decimal digit.
Now I have another question. In my kernel I need to threshold the computed double output value, and for that I use this line of code:

if (output <= 0) pOutput[gid] = 1;

(pOutput is initialized with all 0 values.)

But I obtain a value of 1 for outputs such as 0.02192208167984 or 0.00040051234362.
How is this possible?

wanderine · November 16, 2012, 3:36pm

Don’t you have to write this instead?

if (output <= 0.0f) pOutput[gid] = 1.0f;
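
As a point of comparison, here is a small host-side C sketch of the thresholding test. The function name threshold is hypothetical. Under C's usual arithmetic conversions, which OpenCL C also follows for scalars, the int literal 0 is converted to double before the comparison, so output <= 0 and output <= 0.0 are the same test for a double output. The literal's type alone should therefore not explain the unexpected 1 values; the thread does not establish the actual cause.

```c
/* Returns 1.0 when the value is at or below zero, otherwise 0.0
   (0.0 mirrors the zero-initialized pOutput described in the question). */
double threshold(double output)
{
    /* The int literal 0 is converted to double before comparing. */
    return (output <= 0) ? 1.0 : 0.0;
}
```

With the values quoted in the question, threshold(0.02192208167984) and threshold(0.00040051234362) both return 0.0 on the CPU.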
