CPU and GPU floating-point anomaly

Hi friends

We have a CPU C++ program giving floating-point output A.
A CUDA C++ port of the CPU program was developed on a Tesla C1060 GPU, giving floating-point output B.

We are getting outputs A and B matching to only one digit after the decimal point.

Please tell us what we should do to get outputs A and B to match to more than 6 digits after the decimal point.

regards
Team

What is the precision of the GPU code?

double precision for GPU and CPU code

How many significant digits are in front of the decimal point? How many of these match?

You can get accumulating error when you add (or apply some other operation to) two numbers a and b where the ratio of the smaller to the larger is of the same order as the machine precision. But in double precision you would need roughly a billion operations for that much error to accumulate.
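For example (a minimal sketch with made-up numbers, not from the code in question): when b/a is close to the double-precision epsilon, every addition of b to a silently drops b, and the loss accumulates over the whole loop:

#include <cstdio>

int main(void)
{
    double a = 1.0;
    const double b = 1.0e-16;          // just below half an ulp of 1.0, so a + b rounds back to a
    const long   n = 100000000L;       // 1e8 additions
    for (long i = 0; i < n; ++i)
        a += b;                        // each step loses b entirely
    printf("computed: %.17g\n", a);                     // stays at 1
    printf("expected: %.17g\n", 1.0 + b * (double)n);   // about 1.00000001
    return 0;
}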

I would recommend reading the following whitepaper, if you haven’t had the chance to do so:

[url]http://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf[/url]

Not knowing anything about the code other than that it is double-precision code, the most likely cause for numerical discrepancies between CPU and GPU would be the merging of double-precision multiplication and addition into double-precision FMA (fused multiply add). You can turn that off by passing -fmad=false to nvcc, but this will likely reduce the accuracy and performance of the GPU code. Generally speaking, the use of FMA typically improves accuracy by reducing rounding and providing some protection from subtractive cancellation.
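As an illustration of the FMA point (a minimal host-side sketch, not taken from the code under discussion; x is chosen purely so the difference is visible): x*x - 1 computed with a separately rounded product loses the low-order part of the product that fma() retains:

#include <cmath>
#include <cstdio>

int main(void)
{
    double x = 1.0 + std::ldexp(1.0, -27);    // 1 + 2^-27
    volatile double p = x * x;                // product rounded on its own; volatile blocks contraction
    double separate = p - 1.0;                // multiply-round, then subtract-round
    double fused    = std::fma(x, x, -1.0);   // single rounding at the very end
    printf("separate mul+sub: %.17g\n", separate);
    printf("fused (FMA)     : %.17g\n", fused);
    return 0;
}

Compiling the device code with -fmad=false forces the GPU onto the "separate" path, which is why results can move toward the CPU numbers while actually becoming less accurate.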

I am not exactly sure what you mean by “matching to within one digit after the decimal point”. Can you show an example pair of results? How many digits are there altogether, and how many match?

Depending how big the numerical differences are, they could also be due to a bug in the code. Other than a careful review of the code, make sure that the code checks the status of all CUDA API calls and kernel launches, and run the program under cuda-memcheck. Please be aware that on an sm_13 device it will be able to provide only very limited checking due to hardware limitations.
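As a sketch of the kind of checking meant here (CUDA_CHECK and my_kernel are placeholder names, not from the program in question), every runtime API call and every kernel launch gets wrapped:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

__global__ void my_kernel(double *a) { a[threadIdx.x] *= 2.0; }

int main(void)
{
    double *d_a = NULL;
    CUDA_CHECK(cudaMalloc(&d_a, 32 * sizeof(double)));
    CUDA_CHECK(cudaMemset(d_a, 0, 32 * sizeof(double)));
    my_kernel<<<1, 32>>>(d_a);
    CUDA_CHECK(cudaGetLastError());        // launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel runs
    CUDA_CHECK(cudaFree(d_a));
    return 0;
}

Running the resulting binary under cuda-memcheck then flags out-of-bounds or misaligned accesses that can silently corrupt results.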

Hello,

In this blog:
https://developer.nvidia.com/content/everything-you-ever-wanted-know-about-floating-point-were-afraid-ask
there is a link to an article describing how numbers are stored on the GPU and how you can estimate the rounding error of a single operation, though if rounding were the problem you would need many operations to reach such a big difference.

In one of my codes there was a difference between the GPU and CPU numbers; after many hours spent on 'fixing' the GPU part, it turned out that the CPU part had the mistake.

Another problem I had in the same code was the addition of two numbers (x position + size of system): -0.000xxx + 650.xxxx always gave 650.xxxx on the GPU. This is in fact a precision problem, since the small number divided by the large number is approximately the same as the precision of single-precision floats. We fixed this by shifting the box by half its size, so that we always add or subtract numbers of similar magnitude and the precision loss is minimal.
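To make that concrete, here is a tiny sketch with made-up stand-ins for the elided digits (the exact values are not shown in the post):

#include <cstdio>

int main(void)
{
    float big   = 650.1234f;     // stand-in for 650.xxxx
    float small = -0.000173f;    // stand-in for -0.000xxx

    // ulp(650.1234f) is about 6e-5, the same order as 'small', so most of
    // the information in 'small' is lost when it is added to 'big'.
    float  f_sum = big + small;
    double d_sum = (double)big + (double)small;   // higher-precision reference
    printf("float sum : %.10f\n", f_sum);
    printf("double sum: %.10f\n", d_sum);

    // Operands of similar magnitude lose almost nothing by comparison.
    printf("similar-magnitude sum: %.10f\n", 0.1234f + small);
    return 0;
}

Shifting the coordinates so they are measured from the centre of the box, as described above, keeps the operands of comparable size and preserves more of the small displacement.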

njuffa

GPU output
1.9148892009282

CPU output
1.9023344567543

The CUDA program uses only double-precision code, and the serial CPU code likewise uses only double-precision data.

How am I to resolve the anomaly in the above outputs?

If the code has operations like

a += b*(c+d)+r1*(h+4)*u+y*(u+5)

My program in C++ has the following sample code:

for(eln=0;eln<12;eln++)
{
    vol[eln]*delt*0.5*(uu*b1[eln]*b1[eln]+uv*b1[eln]*c1[eln]+uw*b1[eln]*d1[eln]+
                       uv*c1[eln]*b1[eln]+vv*c1[eln]*c1[eln]+vw*c1[eln]*d1[eln]+
                       uw*d1[eln]*b1[eln]+vw*d1[eln]*c1[eln]+ww*d1[eln]*d1[eln]);

    rhsw[n4]=rhsw[n4]+(adv[4][1]+upw[4][1]+anuef*ak41)*wvel[n1]+
                      (adv[4][2]+upw[4][2]+anuef*ak42)*wvel[n2]+
                      (adv[4][3]+upw[4][3]+anuef*ak43)*wvel[n3]+
                      (adv[4][4]+upw[4][4]+anuef*ak44)*wvel[n4]+sw;

    dux=b1[eln]*uvel[n1]+b2[eln]*uvel[n2]+b3[eln]*uvel[n3]+b4[eln]*uvel[n4];
}

for(int i=1;i<=nodes;i++)
{
    uvel[i]=uvel[i]-delt*rhsu[i]/eml[i];
}

After porting to CUDA I am getting

GPU output
1.9148892009282

CPU output
1.9023344567543

How do I get matching results?

Differences at this level suggest you have a numerical stability problem. I would first compare against a quad-precision CPU implementation to estimate the error of both double-precision calculations. You may find the GPU calculation is more accurate simply due to changes in the order of operations.

Unfortunately, you then need to think carefully about how numerical error accumulates in your equations. Summations and differences with many terms can rapidly reduce the precision of an answer.
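A minimal sketch of that comparison idea, using long double as the higher-precision CPU reference (on most x86 compilers this is 80-bit extended precision; a true quad type such as __float128 can be substituted). The summed quantity here is just a stand-in for one of the RHS accumulations in the actual code:

#include <cstdio>

int main(void)
{
    const int   n     = 1000000;
    double      sum_d = 0.0;
    long double sum_q = 0.0L;     // higher-precision reference accumulator

    for (int i = 1; i <= n; ++i) {
        double term = 1.0 / ((double)i * (double)i);   // stand-in for one contribution
        sum_d += term;                                 // the computation being checked
        sum_q += (long double)term;                    // the reference
    }

    printf("double result            : %.17g\n", sum_d);
    printf("extended-precision ref   : %.17Lg\n", sum_q);
    printf("estimated rounding error : %.3Lg\n", (long double)sum_d - sum_q);
    return 0;
}

Running the same comparison against both the CPU and the GPU double-precision results shows which of the two is actually closer to the true answer.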

Some of my colleagues used software such as Mathematica, where the precision can be set arbitrarily high. This way they could check different orderings of the operations in order to minimize the rounding error.
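A quick way to see the ordering effect without extra tools (single precision is used here only so the difference shows up after a few digits):

#include <cstdio>

int main(void)
{
    const int n = 100000;
    float forward = 0.0f, backward = 0.0f;

    for (int i = 1; i <= n; ++i)
        forward += 1.0f / (float)i;      // largest terms first
    for (int i = n; i >= 1; --i)
        backward += 1.0f / (float)i;     // smallest terms first (usually more accurate)

    printf("forward : %.8f\n", forward);
    printf("backward: %.8f\n", backward);
    return 0;
}

Both loops add exactly the same terms; only the order differs, yet the rounded results do not agree to full precision.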