Floating Point Accuracy

cmorrison · August 8, 2007, 11:41am

I don’t suppose anyone has looked at the accuracy of the GPU floating point unit?

I’m trying to do some benchmarking of “Novel Processor Architectures for HPC” and one of the areas I’m keen to explore is how the deviation from IEEE-754 that’s inherent in CUDA can affect the results of a program.

If anyone has measured, or sees, a large deviation against a golden reference, can they let me know please?

Cheers,

Chris

MisterAnderson42 · August 8, 2007, 1:05pm

I’ll be performing such tests on my system later this week. I’ll let you know what happens.

tachyon_john · August 9, 2007, 3:32pm

For our codes the deviations have been largely unnoticeable. A lot of codes out there do “bad things” with their floating point already, and so there are greater sins being committed by the algorithm than by the floating point hardware on the GPU or CPU… Unless you’ve taken some care to avoid things like summing large numbers with small ones, and various other floating point pitfalls, the GPU may be the least of your problems…

John Stone
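
To illustrate the pitfall mentioned above (summing large numbers with small ones), here is a minimal C sketch. It is not code from this thread: the function names naive_sum and kahan_sum are made up for the example, and Kahan (compensated) summation is only one common remedy. It assumes IEEE-754 single precision with round-to-nearest and a build without -ffast-math, which would let the compiler optimize the compensation away.

```c
#include <assert.h>

/* Naive single-precision accumulation. When the running total is much
   larger than the addend, the addend is rounded away entirely. */
static float naive_sum(float start, float term, int n) {
    assert(n >= 0);
    float sum = start;
    for (int i = 0; i < n; ++i)
        sum += term;
    return sum;
}

/* Kahan (compensated) summation: comp carries the rounding error of the
   previous addition and feeds it back into the next one. */
static float kahan_sum(float start, float term, int n) {
    assert(n >= 0);
    float sum = start;
    float comp = 0.0f;
    for (int i = 0; i < n; ++i) {
        float y = term - comp;   /* addend corrected by the earlier error */
        float t = sum + y;       /* low-order bits of y may be lost here */
        comp = (t - sum) - y;    /* recover what was lost */
        sum = t;
    }
    return sum;
}

/* Near 1.0e8 the spacing between adjacent floats is 8, so adding 1.0f
   a thousand times the naive way never changes the total:
     naive_sum(1.0e8f, 1.0f, 1000) -> 100000000.0f
     kahan_sum(1.0e8f, 1.0f, 1000) -> 100001000.0f */
```

The same loss happens on a CPU and on a GPU, which is the point being made: it is a property of the algorithm, not of the hardware.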

cmorrison · August 9, 2007, 4:03pm

That’s good to know.

I’m benchmarking CUDA against FPGAs and a regular CPU at the moment, and if people had come back to me saying there were big problems I was going to have to write a test case to see how serious they were. Now I can just gloss over it and say that the lack of full IEEE compliance is almost insignificant.

Cheers,

Chris

tachyon_john · August 9, 2007, 4:15pm

How significant it is depends on the exact code sequence and input data. In practice, it’s wise to design your algorithm to explicitly avoid doing things that put the floating point hardware on the ropes. I don’t know what your algorithm is; I can only comment on the ones we’ve been working on.

If you do observe non-trivial differences between CPU, GPU, and FPGA, then you should go back and look at whether your algorithm is doing things that are ill-advised in terms of its use of floating point. Only after you’re sure that you’ve done everything you can to preserve numerical precision in your algorithm would it be fair to point your finger at the hardware. :-)

John Stone

potto216 · August 9, 2007, 7:12pm

Just an FYI. We compared the error of x - ifft2(fft2(x)) for CUDA (single precision) and MATLAB (double precision). The error did not seem to grow much with matrix size.

IFFT2(FFT2(X[256 by 256])): CUDA error 2.669e-007, MATLAB error 2.3853e-016
IFFT2(FFT2(X[512 by 512])): CUDA error 3.9338e-007, MATLAB error 2.5231e-016
IFFT2(FFT2(X[1024 by 1024])): CUDA error 3.5079e-007, MATLAB error 2.6822e-016
IFFT2(FFT2(X[2048 by 2048])): CUDA error 5.149e-007, MATLAB error 2.8168e-016
IFFT2(FFT2(X[3072 by 3072])): CUDA error 7.3526e-007, MATLAB error 2.9145e-016

Our error metric was the mean of the absolute difference between x and ifft2(fft2(x)) (not squared). The code for the calculations is attached, along with our computed histogram of error values for CUDA and MATLAB. The CUDA errors looked fairly Gaussian (probably more so if we used MSE). Anyway, just an FYI as to our results. We have noticed some unusual error growth with some of our conjugate gradient algorithms, but haven’t concluded whether this is due to CUDA or to ill-conditioned data.

Cheers,
Paul
speed_fft_v2.txt (1.51 KB)

MisterAnderson42 · August 9, 2007, 9:38pm

As promised: I checked the numerical accuracy of my application. A note first: my application is chaotic. A single value difference in the 18th decimal place in part of the calculation can send the simulation down a completely different path. I don’t care because there are billions of billions of statistically equivalent paths my simulation can take. The downside to this is that it makes quantitative accuracy comparisons difficult. Using a double precision CPU calculation on a single processor as a baseline, I measured the number of steps it takes for a different run with the same starting point to deviate significantly from that baseline.

Double precision CPU calculation on 8 processors
5800 steps to deviate
Single precision CPU calculation on 1 processor
3800 steps to deviate
Single precision calculation on 1 GPU
3800 steps to deviate

So, the “inferior” floating point on the GPU is just as good as a CPU single precision calculation for me. I imagine you’d have to come up with a pretty contrived example to show that the GPU’s FP is significantly worse.
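
The "steps to deviate" measurement can be sketched on a toy chaotic system. The logistic map below is only a stand-in chosen for brevity, not the simulation discussed above, and steps_to_deviate is a hypothetical helper name. It iterates the same map in single and double precision from the same start and reports the first step at which the two trajectories differ by more than a tolerance.

```c
#include <assert.h>
#include <math.h>

/* Iterate x <- 4x(1 - x) in single and double precision from the same
   starting point. Returns the first step at which the trajectories differ
   by more than tol, or -1 if that never happens within max_steps. */
static int steps_to_deviate(double x0, double tol, int max_steps) {
    assert(x0 > 0.0 && x0 < 1.0 && tol > 0.0);
    float xs = (float)x0;   /* single precision trajectory */
    double xd = x0;         /* double precision baseline */
    for (int step = 1; step <= max_steps; ++step) {
        xs = 4.0f * xs * (1.0f - xs);
        xd = 4.0 * xd * (1.0 - xd);
        if (fabs((double)xs - xd) > tol)
            return step;
    }
    return -1;
}
```

For example, steps_to_deviate(0.3, 0.1, 1000) gives the step count for a start of 0.3 and a tolerance of 0.1. Because the map roughly doubles any rounding difference on each step, the count is small for this toy problem; a real simulation with weaker sensitivity takes far longer to diverge.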

cmorrison · August 10, 2007, 10:12am

Thanks guys.

That’s the kind of information I was looking for. I did an example myself, performing some simple calculations (log, sin, cos etc.), and there didn’t seem to be any difference between the single precision GPU and single precision CPU. Your more in-depth examples further encourage me that there is little problem with the device not being fully IEEE compliant. I can’t really think of a better test for floating point deviation than what your code does, Mr Anderson, and I think I might use MATLAB and my sample code to create a graph very similar to yours, Paul.

Cheers,

Chris

mfatica · August 10, 2007, 2:42pm

This is a comparison of the G80 and other architectures on IEEE 754.

Massimiliano
G80_IEEE.pdf (200 KB)

cadourian · December 30, 2007, 9:10pm

Hi there,

this is a bit late, but it may be useful to whoever reads this thread in the future.

Here are two papers I found a while back that try to understand the limitations in IEEE 754 compliance of GPUs, specifically for scientific applications.

In this case, the author figured out how to achieve double precision with some extra code, and still achieve a 4-5x speed increase relative to CPUs.

Cheers

Chahé
ijpeds06.pdf (1.02 MB)
SC06.pdf (594 KB)

thanasio · April 6, 2013, 9:57am

Can somebody see why this simple operation can produce a 10^-3 error compared to CPU?

float deltaE = mbfsIn.kb[tid] * delta_r01
* (2.0f - delta_r01 * (6.0f - 9.333334f * delta_r01));

cheers,
Thanasio

njuffa · April 6, 2013, 11:22am

What GPU? What’s the value of delta_r01? I will assume you are running on a Fermi or Kepler class GPU. The compiler very likely generates code using two FFMAs (single-precision fused multiply-adds) for the latter part of the computation, that is,

[expr] * fmaf (fmaf (9.333334f, -delta_r01, 6.0f), -delta_r01 , 2.0f)

If either of the two products is close to the corresponding constant, but of opposite sign (meaning delta_r01 is positive) there will be subtractive cancellation, followed by renormalization. On the CPU, where the product is computed to single precision, the bits shifted in on the right will be zero, but on the GPU where all product bits are retained inside the FMA, lower order bits of the product will be shifted in on the right. The closer the product to the constant, the bigger the difference will be.

If you look at the bit pattern of the intermediate result, and see trailing zeros in the CPU result, but non-zero trailing bits in the GPU result, that would be a good indication that my working hypothesis is correct, and in that case the GPU delivers the more accurate result thanks to FMA.

If you turn off FMA generation with -fmad=false, do the results match between CPU and GPU?

I would suggest reading the following whitepaper and also the references it cites:

https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIA-CUDA-Floating-Point.pdf
