accuracy of fp division

asm · February 12, 2009, 4:48pm

Hi,

I’m running into a problem that seems to show that single-precision floating-point
division on CUDA is less accurate than on the CPU…

Is this really true? And if so, is there any way to deal with it?

thanks

Simon_Green · February 12, 2009, 5:40pm

Yes, this is documented in the programming guide (p. 86): single-precision division is only accurate to within 2 ulp, because the GPU implements division using a reciprocal.

One workaround is to use double precision (if your GPU supports it), which is fully compliant.

dneckels · February 12, 2009, 6:55pm

Hmmmm…
This could explain why my GPU GMRES solver takes more iterations to converge than the CPU version. Loss of orthogonality due to lower precision…

I have no idea how to test the hypothesis, though…

Maybe compare to emu mode?

asm · February 13, 2009, 3:49pm

OK, thanks for the reply. I have a G200, but double-precision division should be really slow since it has no hardware support, so I have to look for other options…

eyalhir74 · February 13, 2009, 6:00pm

Hi,

What would be the other workarounds besides doubles?

thanks

eyal

SPWorley · February 13, 2009, 6:45pm

One step of Newton iteration might polish the final bits. This is certainly possible for computing 1/Z, but I imagine it could be extended to Y/Z.

The iteration

x2 = x1*(2.0 - Z*x1)

polishes an estimate x1 of the reciprocal, converging to 1.0/Z.
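As a rough illustration, the step above can be written out in host-side C++ as follows. Only `refine_reciprocal` follows the formula in the post directly. The fused multiply-add variant and the extension to Y/Z are assumptions added here as one common way to do it, not something confirmed in this thread, and all function names are hypothetical.

```cpp
#include <cmath>

// One Newton step for the reciprocal, as written in the post:
// x2 = x1*(2 - z*x1). x1 is the current estimate of 1/z.
float refine_reciprocal(float z, float x1) {
    return x1 * (2.0f - z * x1);
}

// Algebraically the same step, arranged around fused multiply-add so the
// residual e = 1 - z*x1 is computed with a single rounding.
float refine_reciprocal_fma(float z, float x1) {
    float e = std::fma(-z, x1, 1.0f);  // e = 1 - z*x1
    return std::fma(x1, e, x1);        // x1 + x1*e
}

// Possible extension to y/z (an assumption, not from the thread):
// correct a quotient estimate q using a reciprocal estimate r of 1/z.
float refine_quotient(float y, float z, float r, float q) {
    float rem = std::fma(-q, z, y);    // remainder y - q*z
    return std::fma(rem, r, q);        // q + rem*r
}
```

For example, starting from the crude guess 0.3 for 1/3, `refine_reciprocal(3.0f, 0.3f)` gives roughly 0.33, and each further step roughly squares the relative error. Whether a single step is enough to reach a correctly rounded result depends on the accuracy of the starting estimate and on whether the hardware provides a true fused multiply-add.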

Simon_Green · February 16, 2009, 10:53am

FYI, in CUDA 2.2 we are planning on adding some new device functions that will provide IEEE-compliant single precision reciprocal, square-root, and division.

Note that these will be much slower than the built-in operations, but may be useful for developers who need to match CPU results exactly.

eyalhir74 · February 16, 2009, 1:27pm

Hi Simon,

Thanks for the information. Can you please suggest a workaround for sqrt accuracy in the meantime?

Simon_Green · February 17, 2009, 11:18am

I would try using some kind of iterative method to improve the accuracy of sqrt, as SPWorley suggested:
Methods of computing square roots - Wikipedia: http://en.wikipedia.org/wiki/Methods_of_co...ng_square_roots
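One possible shape for such an iterative refinement is a Newton step on the square root estimate itself. The sketch below is host-side C++ under stated assumptions: `refine_sqrt` is a hypothetical name, the input estimate `s` is assumed to be positive and already close to sqrt(x), and the remainder is computed with a fused multiply-add so that it is not lost to rounding.

```cpp
#include <cmath>

// One Newton step for sqrt(x), starting from an estimate s > 0:
// s2 = s + (x - s*s) / (2*s). (Illustrative helper, not a CUDA API.)
float refine_sqrt(float x, float s) {
    float rem = std::fma(-s, s, x);  // remainder x - s*s, rounded once
    return s + rem / (2.0f * s);     // small correction added to s
}
```

The division here only affects the small correction term, so a slightly inaccurate divide should matter far less than it would in a direct computation. Starting from 1.4 as a guess for sqrt(2), one step gives about 1.41429 and a second step lands within float rounding of 1.41421.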
