Double-double precision arithmetic library now available

We have released a library that contains code for negation, addition, subtraction, multiplication, division, and square root of double-double operands using a simple C-style interface.

Developers whose applications require precision beyond double precision will likely find this helpful, as double-double offers almost twice the precision of double precision.

It is available on the CUDA Registered Developer page. The tar file also contains a simple example (solving a quadratic equation at different precisions):

$ ./example_dd

Solving quadratic equation with a = 1 b = -100000 c = 1

Using double precision (std. quadratic formula):
x1 = 9.99999999900e+04 ax1**2+bx1+c = 0.00000000000e+00
x2 = 1.00000033854e-05 ax2**2+bx2+c =-3.38435755864e-07

Using double-double (std. quadratic formula):
x1 = 9.99999999900e+04 ax1**2+bx1+c = 0.00000000000e+00
x2 = 1.00000000010e-05 ax2**2+bx2+c = 0.00000000000e+00

Using double precision (more robust formula):
x1 = 9.99999999900e+04 ax1**2+bx1+c = 0.00000000000e+00
x2 = 1.00000000010e-05 ax2**2+bx2+c = 0.00000000000e+00


Could you indicate more precisely where to find the library tarball?

I couldn’t find it on the CUDA Registered Developer page, where I am now registered…

Thanks!

It is available on the web site.
Let me ask around to find out why you are not able to access it.

Well, it appears I don’t have access to this page. Maybe I need to register with nvonline…

Ok I got it !
Thanks !

People who own consumer grade GPUs would probably be more interested in a quad float library ;)

Certainly on compute capability 1.3 and 2.0 devices, the hardware-crippled double precision is still faster than trying to do everything with floats. :)

Double precision might be bad enough in GK104 that using a software double precision implementation is faster for some things (probably only addition and subtraction).

A fully accurate double-anything addition or subtraction requires 20 basic operations; this is what I implemented for the double-double code we posted. Even then, the accuracy of double-float is lower than that of double precision (about 45 bits effectively), and the range is much more restricted. I therefore find it unlikely that using double-float in place of native double precision will be of much interest on GK104.
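
To make the operation count concrete, here is a minimal C sketch of a double-double addition in the widely published style built from error-free "two-sum" steps. It is an illustration under assumptions, not the posted library's code: the type dd_t and the names two_sum, quick_two_sum, dd_add and dd_add_demo are hypothetical. It must be compiled without value-changing optimizations such as -ffast-math, and it assumes plain IEEE double evaluation (no extended-precision intermediates).

```c
#include <assert.h>

typedef struct { double hi, lo; } dd_t;   /* hypothetical: value = hi + lo */

/* Error-free sum of two doubles, any magnitudes: 6 operations. */
static dd_t two_sum(double a, double b)
{
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    dd_t r = { s, e };
    return r;
}

/* Error-free sum, valid only when |a| >= |b|: 3 operations. */
static dd_t quick_two_sum(double a, double b)
{
    double s = a + b;
    double e = b - (s - a);
    dd_t r = { s, e };
    return r;
}

/* Double-double addition: 6 + 6 + 1 + 3 + 1 + 3 = 20 operations. */
static dd_t dd_add(dd_t a, dd_t b)
{
    dd_t s = two_sum(a.hi, b.hi);         /* high parts and their error */
    dd_t t = two_sum(a.lo, b.lo);         /* low parts and their error  */
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);        /* renormalize */
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);     /* renormalize again */
}

/* Usage sketch: 1 + 2^-60 does not fit in one double, but the pair keeps it. */
static void dd_add_demo(void)
{
    dd_t one  = { 1.0, 0.0 };
    dd_t tiny = { 0x1p-60, 0.0 };
    dd_t r = dd_add(one, tiny);
    assert(r.hi == 1.0 && r.lo == 0x1p-60);
}
```

Subtraction follows by negating both halves of the second operand. The same structure applies to double-float by replacing double with float throughout.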

I would recommend sticking with the native double-precision support on GK104. I actually tried some real-life double-precision HPC apps on GK104 and the performance was higher than I expected based on the throughput of the DP operations. The reason appears to be that quite a few “double precision intensive” applications actually have a surprisingly low percentage of DP instructions (only 10%-20%), and a fair number are at least partially limited by memory bandwidth. Obviously this does not extend to truly DP-intensive primitives with modest bandwidth requirements such as DGEMM and ZGEMM.

Good to know. My vague memory was 17 operations from back when I used this in the G80 days. Once the program needs to do a double precision FMA, I would imagine that GK104 hardware wins again by a comfortable factor over a double-float implementation.

I based the addition/subtraction code in the posted double-double “library” on a recent and readily accessible publication by A. Thall. As far as I can tell, this approach goes back to work by Douglas Priest from around 1990, which in turn drew on work by William Kahan and T. J. Dekker. The addition code in Dekker’s original paper used fewer basic operations but did not retain full accuracy for all operand combinations.

Because hardware support for FMA allows for a very efficient double-anything multiplication, double-double addition and subtraction are much more expensive than multiplication on the GPU. I did not look into a double-double “FMA” emulation, but from writing emulation code for FMA (to support fmaf() on platforms < sm_20) I know that the cost is much larger than the cost of multiplication and addition combined, due to the need to operate on the double-width product.

This is awesome! Thanks from all of us!

I’ve been planning to implement my own 128-bit integer class for high-precision fixed-point computations. Maybe that’s moot now. :-)

I can’t seem to access it or find it using my forum account or my registered CUDA developer account. Could someone point me to where I can find this library? I need double-double precision to check the accuracy of my CUDA kernels.


To retrieve NVIDIA’s double-double code from the registered developer website:

(1) Go to
(2) Click on green link “Registered Developer Website”
(3) Log in with username and password
(4) Click on green link CUDA/GPU Computing Registered Developer Program
(5) Scroll down to section “CUDA Double-double Precision Arithmetic”
(6) Click on green link “Download” after the section title
(7) Click green “Agree” to agree to usage conditions
(8) Download should start

The code posted is governed by the BSD license which should make for easy integration into any code base.

So we need to have a ‘CUDA/GPU Computing Registered Developer Program’ account to access the library? I applied for the account but haven’t got a reply yet. I will be grateful if someone can help me access the library sooner.

I really don’t understand why this header is stuck behind a registered developer login when it is BSD licensed.

As permitted by the BSD license, I’ve posted version 1.2 of the header publicly here: