Double precision emulation: implementing double precision in CUDA

Ladies & Gentlemen around here,

I am currently interested in implementing double precision emulation in CUDA; not fully, but limited to subtraction only. That is, I have two double precision numbers, say x and y, and I need to know the difference z = x - y. The numbers x and y must be in double precision, but the difference can be stored in single precision. In this way, I do not lose significant digits in z.

As we all know, CUDA currently does not support double precision arithmetic, but for some scientific calculations it is crucial to perform certain operations, especially differences between large numbers, in double precision; otherwise, the result will have a very large error! Fortunately, not all of the code has to be double precision, and most of the time these differences can be stored as single precision numbers (though the difference itself must be computed in double precision, and the variables being subtracted must also be in double precision). The performance impact won’t be large, as long as the CUDA kernel is bandwidth bound.
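To make the cancellation problem concrete, here is a minimal sketch in plain C++ (host code, since doubles are available there). The function names are hypothetical and only illustrate the two orders of rounding: rounding x and y to single precision before subtracting, versus subtracting in double precision and rounding only the result.

```cpp
// Hypothetical helper: round both operands to float first, then subtract.
// The low-order digits of x and y are lost before they can cancel.
inline float diff_round_first(double x, double y) {
    return static_cast<float>(x) - static_cast<float>(y);
}

// Hypothetical helper: subtract in double, then round the difference to float.
// The difference keeps its significant digits even when x and y are close.
inline float diff_round_last(double x, double y) {
    return static_cast<float>(x - y);
}
```

For x = 100000001.0 and y = 100000000.0, floats near 1e8 are spaced 8 apart, so diff_round_first returns 0 while diff_round_last returns 1.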

I am aware of the SoftFloat library (Berkeley SoftFloat) for the CPU, and I was wondering if anybody has experience with emulating double precision in CUDA.

I do not have much more to add, but any comments & suggestions are welcome.

Cheers,
Evgheni

Does the Mandelbrot example included with CUDA implement some basic double precision operations?

Wow, that was fast!

I’ve checked the Mandelbrot example, and it appears that there are some calculations done in double precision, including subtraction. I’ll see how I can adapt this to my application.

Thanks!

Ev.

A good place to look for pseudo-double precision algorithms is the dsfun90 library:

http://crd.lbl.gov/~dhbailey/mpdist/

The representation used in that library is one which represents a double as the sum of two single precision numbers with different exponents. In base 10, this would be like representing 1.00000001 as (1e0 + 1e-8). The addition algorithm is probably the easiest to implement.
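As a rough illustration of that representation, here is a minimal sketch of a two-float pair and a subtraction that returns a single precision result. It is not taken from dsfun90; the type FloatPair and the functions split_double, two_sum and sub_to_float are hypothetical names, and the error-free addition used is the standard Knuth two-sum. The HD macro is only an assumption to let the same code compile as plain C++ or, under nvcc, as device code.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Plain C++ by default; under nvcc the marked functions are also device-callable.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical type: a value stored as the unevaluated sum hi + lo.
struct FloatPair {
    float hi;  // value rounded to single precision
    float lo;  // what the rounding of hi left over
};

// Host-side split of a double into a float pair.
inline FloatPair split_double(double x) {
    assert(std::fabs(x) <= FLT_MAX);  // x must fit in the float range
    FloatPair p;
    p.hi = static_cast<float>(x);
    p.lo = static_cast<float>(x - static_cast<double>(p.hi));
    return p;
}

// Knuth two-sum: r.hi is the rounded sum a + b, r.lo is its rounding error.
HD inline FloatPair two_sum(float a, float b) {
    FloatPair r;
    r.hi = a + b;
    float bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

// z = a - b, returned in single precision.
HD inline float sub_to_float(FloatPair a, FloatPair b) {
    FloatPair s = two_sum(a.hi, -b.hi);  // leading parts, with rounding error kept
    float tail = (a.lo - b.lo) + s.lo;   // low-order parts plus that error
    return s.hi + tail;
}
```

A possible use: split x and y on the host with split_double, upload the pairs, and call sub_to_float in the kernel. With x = 100000001.0 and y = 100000000.0 the leading parts cancel exactly and the result 1 comes from the low-order parts. The sketch uses only additions and subtractions, so it does not depend on how the hardware rounds multiply-add.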

Thanks to all for the replies! It seems I have enough information to look into. My interests, however, also lie in emulating extended (80-bit) and quad (128-bit) precision, but I guess this will be a step after 64-bit precision emulation.

Cheers,
Evghenii