double emulation: any ideas on double precision emulation?

Skribtsov · March 26, 2008, 8:35am

We need a double-precision solver for linear equations based on huge sparse matrices. Does anybody know whether somebody has done that already? We don’t want to reinvent the wheel.

jordyvaneijk · March 26, 2008, 9:23am

The GPU only provides single-precision floats, right? So I don’t see the point of emulating double precision (via CUDA), because you cannot use it on the GPU.

Simon_Green · March 26, 2008, 9:38am

No current GPU hardware supports double-precision, and it’s difficult to emulate it in software efficiently. However, you can use tricks like so-called “double single” to improve precision by combining two singles:

http://crd.lbl.gov/~dhbailey/mpdist/index.html

The Mandelbrot sample in the SDK uses this.
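To make the “double single” idea concrete, here is a minimal host-side C++ sketch of how two floats can be combined into one higher-precision value. It is not taken from the Mandelbrot sample or from the library linked above; the names `DS`, `two_sum`, `quick_two_sum`, `two_prod`, `ds_add`, `ds_mul` and `ds_demo` are made up for illustration. It assumes strict IEEE single-precision arithmetic (no fast-math flags, no x87 excess precision) and uses `std::fmaf` for the exact product; on hardware without a fused multiply-add, Dekker-style splitting of the operands is the usual substitute.

```cpp
#include <cassert>
#include <cmath>

// A "double-single" value: the unevaluated sum hi + lo of two floats,
// where lo holds the part of the value that does not fit in hi.
struct DS {
    float hi;
    float lo;
};

// Knuth's two-sum: s = fl(a + b) plus the rounding error e,
// so that a + b == s + e exactly (barring overflow).
inline DS two_sum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Cheaper variant of two_sum, valid only when |a| >= |b|.
inline DS quick_two_sum(float a, float b) {
    float s = a + b;
    float e = b - (s - a);
    return {s, e};
}

// Exact product: p = fl(a * b), e = a * b - p computed with one fused multiply-add.
inline DS two_prod(float a, float b) {
    float p = a * b;
    float e = std::fmaf(a, b, -p);
    return {p, e};
}

// Simple double-single addition: the high words are added exactly,
// the low words are folded in without tracking their own rounding error.
inline DS ds_add(DS a, DS b) {
    DS s = two_sum(a.hi, b.hi);
    float lo = s.lo + (a.lo + b.lo);
    return quick_two_sum(s.hi, lo);
}

// Double-single multiplication; the tiny a.lo * b.lo term is dropped.
inline DS ds_mul(DS a, DS b) {
    DS p = two_prod(a.hi, b.hi);
    float lo = p.lo + (a.hi * b.lo + a.lo * b.hi);
    return quick_two_sum(p.hi, lo);
}

// Helpers for moving values in and out (double is used only for checking).
inline DS ds_from_double(double x) {
    float hi = static_cast<float>(x);
    float lo = static_cast<float>(x - static_cast<double>(hi));
    return {hi, lo};
}

inline double ds_to_double(DS a) {
    return static_cast<double>(a.hi) + static_cast<double>(a.lo);
}

// Usage sketch: 1 + 1e-8 is lost in plain float but survives in double-single.
inline void ds_demo() {
    assert(1.0f + 1e-8f == 1.0f);
    DS s = ds_add(ds_from_double(1.0), ds_from_double(1e-8));
    assert(std::fabs(ds_to_double(s) - 1.00000001) < 1e-14);
}
```

A pair of floats carries roughly 48 bits of significand, which is more than a single float but less than a true double, and the exponent range stays that of a float. The same functions could presumably be marked `__device__` for use in a kernel, provided the compiler is not allowed to reassociate the floating-point operations.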

Otherwise, you just need to wait a while for our DP hardware to be released.

This paper implements sparse matrices using segmented scan:

http://graphics.idav.ucdavis.edu/publicati…_pub?pub_id=915
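As a rough illustration of what “sparse matrices using segmented scan” means, the sketch below computes a sparse matrix-vector product in three steps: multiply every stored nonzero by its vector entry, run a sum scan that restarts at the start of each row, and read each row’s result from the last element of its segment. This is a sequential CPU sketch under my own assumptions, not the paper’s GPU implementation; the CSR layout and the names `Csr`, `segmented_scan`, `spmv` and `spmv_demo` are hypothetical. On a GPU the scan step would be the part that runs in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row storage (an assumed layout for this sketch).
struct Csr {
    std::vector<std::size_t> row_ptr;  // size rows + 1; row r owns [row_ptr[r], row_ptr[r+1])
    std::vector<std::size_t> col;      // column index of each stored nonzero
    std::vector<float> val;            // value of each stored nonzero
};

// Inclusive segmented sum scan: a running sum that restarts wherever head[k] is true.
inline std::vector<float> segmented_scan(const std::vector<float>& v,
                                         const std::vector<bool>& head) {
    std::vector<float> out(v.size());
    float running = 0.0f;
    for (std::size_t k = 0; k < v.size(); ++k) {
        running = head[k] ? v[k] : running + v[k];
        out[k] = running;
    }
    return out;
}

// y = A * x via per-nonzero products followed by a segmented scan.
inline std::vector<float> spmv(const Csr& a, const std::vector<float>& x) {
    const std::size_t nnz = a.val.size();
    const std::size_t rows = a.row_ptr.size() - 1;

    // Step 1: one independent product per stored nonzero.
    std::vector<float> prod(nnz);
    for (std::size_t k = 0; k < nnz; ++k) {
        prod[k] = a.val[k] * x[a.col[k]];
    }

    // Step 2: mark the first nonzero of every non-empty row, then scan.
    std::vector<bool> head(nnz, false);
    for (std::size_t r = 0; r < rows; ++r) {
        if (a.row_ptr[r] < a.row_ptr[r + 1]) head[a.row_ptr[r]] = true;
    }
    const std::vector<float> scanned = segmented_scan(prod, head);

    // Step 3: the last element of each segment is that row's dot product.
    std::vector<float> y(rows, 0.0f);  // empty rows stay at zero
    for (std::size_t r = 0; r < rows; ++r) {
        if (a.row_ptr[r] < a.row_ptr[r + 1]) y[r] = scanned[a.row_ptr[r + 1] - 1];
    }
    return y;
}

// Usage sketch: [[1,0,2],[0,0,0],[0,3,4]] times [1,2,3].
inline void spmv_demo() {
    const Csr a{{0, 2, 2, 4}, {0, 2, 1, 2}, {1.0f, 2.0f, 3.0f, 4.0f}};
    const std::vector<float> y = spmv(a, {1.0f, 2.0f, 3.0f});
    assert(y.size() == 3);
    assert(y[0] == 7.0f && y[1] == 0.0f && y[2] == 18.0f);
}
```

The running sum in `segmented_scan` is also where extended-precision accumulation would go if single precision turns out to be too inaccurate for the solver.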

BonsaiScott · March 26, 2008, 9:18pm

That is not correct. No current hardware supports double-precision.

paulius · March 27, 2008, 1:44am

Well, to be precise, one currently cannot do double-precision on any GPU available for purchase.

Paulius

BonsaiScott · March 27, 2008, 7:52am

The RV670, which has been available for purchase for ~4 months, supports double precision in hardware.

DenisR · March 27, 2008, 8:10am

You mean the hardware that was supposed to be released late last year? (Sorry, I couldn’t help myself because I have deadlines in the near future ;))

Is there any new info on when it will be available that you can share in public?

jordyvaneijk · March 27, 2008, 8:39am

It does not support native double-precision FP64; it does 2 FP32 instead, and this is very slow.

Check this out

Don’t get me wrong. I don’t hate ATI (I have ATI cards at home myself too), but this is something to be ashamed of… Claiming to support DP FP64 when it isn’t DP FP64…

So please don’t say ATI supports DP FP, because they don’t.

grangerfx · March 27, 2008, 8:07pm

Interesting article. Thanks for posting it.

That does bring up an interesting point: When NVidia hardware does support double precision, will they support double precision math using single precision emulation like ATI is currently doing? That would be useful for code compatibility. Also we have no idea how fast NVidia’s double precision math will be compared to single precision but I would guess it will be a better ratio than ATI is able to achieve with emulation.

BTW, NVidia never did announce when their double precision hardware would become available. I think a lot of us assumed it would be released around October or November last year because that’s when NVidia typically releases a hardware upgrade. In retrospect I realize that this was just wishful thinking on our part. I think that NVidia is trying to do something similar to Intel with a two-stage hardware upgrade cycle. The first stage is a new architecture (the 8800) and the second stage is a die shrink with minimal new features (the 9800). I am already crossing my fingers that this fall will bring the next generation of NVidia hardware with double precision and a lot of other goodies.

-Mark Granger

eelsen · March 27, 2008, 9:16pm

Does it really matter how it is implemented in the silicon? If you want a 2m bit multiplier and the hardware designer does it by chaining together a couple m bit multipliers and some adders, why do you care if the end result is the same?
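The multiplier argument can be illustrated with integers, taking m = 16: a 32x32 to 64-bit product assembled from four 16x16 to 32-bit products plus shifts and adds. This is only a sketch of the schoolbook decomposition, with hypothetical names `mul16`, `mul32_wide` and `mul_demo`; it says nothing about how any particular GPU builds its double-precision unit, and a floating-point multiply also needs exponent handling and rounding on top of this significand product.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the narrow "m bit multiplier" (m = 16): 16x16 -> 32 bits.
inline std::uint32_t mul16(std::uint16_t a, std::uint16_t b) {
    // Widen first so the product is computed in unsigned 32-bit arithmetic.
    return static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
}

// The "2m bit multiplier": 32x32 -> 64 bits from four mul16 calls and some adders.
inline std::uint64_t mul32_wide(std::uint32_t a, std::uint32_t b) {
    const std::uint16_t a_lo = static_cast<std::uint16_t>(a & 0xFFFFu);
    const std::uint16_t a_hi = static_cast<std::uint16_t>(a >> 16);
    const std::uint16_t b_lo = static_cast<std::uint16_t>(b & 0xFFFFu);
    const std::uint16_t b_hi = static_cast<std::uint16_t>(b >> 16);

    // Four partial products, widened so the additions below cannot overflow.
    const std::uint64_t ll = mul16(a_lo, b_lo);
    const std::uint64_t lh = mul16(a_lo, b_hi);
    const std::uint64_t hl = mul16(a_hi, b_lo);
    const std::uint64_t hh = mul16(a_hi, b_hi);

    // a * b = hh * 2^32 + (lh + hl) * 2^16 + ll
    return ll + ((lh + hl) << 16) + (hh << 32);
}

// Usage sketch: the chained result matches a native 64-bit multiply.
inline void mul_demo() {
    assert(mul32_wide(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFE00000001ull);
    assert(mul32_wide(123456789u, 987654321u) == 123456789ull * 987654321ull);
}
```

The result is bit-for-bit identical to a native wide multiply; what differs is the cost, here four narrow multiplies instead of one.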

According to the article, they claim somewhere between 1/4 and 1/2 of the performance of single precision, and it usually ends up somewhere in the middle. That isn’t very slow.

SSE2 vs. SSE is also 1/2 the performance…

I don’t understand why the cost isn’t fixed. Can someone explain what circumstances cause the speed of the emulation to change?

BonsaiScott · March 28, 2008, 12:56am

Define ‘very slow’. Compared to what? A Core 2 Duo? The G92? A Cell? What other device can you purchase today that hits anywhere close to 125-250 GFLOPS DP, and of which you can get 2 on a board with 1 GB of memory for <$400?

Were DP important to me, I’d be using it. It isn’t.

Clearspeed is the only thing I know of and it isn’t in the same $/DPGFLOP ballpark. However, for DPGFLOPS/watt they kick much ass.

So does the Core 2 Duo support DP? By your logic it doesn’t because it isn’t as fast as SP.
