Double emulation

Any ideas on double precision emulation?

We need a double-precision linear equation solver for huge sparse matrices. Does anybody know whether somebody has done that already? I don’t want to reinvent the wheel.

The GPU only provides single precision float, right? So I don’t see the point of emulating double precision (via CUDA), because you cannot use it on the GPU.

No current GPU hardware supports double-precision, and it’s difficult to emulate it in software efficiently. However, you can use tricks like so-called “double single” to improve precision by representing each value as the sum of two singles.

The Mandelbrot sample in the SDK uses this.
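
For readers who have not seen the “double single” trick, here is a minimal sketch of the addition step. It is not the SDK Mandelbrot code. It is written in Python, where floats are doubles, so single precision is imitated by rounding every intermediate result through `struct`. All names (`f32`, `two_sum`, `ds_add`, `ds_to_double`) are made up for illustration. A value is stored as a pair `(hi, lo)` of singles whose sum is the number being represented.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free addition of two singles: s is the rounded sum, e the rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def ds_add(x, y):
    """Add two double-single numbers, each given as a (hi, lo) pair of singles."""
    s, e = two_sum(x[0], y[0])        # add the high parts and keep the rounding error
    e = f32(e + f32(x[1] + y[1]))     # fold in both low parts
    hi = f32(s + e)                   # renormalize so that hi carries the leading bits
    lo = f32(e - f32(hi - s))
    return hi, lo

def ds_to_double(x):
    """Collapse a (hi, lo) pair into one double for inspection."""
    return x[0] + x[1]
```

A plain single-precision add loses a small term such as 2^-30 next to 1.0, because `f32(1.0 + 2.0**-30)` rounds back to 1.0. `ds_add((1.0, 0.0), (2.0**-30, 0.0))` keeps that term in the low word instead. Multiplication needs a similar but longer sequence and is not shown here.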

Otherwise, you just need to wait a while for our DP hardware to be released.

This paper implements sparse matrices using segmented scan:…_pub?pub_id=915
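
As a rough illustration of the idea, and not the paper’s implementation, here is a sequential Python sketch of a sparse matrix-vector product built on a segmented scan. The CSR storage layout and all function names are assumptions made for this example. On a GPU the scan would run in parallel. Here it is a plain loop so the data flow is easy to follow.

```python
def segmented_inclusive_scan(values, head_flags):
    """Running sum that restarts wherever head_flags marks the start of a segment."""
    out = []
    running = 0.0
    for v, is_head in zip(values, head_flags):
        running = v if is_head else running + v
        out.append(running)
    return out

def spmv_csr_segscan(row_ptr, col_idx, vals, x):
    """Compute y = A * x for a CSR matrix A, with one scan segment per non-empty row."""
    n_rows = len(row_ptr) - 1
    # One product per stored nonzero; this step is trivially parallel.
    products = [v * x[c] for v, c in zip(vals, col_idx)]
    # Mark the first nonzero of every non-empty row as a segment head.
    flags = [False] * len(vals)
    for r in range(n_rows):
        if row_ptr[r] < row_ptr[r + 1]:
            flags[row_ptr[r]] = True
    scanned = segmented_inclusive_scan(products, flags)
    # The last scanned element of each segment is that row's dot product.
    y = [0.0] * n_rows
    for r in range(n_rows):
        if row_ptr[r] < row_ptr[r + 1]:
            y[r] = scanned[row_ptr[r + 1] - 1]
    return y
```

For example, the 3x3 matrix with rows (1, 0, 2), (0, 0, 0) and (0, 3, 4) is stored as `row_ptr = [0, 2, 2, 4]`, `col_idx = [0, 2, 1, 2]`, `vals = [1.0, 2.0, 3.0, 4.0]`. Multiplying it by `x = [1.0, 2.0, 3.0]` gives `[7.0, 0.0, 18.0]`.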

That is not correct. No current hardware supports double-precision.

Well, to be precise, one currently cannot do double-precision on any GPU available for purchase.


The RV670, which has been available for purchase for ~4 months, supports double precision in hardware.

You mean the hardware that was supposed to be released late last year? :devil: (Sorry, I couldn’t help myself because I have deadlines coming up ;))

Is there any new info on when that would be available that you can tell in public?

It does not support native double precision (fp64). Instead it combines two fp32 operations, and this is very slow.

Check this out

Don’t get me wrong. I don’t hate ATI (I have ATI cards at home myself too), but this is something to be ashamed of… Claiming to support DP FP64 when it isn’t really DP FP64…

So please don’t say that ATI supports DP FP, because they don’t.

Interesting article. Thanks for posting it.

That does bring up an interesting point: When NVidia hardware does support double precision, will they support double precision math using single precision emulation like ATI is currently doing? That would be useful for code compatibility. Also we have no idea how fast NVidia’s double precision math will be compared to single precision but I would guess it will be a better ratio than ATI is able to achieve with emulation.

BTW NVidia never did announce when their double precision hardware would become available. I think a lot of us assumed it would be released around October or November last year because that’s when NVidia typically releases a hardware upgrade. In retrospect I realize that this was just wishful thinking on our part. I think that NVidia is trying to do something similar to Intel with a two stage hardware upgrade cycle. The first stage is a new architecture (the 8800) and the second stage is a die shrink with minimal new features (the 9800). I am already crossing my fingers that this fall will bring the next generation of NVidia hardware with double precision and a lot of other goodies.

-Mark Granger

Does it really matter how it is implemented in the silicon? If you want a 2m-bit multiplier and the hardware designer does it by chaining together a couple of m-bit multipliers and some adders, why do you care if the end result is the same?
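
To make the chaining argument concrete, here is a small integer sketch in Python. It assumes m = 16 for illustration, and the names `mul16` and `mul32_from_mul16` are hypothetical. It builds a 32 x 32 bit product out of 16 x 16 bit multiplies, shifts and adds. It says nothing about how any particular GPU does this in silicon.

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Stand-in for an m = 16 bit hardware multiplier (16 x 16 -> 32 bits)."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mul32_from_mul16(a, b):
    """Build a 32 x 32 -> 64 bit product from four 16-bit multiplies plus adds."""
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    # (a_hi * 2^16 + a_lo) * (b_hi * 2^16 + b_lo), expanded term by term
    return ((mul16(a_hi, b_hi) << 32)
            + ((mul16(a_hi, b_lo) + mul16(a_lo, b_hi)) << 16)
            + mul16(a_lo, b_lo))
```

The result matches a direct multiply bit for bit, which is the point being made above. The visible difference is cost: this schoolbook form needs four narrow multiplies for one wide product.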

According to the article, they claim somewhere between 1/4 and 1/2 of single-precision performance, and it usually ends up somewhere in the middle. That isn’t very slow.

SSE2 vs. SSE is also 1/2 the performance…

I don’t understand why the cost isn’t fixed. Can someone explain what circumstances cause the speed of the emulation to change?

Define ‘very slow’. Compared to what? A Core 2 Duo? The G92? A Cell? What other device can you purchase today that hits anywhere close to 125-250 GFLOPS DP, and that comes two to a board with 1GB of memory for <$400?

Were DP important to me, I’d be using it. It isn’t.

Clearspeed is the only thing I know of and it isn’t in the same $/DPGFLOP ballpark. However, for DPGFLOPS/watt they kick much ass.

So does the Core 2 Duo support DP? By your logic it doesn’t because it isn’t as fast as SP.