Hi everyone,
I’m using CUDA to parallelise a computation that accumulates error across iterations.
So I need more precision than the roughly 15 significant digits that double provides.
I tried using long double, but it doesn’t change the precision of the result.
I have also done the rounds of the relevant keywords on Google. The leads I found so far:

- Find appropriate hardware (e.g. a Quadro RTX 4000 GPU)
- Use an external library that implements a double-double data type (~31 digits), or quad-double…
Basically I’m looking for a type with about 21 digits of precision (I guess less precision would give better performance). In any case, I need more than the double precision type.
Does anyone know of a standard way within CUDA to achieve this?
If not, can you point me to a library or other resources (one implementing mixed precision, for example)?
To keep the subject clear: I don’t care about performance issues related to the choice of hardware.
Switching to double-double computation is not necessarily advisable; it’s a heavy-handed approach. If you were to go down that path, would you need support for more than basic arithmetic and square root?
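For reference, the core of a double-double type is built from error-free transformations. Here is a minimal sketch of double-double addition (the type and function names are my own, not from any particular library):

```cpp
// Minimal double-double sketch: a value is the unevaluated sum hi + lo,
// normalized so that |lo| <= 0.5 * ulp(hi).
struct dd { double hi, lo; };

// Knuth's TwoSum: s = fl(a + b), e = exact rounding error of that addition.
__host__ __device__ dd two_sum(double a, double b) {
    double s  = a + b;
    double bp = s - a;
    double e  = (a - (s - bp)) + (b - bp);
    return {s, e};
}

// Double-double addition: ~31 digits, but many native operations per add.
__host__ __device__ dd dd_add(dd x, dd y) {
    dd s = two_sum(x.hi, y.hi);
    double lo = s.lo + x.lo + y.lo;   // accumulate the low-order parts
    return two_sum(s.hi, lo);         // renormalize the result
}
```

Every double-double operation expands into a sequence like this, which is why I call the approach heavy-handed.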
What kind of computations are these? Have you been able to root cause the source(s) of numerical error? One major source of numerical issues is typically subtractive cancellation. Another would be error magnification (e.g. using sin() where sinpi() should be used, or using pow() instead of cbrt()).
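To make those examples concrete, here is a hypothetical kernel contrasting the preferred calls with the error-magnifying ones:

```cpp
__global__ void demo(const double *x, double *y, double *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // sinpi(x) computes sin(pi*x) without first rounding pi*x: in
    // sin(M_PI * x), M_PI is only an approximation of pi, so the argument
    // is perturbed before sin() is even called.
    y[i] = sinpi(x[i]);   // preferred over: sin(M_PI * x[i])
    // cbrt(x) computes the cube root directly: in pow(x, 1.0/3.0), the
    // exponent 1/3 is not exactly representable, and pow() magnifies
    // that small error.
    z[i] = cbrt(x[i]);    // preferred over: pow(x[i], 1.0 / 3.0)
}
```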
Floating-point arithmetic is not associative. Have you tried changing the order of floating-point operations to minimize the error, taking maximum advantage of fused multiply-add (FMA)? There is an automated online tool called Herbie for finding the numerically most advantageous way to compute expressions, which might be able to assist. The operative word is “might” because, in my experience, solutions found by humans are usually superior.
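As a concrete example of exploiting FMA, here is Kahan’s well-known algorithm for computing a difference of products a*b - c*d, a pattern that suffers badly from subtractive cancellation when evaluated naively (a sketch using the standard fma() math function):

```cpp
#include <math.h>

// Computes a*b - c*d to high accuracy. fma() rounds only once, so it can
// recover the rounding error of a product exactly.
__host__ __device__ double diff_of_products(double a, double b,
                                            double c, double d) {
    double w = d * c;
    double e = fma(-d, c, w);  // e = w - d*c, the rounding error of w
    double f = fma(a, b, -w);  // f = a*b - w, rounded only once
    return f + e;              // (a*b - w) + (w - d*c) = a*b - c*d
}
```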
There are techniques for compensated computation of sums, dot products, and polynomials that can be used to address numerical issues at key points in a computation. I have linked some of the relevant work on compensated computation in previous posts in these forums.
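As a small illustration of the idea, here is classic Kahan (compensated) summation, sketched from memory rather than taken from those posts:

```cpp
// Kahan summation: the compensation term c captures the low-order bits
// lost in each addition, so a long running sum behaves almost as if it
// were accumulated at twice the working precision.
__host__ __device__ double kahan_sum(const double *x, int n) {
    double s = 0.0;  // running sum
    double c = 0.0;  // running compensation
    for (int i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;  // algebraically zero; numerically, the error of s + y
        s = t;
    }
    return s;
}
```

Be aware that compensated code like this depends on strict rounding behavior; compiler options that re-associate floating-point expressions can silently defeat it.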