Today I was demonstrating CUDA to a potential customer. He was impressed with the speedups but he was worried about the precision…

He was even worried about differences of the order 10^-3. He says even 10^-8 is not tolerable for him. This customer transacts in billions and a small change can reflect in the order of millions… and that requires explanation…

In this context, I would like to know if TESLA has any improvement in precision.

Are there any plans for NVIDIA to be completely compliant in float arithmetic? – at least in the TESLA series…

Do you ever read the news and/or the specifications in the programming guide? The GT200 GPU (in GTX 260, 280 and Tesla 1000 series) supports fully IEEE compliant double precision arithmetic.

If I were a customer with billions on the line, I wouldn’t trust any floating point arithmetic at all, but that is only my opinion. Fixed point with sufficient precision or a full-blown infinite precision (within memory constraints) math is the only way to guarantee that kind of accuracy for any calculation you would want to do on the data.

I would suggest using IEESARNATH precision. Whenever 32bit FP arithmetic proves too inaccurate for the transactions involved, you round down the difference and put it in your bank account.

I believe there’s a special function for this in 2.1 Final.

CUDA is so close to people’s hearts that people start reacting emotionally when someone points out a potential limitation. Now, relax… Read on before hitting that reply button quoting superman and spiderman…

If you people had read my post carefully, I had also mentioned inaccuracies in MATH functions which can cause issues.

Agreed that everything is IEEE-whatever compliant. But the programming guide has a separate deviations section explaining what is not up to the standard. Look @ what the guide has to say about “division”. In its own words: “Division is implemented via the reciprocal in a non-standard-compliant way”. Similar case for square root as well. Now, whether this causes deviations in the results is NOT mentioned clearly in the manual.

Apart from that, there are some error levels in math functions. I am not enough of a floating point expert to pass judgement on these error levels. But my question would be – will this cause deviations from the CPU-returned results?

At the end of the day, I am getting deviations in results. And that could potentially affect our business… That’s what I am concerned about. That’s why I am looking @ this forum’s knowledge for help.

If you have some genuine answers, I would love to hear them.

Actually that was a very clever oblique reference to the movie Superman 3, where the plot included making money from roundoff errors on banking transactions.

Actually Mr. Anderson and alex were completely correct and were indeed answering your question.

Float math has well understood accuracy, and the deviations are indeed documented in the manual. But you’re right to worry about how that may affect you.

But the CUDA DOUBLE precision math is fully IEEE compliant! 0 units of error in the last place for the fundamental operations. Full denormal support. NaN and signed zero, everything. That’s all CPUs give you too.

As Mr. Anderson says, it’s conceivable that some applications need even more than double precision, and need to implement their own fixed point or large integer libraries. This is true on the CPU as well.

In practice, if you’re dealing with money, the 24 significand bits of single precision aren’t enough for millions of dollars or more. 32-bit fixed point can get you to a billion or so. But double precision, with a 53-bit significand, is going to give you lossless precision up to about 9 quadrillion dollars (2^53 ≈ 9×10^15).

Sorry, I can never pass up a good Superman 3 joke.

As far as errors in math libraries go, what SPWorley said is basically correct:

Single precision is not fully IEEE compliant, especially in transcendentals and division. Check the programming guide for exact numbers. I am not a floating point expert either, so I don’t know how badly that’s going to mess things up for you there.

Double precision is IEEE compliant. If you do double precision on the host (that is to say, 64-bit double precision, not 80-bit extended x87 double precision), you should see identical results regardless of what functions you use.

A lot of guys in finance have been resistant to even considering using single precision. You might want to check out the work of Mike Giles on the subject.

Actually, IEEE compliance is a bit of, shall I say, bullcrap.

In the cases where floating point precision truly matters, IEEE compliance is irrelevant – such as in this situation. Even if single precision were IEEE compliant, that would do nothing to alleviate the real concern that 32bit floating point numbers have far less accuracy than needed for financial bookkeeping.

IEEE compliance is an exercise in anal-retention and poor debugging procedure. IEEE, by definition, is not concerned with calculations being right or wrong, but only that they’re all wrong the same way. This is self-evidently silly. (Useful only as a sanity check in a few cases.) Yet ironically, IEEE doesn’t even accomplish this goal because compiler differences change order of operations and make strict reproducibility across architectures impossible.

Now, that’s not to say numerical precision isn’t a real issue. It is a tremendous one. And the unfortunate truth is that, as MrAnderson pointed out, even doubles may be insufficient! I hope you’re aware that numbers as simple as 0.1 cannot be represented in floating point with complete accuracy. Binary fractions cannot represent decimal fractions exactly. (They can only represent combinations of 1/2, 1/4, 1/8, etc.) This may not be a fatal issue, since the differences become vanishingly small, but it’s real. This is why some programmers choose to use “decimal” representations (which use many more calculations and are not hardware accelerated).

OTOH, I don’t think anyone would let a CUDA system anywhere near real money! (For many reasons.) If you are simply doing analysis or buy/sell decisions or other sorts of algorithms that aren’t directly involved with counting real money, then you have much lower requirements. (It’s OK to order to buy 0.001% too much or too little of a stock, as long as the actual transactions are accounted precisely.) Make sure your customer understands the requirements of his specific application. Single precision may be acceptable.

P.S. When will CUDA expose the add-with-carry instruction? It is critical for the implementation of many sorts of arbitrary-precision math.

FP addition is not associative. In a single-threaded CPU implementation without any compiler optimisations, your math might actually get executed exactly in the order you code it. CUDA does this in parallel, so the results are not guaranteed to be identical. Simply consider a parallel reduction to compute the norm of (1,2,…,12345). Depending on how threads and thread blocks are set up, you get completely different intermediate sums.

Yep, exactly. Maybe I was too short and subtle in my description.

It doesn’t matter if you can accurately represent a quadrillion dollars in double precision; depending on the calculations you do, the final result can be absolute trash, especially if you take the difference of two nearly equal numbers. Even simple high school formulas like the solution to the quadratic equation can go horribly wrong (see the course notes from a course I once took: http://www.cs.mtu.edu/~shene/COURSES/cs362…iew/reals.html).

When using floating point arithmetic, these are just problems that every programmer must be aware of and avoid, but unfortunately many aren’t aware of them, as evidenced on these forums whenever the topic comes up every few months.

To expand on what Dominik said, if you want the best (i.e. most accurate) results from the GPU (or CPU) calculations, make sure that you take the time to research some numerically stable algorithms for your code. Often, the simple methods that quickly come to mind when coding can amplify round-off errors (caused by fixed-length floating point representations of numbers) – look up “Kahan Summation” for a simple example. To further complicate matters, numerically stable algorithms designed to run in a serial fashion on the CPU often become unstable when adapted to parallel calculations on the GPU – so you’ll want to find algorithms designed specifically for parallel computations.

There are some decent parallel algorithms books out there that you might want to take a look at if you’re really concerned about the accuracy of your results.