CUDA double precision with log() and exp(): inexact results (lack of precision in the log and exp functions)

Good morning,
I have migrated some MATLAB functions to CUDA via MEX files, but I have run into the following precision problem.
The line that is causing the problem is this one:
matA[idx] = exp(log(matA[idx]) / (1 - vector[idx+1] / vector[idx]));
Consecutive positions of vector are almost equal, so I get an almost-divide-by-zero in the division of the log.
The rest of the functions match perfectly, as they have no ulp error, but in double precision both the log and exp functions suffer from a ulp error of 1 (full range).
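For reference, here is a minimal host-side repro of the problematic expression (the input values are made up for illustration, not taken from my real data):

/* Minimal repro sketch: with nearly equal vector entries, the denominator
   1 - vector[idx+1]/vector[idx] becomes tiny and only a few significant
   digits survive. Compile with: gcc repro.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a  = 1.0 + 1e-9;        /* stands in for matA[idx], close to 1        */
    double v0 = 1000.0;            /* stands in for vector[idx]                  */
    double v1 = 1000.0 + 1e-7;     /* stands in for vector[idx+1], almost equal  */

    double denom = 1.0 - v1 / v0;  /* nearly zero denominator                    */
    double r     = exp(log(a) / denom);

    printf("denom = %.17g\n", denom);
    printf("r     = %.17g\n", r);
    return 0;
}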
This lack of precision is inadmissible for my calculations, so I would like to know:

Is there any way to get exact results with the log and exp functions? I don't have any problem with it taking longer, but I need the precision.

If anyone can help, please answer.
Thanks to everyone in advance!
David Lisin

Are you doing this in single or double precision on the GPU?

All calculations are performed in double precision. Hope this helps!!!

To my understanding, it's impossible to have "fully accurate" log and exp because of the table maker's dilemma. Thus, a 1 ulp difference between two implementations (such as GPU vs. CPU) does not necessarily tell you which one is more accurate.

The problem is that my application has to be as accurate as MATLAB, and I know the final client won't accept something like "we have different results, but 'maybe' the GPU is more accurate than the CPU". I need the same results in order to justify the GPU implementation, and although an error of 1e-14 or 1e-15 is very small, it is still a different result.

Has anyone had this experience with Matlab CUDA implementations, and been able to correct it?

Thanks again for the interest!

If you really need to match MATLAB exactly, I think the only way is to "simulate" MATLAB's algorithm for log/exp.

I don't know how MATLAB computes log/exp internally. If it uses the x87 instructions, then the result depends entirely on the CPU implementation, which could make things complicated. Otherwise, if it has its own algorithm for computing log/exp (which is plausible, as many RISC CPUs and the SSE/SSE2 instruction sets have no log/exp instructions), then you may be able to implement the same algorithm in your CUDA kernel. That should make it produce the same results as the CPU version.
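One way to sketch that approach, assuming you can port the CPU routine to plain C, is to mark the ported function __host__ __device__ so the CPU reference and the GPU kernel compile exactly the same source (you still need double-precision hardware, i.e. compute capability 1.3 or later, and you may want to disable FMA contraction, e.g. nvcc's -fmad=false on newer toolkits, so contraction doesn't reintroduce differences). The my_log() body here is just a placeholder, not MATLAB's actual algorithm:

// Sketch: share one implementation between the CPU reference and the GPU kernel.
// my_log() below simply forwards to the standard math library, so host and
// device can still differ; the point is the __host__ __device__ qualifier,
// which lets a ported MATLAB/CPU algorithm run identically on both sides.
#include <cstdio>
#include <math.h>

__host__ __device__ double my_log(double x)
{
    // Placeholder body; replace with the ported CPU/MATLAB algorithm.
    return log(x);
}

__global__ void apply_log(double *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] = my_log(data[idx]);
}

int main()
{
    const int n = 4;
    double h[n] = {0.5, 1.0, 2.0, 10.0}, ref[n];
    for (int i = 0; i < n; ++i)
        ref[i] = my_log(h[i]);                     // CPU reference, same source

    double *d;
    cudaMalloc(&d, n * sizeof(double));
    cudaMemcpy(d, h, n * sizeof(double), cudaMemcpyHostToDevice);
    apply_log<<<1, n>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i)
        printf("cpu = %.17g   gpu = %.17g   diff = %g\n", ref[i], h[i], ref[i] - h[i]);
    return 0;
}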

Thank you for a very interesting answer. I will look right now for the MATLAB implementation of log and exp. If anyone has any further information about this, or the original MATLAB code, I would be very grateful. Thanks in advance!

Honestly, it might be worth your time to try to educate your customer about the inherent limitations of floating-point precision. Trying to meet a design spec which requires that you get exactly the same answer as MATLAB seems like a waste of development resources. Two different CPU implementations of log() could easily differ from each other by as much as the difference you see between CPU and GPU.

If MATLAB's log is calculated internally by the x87 FPU, then you're probably in deep, deep trouble. The x87 internally performs everything in 80-bit precision (higher even than double). I believe this can be a notorious source of heisenbugs, since attempting to examine a calculation midway causes values to be flushed out of the FPU and truncated to their programmatic precision (be that single or double). As a result, simply looking at a variable can change its value (when the value is put back into the FPU, the extra bits are padded, not restored to their previous state).

I don't know your clients, but unless the rest of the code is written with fanatical attention to IEEE precision issues (simple test: do your clients know that a+(b+c) is not equal to (a+b)+c in floating-point arithmetic?), then this code has bigger problems than this particular blow-up. In fact, I'd argue the blow-up is probably helpful, since it brings those problems to light.
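If you want a concrete demonstration of that associativity point, a throwaway snippet with arbitrary constants will do:

/* Floating-point addition is not associative: the two sums below differ. */
#include <stdio.h>

int main(void)
{
    double a = 1e16, b = -1e16, c = 1.0;
    printf("(a + b) + c = %.17g\n", (a + b) + c);   /* prints 1: exact cancellation first */
    printf("a + (b + c) = %.17g\n", a + (b + c));   /* prints 0: c is absorbed into b     */
    return 0;
}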

This is very true, although it is always easier to defend a product if you obtain the same results. I have been investigating, and it seems that the precision of CUDA is actually better than the precision of the MATLAB results, but even so I am going to have to justify that the results are more precise. Even though it sounds outrageous, it is easier to defend obtaining the same results than to defend results that are different because they are more accurate :) Thanks for the answer.

Here’s another interesting question for you to consider: does the Windows version of Matlab produce exactly the same results as the Linux version?

And also: do 32-bit and 64-bit Matlab produce the same answer? Some compilers use x87 instructions on 32-bit x86 processors and SSE instructions (only using one of the vector slots) on 64-bit x86 processors by default.

If you have access to a suitable compiler, such as Intel Fortran, you could generate the results using quad-precision code and then use them as a reference for both MATLAB and CUDA.

MMB

OK, sorry for taking so long to reply, but I had the weekend in between.
Currently the version of MATLAB I am running is 32-bit on Windows, but in a few days I will install 64-bit Windows and a Linux version, so I will be able to show results :)
The good news is as follows:
I have executed the function f = exp(log(a number that tends to one) / a number that tends to zero);
In MATLAB, the division is not taken into account, so the end result does not change when "a number that tends to zero" differs. This implies that MATLAB is having trouble with precision.
So I prepared two functions, one to calculate the epsilon of MATLAB and another to calculate the epsilon of CUDA. And voilà, the results are as follows:
MATLAB
Epsilon of MATLAB: 1.1102e-16
Accumulated error of MATLAB: 6.5503e-13
CUDA
Epsilon of CUDA: 1.1102e-16
Accumulated error of CUDA: 6.5503e-13
So both versions have the same epsilon, and yet MATLAB does not register the changes while CUDA does :(
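For anyone who wants to reproduce the epsilon check, a probe along these lines is one way to do it; this is a sketch of the idea rather than my exact code. (Note that a 32-bit host build using x87 arithmetic can report a much smaller host value, which ties back to the x87 discussion above.)

// Sketch: halve eps until 1 + eps rounds back to 1, on both host and device.
// In IEEE double this should stop at 2^-53 = 1.1102e-16 on both sides
// (double precision on the GPU needs compute capability 1.3 or later).
#include <cstdio>

__global__ void device_eps(double *out)
{
    double eps = 1.0;
    while (1.0 + eps != 1.0)
        eps *= 0.5;
    *out = eps;
}

int main()
{
    double host_eps = 1.0;
    while (1.0 + host_eps != 1.0)
        host_eps *= 0.5;                    // host-side probe

    double *d_out, gpu_eps;
    cudaMalloc(&d_out, sizeof(double));
    device_eps<<<1, 1>>>(d_out);
    cudaMemcpy(&gpu_eps, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("host eps   = %.5g\n", host_eps);
    printf("device eps = %.5g\n", gpu_eps);
    return 0;
}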
Once I have installed the 64-bit versions, I will post any updates. Thanks to everyone for responding :)

The suggestion of making reference calculations in higher-precision arithmetic is a very good one. Einstein is reported to have once said that "a man with a watch always knows exactly what the time is, but a man with two watches can never be sure", and that is going to be your problem here, I think. If you use a quad-precision library (there are a number of very good arbitrary-precision FP libraries around for C and Fortran), then you can compute a reference function table you can use to arbitrate on the relative accuracy of each implementation.
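If Fortran isn't handy, one cheap way to build such a reference on the CPU side is GCC's __float128 type and libquadmath; this is only a sketch of the arbitration idea, with made-up inputs:

/* Sketch: quad-precision reference for exp(log(a)/d) using GCC's libquadmath.
   Compile with: gcc ref.c -lquadmath -lm */
#include <quadmath.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1.0 + 1e-9;     /* made-up sample input                       */
    double d = -1e-10;         /* made-up nearly-cancelled denominator       */

    double     r64  = exp(log(a) / d);                           /* double result  */
    __float128 r128 = expq(logq((__float128)a) / (__float128)d); /* quad reference */

    char buf[128];
    quadmath_snprintf(buf, sizeof(buf), "%.30Qg", r128);
    printf("double: %.17g\nquad  : %s\n", r64, buf);
    return 0;
}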

Wow… what a convoluted way of writing a simple math expression.

Would your client complain if you significantly improved the accuracy of both the MATLAB and CUDA code by rewriting it like this?

matA[idx] = pow(matA[idx], vector[idx] / (vector[idx] - vector[idx + 1]));

The real problem is not with exp or log, but with the division. As its result is close to 1, it carries a rounding error of up to half an ulp of 1 (about 2^(-54)). Subtracting the rounded quotient from 1 then causes a cancellation, which greatly amplifies that error.

(This is not specific to computer floating-point, the same thing happens with pen-and-paper calculations.)

If you cancel early by doing the subtraction first, the error of the division does not get amplified. The cancelling subtraction itself is exact in any case (the operands are nearly equal, so their difference is exactly representable).

And the available C library functions (pow() here) should be used whenever possible, instead of combining functions and amplifying intermediate errors…
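A quick side-by-side of the two formulations (again with made-up inputs) shows the point; the pow() form does the exact subtraction first instead of amplifying the division error:

/* Sketch: compare the original formulation with the pow() rewrite for
   nearly equal vector entries. Compile with: gcc compare.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a  = 0.999999;      /* made-up matA[idx]                        */
    double v0 = 12345.678;     /* made-up vector[idx]                      */
    double v1 = 12345.679;     /* made-up vector[idx+1], nearly equal      */

    double original  = exp(log(a) / (1.0 - v1 / v0));
    double rewritten = pow(a, v0 / (v0 - v1));

    printf("original : %.17g\n", original);   /* may differ in the last digits */
    printf("rewritten: %.17g\n", rewritten);  /* better-conditioned result     */
    return 0;
}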

Excuse me, but… if the denominator is an "almost divide by 0", it means that log(matA[idx]) / (1 - vector[idx+1] / vector[idx]) is a VERY large number.

exp(VERY LARGE NUMBER) overflows.

The overflow limit for exp() is roughly 39*log(10) in single precision and 309*log(10) in double… What are the numbers involved?
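(Those limits are just log(FLT_MAX) and log(DBL_MAX); printing them confirms the rough figures, about 88.7 and 709.8:)

/* Where exp() overflows: log(FLT_MAX) in single, log(DBL_MAX) in double. */
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("single-precision limit: %.6g\n", log(FLT_MAX));   /* ~88.72  */
    printf("double-precision limit: %.6g\n", log(DBL_MAX));   /* ~709.78 */
    return 0;
}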

Ah, OK… "log(a number that tends to one)" - sorry, I didn't notice that in one of the follow-up posts. So it is going to be close to 0/0 rather than an overflow.