Precision Problem

Hi,

I’m currently porting MATLAB code to CUDA which produces a matrix as the final result. I have noticed that values at random places in the matrix returned from the GPU do not match the ones produced by the MATLAB code. Tracking it down step by step, I found that the values stop matching at random places after a simple scaling of the matrix. The places where they don’t match change from one run to another.

A sample of mismatching values is given below

[codebox]

Values not matching [246] (0.0005635147, -0.0009828182), (0.0005635147, -0.0009828183)

Values not matching [263] (-0.0010207375, -0.0000709876), (-0.0010207376, -0.0000709876)

Values not matching [272] (-0.0004542154, 0.0011594702), (-0.0004542154, 0.0011594703)

Values not matching [290] (0.0001093576, -0.0010442695), (0.0001093576, -0.0010442697)

Values not matching [238] (-0.0010028849, -0.0004090143), (-0.0010028851, -0.0004090144)

Values not matching [255] (-0.0002247403, 0.0010913066), (-0.0002247403, 0.0010913067)

Values not matching [186] (-0.0003326887, -0.0011114202), (-0.0003326887, -0.0011114203)

Values not matching [203] (0.0009803898, -0.0004338518), (0.0009803899, -0.0004338518)

Values not matching [246] (0.0011994267, 0.0002392598), (0.0011994268, 0.0002392599)

Values not matching [203] (-0.0004072963, -0.0012231999), (-0.0004072963, -0.0012232000)

Values not matching [177] (-0.0011118277, 0.0005138259), (-0.0011118278, 0.0005138259)

Values not matching [263] (0.0009061802, 0.0009969573), (0.0009061803, 0.0009969574)

[/codebox]

I’m working on a Tesla C1060 board. How do I get rid of this, or is it a problem with the floating-point math on the GPU?

Thanks

Shibdas

As you have a C1060, have you tried using double-precision computations on the GPU? You should get more matching digits (currently I see only about 10).

Also, floating-point results sometimes won’t match EXACTLY, due to differences between the GPU and CPU hardware and to parallel floating-point computation…

-> A = b + c + d is not necessarily equal to computing A = c + d first and then A = A + b (IN THE FLOATING-POINT WORLD)

so the GPU threads may not be following the same computation order as the CPU code.

THIS DOES NOT MEAN THAT CPU CALCULATIONS ARE MORE ACCURATE THAN GPU… it’s just something a programmer has to deal with when working with floating-point computations

hope this helps…

To be fair, the CPU calculations are probably more accurate than the GPU’s. Double precision is used much more frequently on the CPU because compilers promote to double precision if there is any mixed precision, all of the standard transcendental functions are implemented in double, people use double reflexively in their CPU code, and compilers often store intermediate results in 80-bit floating-point registers, regardless of the precision of the variables in your code.

The real issue here is that floating point is imperfect due to finite precision, regardless of whether you use 32, 64, 80, or 128 bit floats. So you have to ask the question: How accurate of a result is sufficient for my application? If the answer is somewhere around 10^-6 fractional error (or larger), then careful use of single precision arithmetic will achieve this in many cases.

(Long sums of numbers and subtraction of two nearly equal numbers are the usual precision killers. Mitigating errors introduced by these operations might require Kahan summation, use of double precision for small parts of the calculation, or switching to a different algorithm entirely.)