Floating Point Precision of GPU

Hi,

I am using a 9800 GT with the latest driver and CUDA version.
I am doing some floating-point additions, multiplications, and divisions on both the CPU and the GPU.

But my GPU results differ from my CPU results.

Here is what the output (floating-point values) looks like on the CPU and the GPU:

On the CPU:

23.975410 26.500000 27.000000
24.000000 26.434782 27.000000
23.929729 27.000000 27.000000

23.500000 26.212872 27.000000
23.841270 26.000000 27.000000
23.000000 26.235556 27.000000

23.687500 26.000000 26.500000
23.489796 26.000000 26.000000
23.841270 26.000000 27.000000

On the GPU:

23.991804 26.500000 27.000000
24.000000 26.478260 27.000000
23.935135 27.000000 27.000000

23.500000 26.202971 27.000000
23.825397 26.000000 27.000000
23.000000 26.231112 27.000000

23.669643 26.000000 26.500000
23.469387 26.000000 26.000000
23.825397 26.000000 27.000000

What are my options for correcting this on my 9800 GT?

Is double precision supported on my card? (According to Wikipedia, it seems it isn't.) If not, what are my options?

This is normal. See the appendix in the programming guide describing the accuracy of floating-point operations on GPUs.

What makes you sure that they need correcting? Have you compared SSE vs x87 results on the CPU side?
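
For example, with gcc on x86 the same source line can produce different answers depending on whether intermediates are kept in 80-bit x87 registers or rounded to 32 bits in SSE registers. A minimal host-side sketch (the file name, function name, and input values are just illustrative; -mfpmath is a gcc option):

    /* excess.c -- compile twice and compare the output:
         gcc -O2 -mfpmath=387 excess.c -o x87
         gcc -O2 -mfpmath=sse excess.c -o sse
       With x87 math the product a*b may be held at 80-bit precision
       before the add; with SSE it is rounded to 32 bits first. */
    #include <stdio.h>

    float mul_add(float a, float b, float c)
    {
        return a * b + c;
    }

    int main(void)
    {
        /* volatile keeps the compiler from folding this at compile time */
        volatile float a = 1.0000001f, b = 0.9999999f, c = -1.0f;
        printf("%.9g\n", mul_add(a, b, c));
        return 0;
    }

With SSE math the product typically rounds to exactly 1.0f and the program prints 0; with x87 excess precision a tiny residual can survive the add. The exact output depends on compiler version and flags, which is rather the point.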

If you actually need double precision (think carefully about this), your options are:

  1. Get a new GPU. The GTX 200 and 400 series do native double precision. You still probably won’t get the same answer as the CPU, due to a number of factors (the order of operations changes in a parallel calculation, the CPU may use extended 80-bit precision in some cases, etc.). Also be prepared for a performance hit, although a GTX 470 working in double precision might be competitive with a 9800 GT in single precision…

  2. Emulate near-double precision using techniques ported from the dsfun90 library (a sketch of the basic add follows after this list). The double-single format used in those algorithms has 48 bits of mantissa, rather than the 53 bits of true double precision. This is more than 10x slower than single precision. (The best case is addition, at around 11x; everything else is much slower.)

  3. Use a technique like Kahan summation to limit round-off error in long sums (see the second sketch below). If that isn’t the reason you need double precision, then obviously Kahan summation won’t help. :)
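
To illustrate option 2, below is a sketch of the basic double-single addition in the dsfun90 style, ported to CUDA. The name ds_add is mine, not necessarily the library’s; a value is carried as a float2 whose .x holds the leading word and .y the trailing error term:

    __device__ float2 ds_add(float2 a, float2 b)
    {
        /* Knuth two-sum: s + e is exactly a.x + b.x */
        float s = a.x + b.x;
        float v = s - a.x;
        float e = (a.x - (s - v)) + (b.x - v);

        /* fold in the low-order words, then renormalize */
        e += a.y + b.y;
        float2 r;
        r.x = s + e;
        r.y = e - (r.x - s);
        return r;
    }

This relies on every add and subtract rounding exactly as written; nvcc does not reassociate floating-point expressions by default, but keep that requirement in mind if you experiment with optimization flags.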

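For option 3, a minimal Kahan (compensated) summation sketch; kahan_sum is an illustrative name:

    __device__ float kahan_sum(const float *x, int n)
    {
        float sum = 0.0f;
        float c   = 0.0f;          /* running compensation */
        for (int i = 0; i < n; ++i) {
            float y = x[i] - c;    /* apply the correction from the last step */
            float t = sum + y;     /* low-order bits of y are lost here...    */
            c = (t - sum) - y;     /* ...and recovered into c                 */
            sum = t;
        }
        return sum;
    }

The same caveat applies: compensated summation only works if the compiler does not reassociate the additions.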