If you actually need double precision (think carefully about this), your options are:

Get a new GPU. The GTX 200 and 400 series do native double precision. You still probably won’t get the same answer as the CPU, due to a number of factors (the order of operations changes in a parallel calculation, the CPU uses extended 80-bit precision in some cases, etc). Also be prepared for a performance hit, although a GTX 470 working in double precision might be competitive with a 9800 GT in single precision…

Emulate near double precision using techniques ported from the dsfun90 library. The double-single format used in those algorithms has 48 bits of mantissa, compared with 53 for true double precision. This is more than 10x slower than single precision. (Best case is addition at around 11x; everything else is much slower.)
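To give a feel for what double-single emulation involves, here is a rough sketch of the addition step in plain C. It follows the general shape of the dsfun90 addition routine as I remember it, and is not a verbatim port. The names `dsfloat`, `ds_from_double`, `ds_to_double` and `ds_add` are made up for this example. It assumes the compiler is not allowed to reorder floating-point operations (no `-ffast-math`), since that would destroy the error terms.

```c
/* A double-single value: hi holds the leading bits, lo holds the
   part that did not fit in hi. (Hypothetical type name.) */
typedef struct {
    float hi;
    float lo;
} dsfloat;

/* Split a double into a hi/lo pair of floats. */
static dsfloat ds_from_double(double d) {
    dsfloat r;
    r.hi = (float)d;
    r.lo = (float)(d - (double)r.hi);
    return r;
}

/* Recombine the pair into a double, e.g. for printing or checking. */
static double ds_to_double(dsfloat a) {
    return (double)a.hi + (double)a.lo;
}

/* Double-single addition using only single-precision operations. */
static dsfloat ds_add(dsfloat a, dsfloat b) {
    /* Add the high parts and recover the rounding error of that add. */
    float t1 = a.hi + b.hi;
    float e  = t1 - a.hi;
    float t2 = ((b.hi - e) + (a.hi - (t1 - e))) + a.lo + b.lo;

    /* Renormalize so that hi carries the leading bits again. */
    dsfloat r;
    r.hi = t1 + t2;
    r.lo = t2 - (r.hi - t1);
    return r;
}
```

Adding 1.0 and 1e-10 this way keeps the small term in `lo`, whereas a plain `float` add would simply return 1.0. Note that even this simplest operation costs around ten single-precision operations, which is roughly where the slowdown quoted above comes from.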

Use a technique like Kahan summation to limit round-off error in long sums. If round-off in long sums isn’t the reason you need double precision, then obviously Kahan summation won’t help. :)
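For reference, a minimal sketch of Kahan summation in single precision, written in plain C, with a naive sum alongside for comparison. The function names `naive_sum` and `kahan_sum` are just for this example. As with any compensated arithmetic, it assumes the compiler does not reassociate floating-point operations (no `-ffast-math`), otherwise the correction term gets optimized away.

```c
/* Plain single-precision accumulation, for comparison. */
static float naive_sum(const float *x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

/* Kahan (compensated) summation in single precision. */
static float kahan_sum(const float *x, int n) {
    float sum = 0.0f;
    float c = 0.0f;              /* low-order bits lost so far */
    for (int i = 0; i < n; ++i) {
        float y = x[i] - c;      /* apply the pending correction */
        float t = sum + y;       /* low bits of y may be lost here */
        c = (t - sum) - y;       /* recover what was just lost */
        sum = t;
    }
    return sum;
}
```

A simple way to see the difference is to sum 1.0 followed by many values that are each too small to change a `float` equal to 1.0 on their own: the naive loop stays stuck at 1.0, while the compensated loop accumulates the small terms and lands close to the true total. In a parallel reduction on the GPU each thread would carry its own `sum` and `c`, which is a design choice this sketch does not cover.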
