Single Precision Accuracy of Dot-Product


I am comparing the performance of a compute capability 1.0 Quadro FX 5600 and an Opteron CPU using a dot-product operation. The GPU and CPU produce slightly different results, and the discrepancy grows over long summations. In order to get matching results, I had to set the CPU input/output to double precision while leaving the GPU at single precision. I take it this means that the GPU, at least in this case, is more accurate at single precision than the CPU is at single precision. I have found very little written about this, and nothing that mentions it directly. Given that the Quadro FX 5600 supports only single precision, I assume this must be an artefact of the GPU's summation algorithm? I don't know what algorithm is used for dot products on the GPU; does anyone know? Does the GPU use a parallel summation, and if so, could that account for the better accuracy at single precision?

Thank you for any help/hints.

The CPU does the summation internally at extended precision and then returns the result in single precision. For long summations, single precision is not enough. In floating-point arithmetic, (A+B)+C is not, in general, equal to A+(B+C).
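A quick way to see the non-associativity described above. This is a pure-Python sketch (no code was posted in the thread); it uses `struct` to round every intermediate result to IEEE-754 single precision, and the specific values A = 1e8, B = -1e8, C = 1 are my own choice of illustration:

```python
import struct

def f32(x):
    """Round a Python double to the nearest IEEE-754 single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = f32(1e8), f32(-1e8), f32(1.0)

left  = f32(f32(a + b) + c)   # (A + B) + C: cancellation happens first
right = f32(a + f32(b + c))   # A + (B + C): the 1.0 is absorbed by -1e8
print(left, right)            # the two orderings disagree
```

In the second ordering, `b + c = -99999999.0` is closer to `-1e8` than to the next representable single-precision value, so the `1.0` is lost entirely before the cancellation. The order of operations therefore changes the answer, which is why a sequential CPU loop and a differently ordered GPU reduction need not agree bit-for-bit.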

@pasoleatis thank you for the reply. The information about the CPU is very useful for understanding the differences. However, I still don’t understand how the CPU can return an answer closer to the GPU’s answer when the CPU is at double precision and the GPU is at single precision. The answers are close with the CPU at float, but they match more closely when the CPU is at double. Do you know whether the Nvidia Quadro FX 5600 uses an algorithm to maintain better accuracy over long summations? I ran this against more than one type of CPU (a dual-core and a 6-core) and got the same results.

Thank you again.
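The parallel-summation hypothesis raised in the question can be sketched in pure Python. This is my own illustration, not confirmed behaviour of the Quadro FX 5600: the assumption is that a GPU reduction adds values in a tree (pairwise) shape, whose rounding error grows roughly with log n, while a sequential CPU loop's error grows with n. The helper names `naive_sum` and `pairwise_sum` are mine; `struct` is used to emulate single-precision rounding (a dot product is a sum of products, so only the summation part is modelled here):

```python
import struct

def f32(x):
    """Round a Python double to the nearest IEEE-754 single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def naive_sum(xs):
    """Sequential left-to-right accumulation, as a simple CPU loop would do it."""
    s = 0.0
    for x in xs:
        s = f32(s + x)
    return s

def pairwise_sum(xs):
    """Tree-shaped (pairwise) reduction: the shape a parallel reduction takes."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return f32(pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:]))

n = 1 << 17                       # 131072 terms
vals = [f32(0.1)] * n
exact = n * f32(0.1)              # reference sum, computed in double precision
naive_result = naive_sum(vals)
pairwise_result = pairwise_sum(vals)
print(naive_result, pairwise_result, exact)
```

On a run like this the sequential sum drifts visibly from the reference while the pairwise sum stays much closer, which would account for a single-precision GPU reduction matching a double-precision CPU loop better than a single-precision CPU loop does.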