I am comparing the performance of a compute capability 1.0 Quadro FX 5600 and an Opteron CPU using a dot-product operation. The GPU and the CPU give different results. The difference is slight, but it will grow over long summations. To get results that matched, I had to switch the CPU input/output data to double precision while leaving the GPU at single precision. I take this to mean that, at least in this case, the GPU is more accurate in single precision than the CPU is in single precision.

I have found very little about this and nothing that mentions it directly. Given that the Quadro FX 5600 supports only single precision, I assume the difference must be an artefact of the summation algorithm used on the GPU. I don't know what algorithm is used for dot products on the GPU. Does anyone know? Does the GPU use parallel summation, and if so, could that account for the better accuracy at single precision?
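To make the summation-order question concrete, here is a minimal sketch in Python of how the order of additions alone can change a single-precision dot product. It is only an illustration under stated assumptions: the helper names `f32`, `dot_sequential`, and `dot_pairwise` are made up for this example, single precision is emulated by rounding through `struct`, and nothing here is claimed to be what the GPU library or the CPU code actually does.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def products(x, y):
    """Element-wise products, with inputs and results rounded to single precision."""
    return [f32(f32(a) * f32(b)) for a, b in zip(x, y)]


def dot_sequential(x, y):
    """Accumulate left to right, as a simple serial loop would (assumed CPU-style)."""
    acc = 0.0
    for p in products(x, y):
        acc = f32(acc + p)  # round after every addition
    return acc


def dot_pairwise(x, y):
    """Combine neighbouring partial sums in a tree, as a parallel reduction might."""
    partial = products(x, y)
    if not partial:
        return 0.0
    while len(partial) > 1:
        # Add adjacent pairs; an unpaired last element is carried over unchanged.
        nxt = [f32(partial[i] + partial[i + 1])
               for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            nxt.append(partial[-1])
        partial = nxt
    return partial[0]


if __name__ == "__main__":
    # One large product (4096 * 4096 = 2**24) followed by seven products of 1.
    x = [4096.0] + [1.0] * 7
    y = [4096.0] + [1.0] * 7
    print(dot_sequential(x, y))                        # 16777216.0
    print(dot_pairwise(x, y))                          # 16777222.0
    print(math.fsum(a * b for a, b in zip(x, y)))      # 16777223.0 (exact)
```

In this contrived case the serial loop loses every small term, because 2**24 + 1 is not representable in single precision, while the tree-shaped sum first combines the small terms with each other and ends up within 1 of the exact answer. Whether that is the effect behind the GPU/CPU difference described above is exactly the open question.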
Thank you for any help/hints.