I extended the sample SDK reduction code to use floats rather than ints. I am seeing some interesting behavior that I’m hoping someone here can clarify.
I tested the GPU results against a CPU version using both float and double. For large arrays (~2^21, i.e. over two million elements) the GPU result agrees more closely with the CPU double-precision result than with the single-precision float result.
This behavior threw me for a loop since I thought the 8800 only supported single precision floating point?
Gfx card: 8800 GTX
OS: Windows XP
IDE: VisualStudio 2005
The CPU “single” precision and GPU single precision are not exactly the same. CPU x87 floating-point registers are 80 bits wide (unless you’re using SSE), so if the variable you’re accumulating into stays in a register and is never written out to memory until all the additions are done, the intermediate sums carry extra precision, and I can see that causing a difference.
32 bits + 32 bits = 32 bits
80 bits + 32 bits = 80 bits
And also the ordering of the additions is surely different between the CPU loop and the GPU reduction, and floating-point addition isn’t associative, so summing the same values in a different order can give a different result.
Probably you’re just getting lucky that it happens to be closer to the double precision result.