Math Bug in Tesla T10

Is this a bug in addition?


The following Kernel seems to run into trouble on the Tesla T10 GPU:



#define COUNT 256

__global__ void findSum(numType *cu_result)
{
    numType temp = 0;

    for (int i = 0; i < COUNT; i++)
        temp += 80189;

    *cu_result = temp / COUNT;  // average of the COUNT additions
}

When numType is “float”, output is: 80188.8

When numType is “double” output is: -2.22507e-308

In both cases I call the kernel in the following manner:


numType *cu_result;
cudaMalloc((void **)&cu_result, sizeof(numType));
findSum<<<1, 1>>>(cu_result);

So there is just 1 block with 1 thread.

Am I doing something wrong?



I’ll bet you’re not compiling with the -arch sm_13 flag to enable double precision.
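For reference, the flag is passed to nvcc like this (the file name kernel.cu is just a placeholder):

```shell
# Enable compute capability 1.3, which adds double-precision support
nvcc -arch=sm_13 kernel.cu -o kernel
```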

Indeed I was not! This takes care of the case where “numType” is “double”.

But the single-precision problem still exists. I guess this really is a bug in the hardware??

This seems to be a single-precision issue rather than a CUDA-specific one. Never mind…

Nope, your Tesla (and your floats) are working just like they should be.

A float can’t hold integers larger than 2^24 exactly, so your sum has floating point truncation losses.

If you do the same accumulation loop on the CPU, you’ll get the identical 80188.8 result as the Tesla.

Floating point seems so simple, but it’s sometimes fiendish! A long but useful guide: …

You can almost summarize it by saying “be careful adding numbers of different magnitudes, and don’t trust the difference between two numbers of the same magnitude.”

In this case, you’re doing the former by adding values like 80189.0 to numbers that are two orders of magnitude larger, like 20528330.0.


Thanks for the link.