I extended the sample SDK reduction code to use floats rather than ints. I am seeing some interesting behavior that I’m hoping someone here can clarify.
I tested the GPU results against a CPU version using both float and double. For large arrays (~2^21, i.e. over two million elements) the GPU result agrees more closely with the CPU double-precision result than with the single-precision float result.
This behavior threw me for a loop, since I thought the 8800 only supported single-precision floating point?
Gfx card: 8800 GTX
OS: Windows XP
IDE: VisualStudio 2005
CPU “single” precision and GPU single precision are not exactly the same. The x87 floating-point registers on the CPU are 80 bits wide (unless you’re using SSE), so if the variable you’re accumulating into stays in a register and never gets written out to memory until all the adds are done, the intermediate sums are effectively carried in extended precision. I can see that causing a difference.
GPU:
32 bits + 32 bits = 32 bits
CPU:
80 bits + 32 bits = 80 bits
And the ordering is surely different too, and floating-point addition isn’t associative, so changing the order of the adds changes the rounding.
Probably you’re also just getting lucky that it happens to land closer to the double-precision result (although the tree-shaped order a GPU reduction uses does tend to accumulate less rounding error than one long sequential chain of adds).
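To illustrate, here’s a minimal CPU-side sketch (not from the original post; the data values and function names are made up) that sums the same array three ways: a plain sequential float accumulator, a pairwise (tree-ordered) float sum roughly like a GPU reduction, and a double accumulator as a reference. The point is just that ordering and accumulator width both move the single-precision result:

#include <cstddef>
#include <cstdio>
#include <vector>

// Plain sequential sum into a 32-bit accumulator: one long chain of adds.
float sum_sequential(const float* v, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) acc += v[i];
    return acc;
}

// Pairwise (tree-ordered) sum, roughly the order a GPU reduction uses.
float sum_pairwise(const float* v, size_t n) {
    if (n == 0) return 0.0f;
    if (n == 1) return v[0];
    size_t half = n / 2;
    return sum_pairwise(v, half) + sum_pairwise(v + half, n - half);
}

// Double-precision accumulator as the reference.
double sum_double(const float* v, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) acc += v[i];
    return acc;
}

int main() {
    std::vector<float> data(1 << 21, 0.1f);   // ~2 million elements, as in the post
    std::printf("sequential float: %.3f\n", sum_sequential(&data[0], data.size()));
    std::printf("pairwise   float: %.3f\n", sum_pairwise(&data[0], data.size()));
    std::printf("double reference: %.3f\n", sum_double(&data[0], data.size()));
    return 0;
}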
If you want to add up that many numbers accurately in single precision, you should use something like the Kahan summation algorithm (see the Wikipedia article on Kahan summation) or split the sum across multiple partial accumulators (buckets).
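A minimal sketch of Kahan (compensated) summation for a plain float array; the function name and signature here are just for illustration:

#include <cstddef>

float kahan_sum(const float* v, size_t n) {
    float sum = 0.0f;
    float c   = 0.0f;                 // running compensation for lost low-order bits
    for (size_t i = 0; i < n; ++i) {
        float y = v[i] - c;           // apply the correction carried from the last step
        float t = sum + y;            // big + small: low-order bits of y get rounded off here
        c = (t - sum) - y;            // recover the rounded-off part (algebraically zero)
        sum = t;
    }
    return sum;
}

One caveat: with aggressive floating-point optimization (fast-math style flags) the compiler may simplify the compensation term away, so it’s worth checking the build settings before concluding it doesn’t help.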
Post your CPU single-precision implementation. If you’re just adding the floats into a single-precision accumulator in a loop, you will not get an accurate result.