Rounding Errors

Hi All!

I’m new to CUDA, and to get warmed up I have been playing with some example code that uses multi-dimensional arrays. The original code used integers; I changed these to floats so that I could play around with basic maths functions. If I run the code below I get some weird rounding: each number in the array should have a value that ends in .55, and when printed to 1 decimal place should round up to .6. However, when I actually run the code, the first group of numbers rounds to .6, the following three groups round to .5, and the remaining groups round to .6.

Can anyone explain why this is? Am I doing something wrong?

The program is running on a Quadro FX 570 (Compute 1.1), windows vista (32bit), and compiled in MSVC++ 2010 express.

Here is the code:

#include <stdlib.h>
#include <stdio.h>

__global__ void kernel(float *array)
{
    int index_x = blockIdx.x * blockDim.x + threadIdx.x;
    int index_y = blockIdx.y * blockDim.y + threadIdx.y;

    // map the two 2D indices to a single linear, 1D index
    int grid_width = gridDim.x * blockDim.x;
    int index = index_y * grid_width + index_x;

    // map the two 2D block indices to a single linear, 1D block index
    float result = blockIdx.y * gridDim.x + blockIdx.x + 0.55f; // each value should end in .55

    // write out the result
    array[index] = result;
}

int main(void)
{
    int num_elements_x = 16;
    int num_elements_y = 16;

    int num_bytes = num_elements_x * num_elements_y * sizeof(float); // the array holds floats, not ints

    float *device_array = 0;
    float *host_array = 0;

    // allocate memory in either space
    host_array = (float*)malloc(num_bytes);
    cudaMalloc((void**)&device_array, num_bytes);

    // create two-dimensional 4x4 thread blocks
    dim3 block_size;
    block_size.x = 4;
    block_size.y = 4;

    // configure a two-dimensional grid as well
    dim3 grid_size;
    grid_size.x = num_elements_x / block_size.x;
    grid_size.y = num_elements_y / block_size.y;

    // grid_size & block_size are passed as arguments to the triple chevrons as usual
    kernel<<<grid_size, block_size>>>(device_array);

    // download and inspect the result on the host:
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

    // print out the result element by element
    for(int row = 0; row < num_elements_y; ++row)
    {
        for(int col = 0; col < num_elements_x; ++col)
        {
            printf("%2.1f ", host_array[row * num_elements_x + col]);
        }
        printf("\n");
    }
    printf("\n");

    // deallocate memory
    free(host_array);
    cudaFree(device_array);

    getchar();

    return 0;
}

What you are seeing is the expected behavior for floating point, and has nothing to do with the GPU. Because floating point numbers are represented in base 2, some numbers that can be written with a finite number of decimal places require an infinitely repeating representation in binary. In fact, every number of the form x + 0.55, where x is a natural number, has this problem, because 0.55 itself has a repeating binary expansion. As a result, the addition you perform in your kernel produces a floating point number that is near the true answer but can be above or below it, depending on how the result is rounded. This shows up as an instability in the output when you print only 1 decimal place.
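You can see this directly by rounding 0.55 to single precision and printing it with more digits (a small Python sketch, not part of the original posts; it uses the standard struct module to get IEEE 754 single-precision behavior):

```python
import struct

# Round 0.55 to the nearest single-precision float and inspect
# its bit pattern and its value with 8 decimal places.
bits = struct.unpack("<I", struct.pack("<f", 0.55))[0]
single = struct.unpack("<f", struct.pack("<f", 0.55))[0]

print(hex(bits))          # -> 0x3f0ccccd (note the repeating ..cccc.. pattern)
print("%.8f" % single)    # -> 0.55000001
```

The repeating `cccc` in the mantissa is the truncated binary expansion of 0.55; the stored value is slightly above 0.55, which is why the very first array entry rounds up to .6.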

Below I’ve computed x + 0.55 in single precision with Python and printed the results with 8 decimal places. You can see regions (for example 1.55 through 3.55, and 16.55 through 63.55) where the stored value falls below .55, so the 1 decimal place rounding goes down instead of up.

[   0.55000001,    1.54999995,    2.54999995,    3.54999995,
          4.55000019,    5.55000019,    6.55000019,    7.55000019,
          8.55000019,    9.55000019,   10.55000019,   11.55000019,
         12.55000019,   13.55000019,   14.55000019,   15.55000019,
         16.54999924,   17.54999924,   18.54999924,   19.54999924,
         20.54999924,   21.54999924,   22.54999924,   23.54999924,
         24.54999924,   25.54999924,   26.54999924,   27.54999924,
         28.54999924,   29.54999924,   30.54999924,   31.54999924,
         32.54999924,   33.54999924,   34.54999924,   35.54999924,
         36.54999924,   37.54999924,   38.54999924,   39.54999924,
         40.54999924,   41.54999924,   42.54999924,   43.54999924,
         44.54999924,   45.54999924,   46.54999924,   47.54999924,
         48.54999924,   49.54999924,   50.54999924,   51.54999924,
         52.54999924,   53.54999924,   54.54999924,   55.54999924,
         56.54999924,   57.54999924,   58.54999924,   59.54999924,
         60.54999924,   61.54999924,   62.54999924,   63.54999924,
         64.55000305,   65.55000305,   66.55000305,   67.55000305,
         68.55000305,   69.55000305,   70.55000305,   71.55000305,
         72.55000305,   73.55000305,   74.55000305,   75.55000305,
         76.55000305,   77.55000305,   78.55000305,   79.55000305,
         80.55000305,   81.55000305,   82.55000305,   83.55000305,
         84.55000305,   85.55000305,   86.55000305,   87.55000305,
         88.55000305,   89.55000305,   90.55000305,   91.55000305,
         92.55000305,   93.55000305,   94.55000305,   95.55000305,
         96.55000305,   97.55000305,   98.55000305,   99.55000305,
        100.55000305,  101.55000305,  102.55000305,  103.55000305,
        104.55000305,  105.55000305,  106.55000305,  107.55000305,
        108.55000305,  109.55000305,  110.55000305,  111.55000305,
        112.55000305,  113.55000305,  114.55000305,  115.55000305,
        116.55000305,  117.55000305,  118.55000305,  119.55000305,
        120.55000305,  121.55000305,  122.55000305,  123.55000305,
        124.55000305,  125.55000305,  126.55000305,  127.55000305,
        128.55000305,  129.55000305,  130.55000305,  131.55000305,
        132.55000305,  133.55000305,  134.55000305,  135.55000305,
        136.55000305,  137.55000305,  138.55000305,  139.55000305,
        140.55000305,  141.55000305,  142.55000305,  143.55000305,
        144.55000305,  145.55000305,  146.55000305,  147.55000305,
        148.55000305,  149.55000305,  150.55000305,  151.55000305,
        152.55000305,  153.55000305,  154.55000305,  155.55000305,
        156.55000305,  157.55000305,  158.55000305,  159.55000305,
        160.55000305,  161.55000305,  162.55000305,  163.55000305,
        164.55000305,  165.55000305,  166.55000305,  167.55000305,
        168.55000305,  169.55000305,  170.55000305,  171.55000305,
        172.55000305,  173.55000305,  174.55000305,  175.55000305,
        176.55000305,  177.55000305,  178.55000305,  179.55000305,
        180.55000305,  181.55000305,  182.55000305,  183.55000305,
        184.55000305,  185.55000305,  186.55000305,  187.55000305,
        188.55000305,  189.55000305,  190.55000305,  191.55000305,
        192.55000305,  193.55000305,  194.55000305,  195.55000305,
        196.55000305,  197.55000305,  198.55000305,  199.55000305,
        200.55000305,  201.55000305,  202.55000305,  203.55000305,
        204.55000305,  205.55000305,  206.55000305,  207.55000305,
        208.55000305,  209.55000305,  210.55000305,  211.55000305,
        212.55000305,  213.55000305,  214.55000305,  215.55000305,
        216.55000305,  217.55000305,  218.55000305,  219.55000305,
        220.55000305,  221.55000305,  222.55000305,  223.55000305,
        224.55000305,  225.55000305,  226.55000305,  227.55000305,
        228.55000305,  229.55000305,  230.55000305,  231.55000305,
        232.55000305,  233.55000305,  234.55000305,  235.55000305,
        236.55000305,  237.55000305,  238.55000305,  239.55000305,
        240.55000305,  241.55000305,  242.55000305,  243.55000305,
        244.55000305,  245.55000305,  246.55000305,  247.55000305,
        248.55000305,  249.55000305,  250.55000305,  251.55000305,
        252.55000305,  253.55000305,  254.55000305,  255.55000305]
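A table like the one above can be reproduced in plain Python (a sketch, not the exact script I used) by rounding each sum to single precision with the struct module:

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Compute x + 0.55 in single precision for x = 0..255 and print
# the first few results with 8 decimal places, as in the table.
for x in range(8):
    print("%13.8f" % f32(x + f32(0.55)))
```

Note that 0.55 is first rounded to single precision, the sum is formed, and then the sum is rounded to single precision again; whether the result lands above or below .55 depends on where the nearest representable value falls at that magnitude.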

As a side note, this is why you should never do banking calculations with floating point numbers. You don’t want to be off by a penny after processing thousands of 55 cent transactions. :)
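For money you would use a decimal type instead. As a quick illustration (a Python sketch using the standard decimal module, not anything from the original posts):

```python
from decimal import Decimal

# Constructing a Decimal from the float 0.55 exposes the binary
# value that is actually stored -- it is not exactly 0.55:
print(Decimal(0.55))

# Constructing from the string "0.55" stores the cents exactly,
# so 1000 transactions of 55 cents sum to exactly 550.00:
print(Decimal("0.55") * 1000)   # -> 550.00
```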

Thanks Seibert - that’s great! It’s not me doing something wrong!

I think my understanding of floating point numbers had been only basic until now - I just looked up floating point numbers on Wikipedia, and I now believe I understand the basics… better!

I’ll keep reading on about these strange things they call floating point numbers!

Thanks a lot!!

Although a little heavy, this is probably the most complete reference:

“What Every Computer Scientist Should Know About Floating-Point Arithmetic”
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

You may also want to read NVIDIA’s whitepaper, which references Goldberg’s seminal paper as well as other relevant literature:

http://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

Thanks all - I had no idea that floating point numbers were so interesting… It turns out I had always thought of them as fixed point, not as approximations of the real number. It’s heavy going, but I’m getting my head round it.

Thanks for all the help, I’m pleased I ran into this now, not later when working on something important!

Here is an addendum to Goldberg’s paper that I forgot to mention in my earlier post:

http://www.validgh.com/goldberg/addendum.html

For computations on x86 platforms, you might find the following whitepaper of use:

http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler