On tackling floating-point precision issues in CUDA

Hi,

From reading the CUDA documentation, I have learned that floating-point values in CUDA also follow the IEEE 754 standard. Given that, I would like to ask whether I can tackle floating-point precision issues in CUDA the same way I do in C/C++.

Specifically, my questions are as follows.

  1. For the float and double types, are all values that are precisely representable in C/C++ (including zero) also precisely representable in CUDA?
  2. To compare floating-point variables, I use the following code in C/C++.
#include <cfloat>   // DBL_EPSILON
#include <cmath>    // fabs

// True when d lies within [-DBL_EPSILON, DBL_EPSILON].
bool isZero(double d){
    return d >= -DBL_EPSILON && d <= DBL_EPSILON;
}

// True when d1 and d2 differ by less than DBL_EPSILON (absolute difference).
bool isEqual(double d1, double d2){
    return fabs(d1 - d2) < DBL_EPSILON;
}

Is this code directly transferable to CUDA?

Thank you!

  1. Yes, and the representation is even the same!
  2. Yes, you can keep using that code on GPUs as well, if it has turned out to be sufficient for your needs in CPU code; see the sketch after this list.
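
For illustration, here is a minimal sketch (the kernel and its name are hypothetical, not from the CUDA toolkit) of how the very same helpers can be shared between host and device by marking them __host__ __device__:

#include <cfloat>
#include <cmath>

// The same comparison helpers, callable from both CPU code and kernels.
__host__ __device__ bool isZero(double d){
    return d >= -DBL_EPSILON && d <= DBL_EPSILON;
}

__host__ __device__ bool isEqual(double d1, double d2){
    return fabs(d1 - d2) < DBL_EPSILON;
}

// Hypothetical kernel using the helper device-side.
__global__ void markEqual(const double* a, const double* b, bool* eq, int n){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) eq[i] = isEqual(a[i], b[i]);
}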

Generally, the fma() intrinsic (which compiles to a single instruction) is hugely useful. It seems to be not that well known among (CPU) programmers, as it has only been added to the x86 instruction set relatively recently.

Nvidia has produced a whole whitepaper about Floating Point and IEEE 754.

I am not aware of an fmad() intrinsic in CUDA. I would suggest using the C++ standard functions fma() and fmaf() as needed, as such code should be portable between host and device. Occasionally the specific use of device function intrinsics like __fma_rn() and __fmaf_rn() may be useful.

I concur that educating oneself about the advantages of the fused multiply-add operation is highly recommended in general, as it can be a powerful tool in numerical codes. Knowledge about it is not as widespread among programmers as it should be, given that “all” modern processor architectures (both CPUs and GPUs) support the operation in hardware.
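
To make the benefit concrete, here is a small sketch (compiled as a .cu file; the axpy helper and the demo values are my own choices) using the portable fma() from <cmath>. The final fma() call exposes the rounding error of a plain product, which no sequence of ordinary multiplies and adds can observe directly:

#include <cmath>
#include <cstdio>

// fma(a, x, y) computes a*x + y with a single rounding at the end,
// so the intermediate product is never rounded to working precision.
__host__ __device__ double axpy(double a, double x, double y){
    return fma(a, x, y);         // portable between host and device
    // return __fma_rn(a, x, y); // device-only intrinsic, round-to-nearest-even
}

int main(){
    // Classic demonstration: fma(a, b, -(a*b)) yields the exact rounding
    // error of the product a*b; computing a*b - a*b naively just gives 0.
    double a = 1.0 + ldexp(1.0, -27);
    double b = 1.0 - ldexp(1.0, -27);
    double err = fma(a, b, -(a * b));
    printf("rounding error of a*b: %g\n", err); // -2^-54, about -5.55e-17
    return 0;
}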

Thanks, Norbert, for pointing out my typo in style. I don’t think there was a big risk of being misunderstood, as I was already linking to the correctly spelled intrinsic, but I have of course corrected the typo nevertheless.

Your equality check is only really valid for numbers near zero. You’re not taking into account the relative size of the values, i.e. you should be looking at the absolute RELATIVE difference.

You should hence compare the following:

rd = |d1 - d2| / max(|d1|, |d2|) < EPSILON

where rd denotes the relative difference.

Let’s take some examples (fp32). Here FP32_EPSILON denotes the comparison tolerance in use, say 1e-5; note that FLT_EPSILON itself (about 1.19e-7) is smaller than every difference below, so a strict fabs(d1 - d2) < FLT_EPSILON test would already reject the first pair.

With your method we get:

d1=1.234567f
d2=1.234568f
=>
|1.234567 - 1.234568| = 0.000001 < FP32_EPSILON => OK! EQUAL!

These numbers are considered equal, great! Let’s multiply them by a large number, say 1000.

d1=1234.567f
d2=1234.568f
=>
|1234.567 - 1234.568| = 0.001 > FP32_EPSILON => NOT OK! NOT EQUAL!

So let’s agree that two equal numbers multiplied by, for example, 1000 should still be considered equal, OK? :)

Now let’s instead use the suggested definition:

d1=1.234567f
d2=1.234568f
=>
rd = |1.234567 - 1.234568|/max(|1.234567|,|1.234568|) = 0.000001 / 1.234568 = 0.00000081

=> 0.00000081 < FP32_EPSILON => OK! EQUAL!

And now AGAIN multiply by 1000.

d1=1234.567f
d2=1234.568f

=>
rd = |1234.567 - 1234.568|/max(|1234.567|,|1234.568|) = 0.001 / 1234.568 = 0.00000081
(just as before, the relative difference is the same)
=> 0.00000081 < FP32_EPSILON => OK! EQUAL!

So just add the additional code to check the relative difference instead of the absolute difference (and make sure to avoid division by zero as well…).
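
Here is a minimal sketch of such a check (the function name isEqualRel and the tolerance defaults are my own choices, not from any library); the absolute-tolerance guard handles values near zero and prevents the division by zero:

#include <cmath>

// Relative comparison with an absolute-tolerance guard near zero.
// relTol and absTol are application-specific choices, not universal constants.
__host__ __device__ bool isEqualRel(float d1, float d2,
                                    float relTol = 1e-5f,
                                    float absTol = 1e-12f){
    float diff = fabsf(d1 - d2);
    if (diff <= absTol)
        return true;  // near-zero case; also avoids dividing by zero below
    return diff / fmaxf(fabsf(d1), fabsf(d2)) < relTol;
}

With these defaults both pairs above, 1.234567f vs 1.234568f and 1234.567f vs 1234.568f, compare equal, since the relative difference is about 0.00000081 in both cases.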