From reading the CUDA documentation, I have learnt that floating-point values in CUDA also follow the IEEE 754 standard. I would therefore like to ask whether I can tackle floating-point precision issues in CUDA the same way I do in C/C++.
Specifically, my questions are as follows.
For the float and double types, are all values that are precisely representable in C/C++ (including zero) also precisely representable in CUDA?
To compare floating-point variables, I use the following code in C/C++.
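This is the kind of pattern I have in mind, comparing the absolute difference against a fixed tolerance (EPSILON and the names here are just placeholders for whatever my application uses):

```cpp
#include <cmath>

// Application-chosen absolute tolerance (placeholder value),
// not the machine epsilon.
#define EPSILON 1e-10

bool nearly_equal(double d1, double d2)
{
    // Treat the two values as equal if their absolute
    // difference is below the fixed tolerance.
    return fabs(d1 - d2) < EPSILON;
}
```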
Yes, you can keep using that code on GPUs as well if it has turned out sufficient for your needs in CPU code.
Generally, the fma() intrinsic (which compiles to a single instruction) is hugely useful. It seems not to be that well known among (CPU) programmers, as it has only been added to the x86 instruction set relatively recently.
I am not aware of an fmad() intrinsic in CUDA. I would suggest using the C++ standard functions fma() and fmaf() as needed, as such code should be portable between host and device. Occasionally the specific use of device function intrinsics like __fma_rn() and __fmaf_rn() may be useful.
I concur that educating oneself about the advantages of the fused multiply-add operation is highly recommended in general, as it can be a powerful tool in numerical codes. Knowledge about it is not as widespread among programmers as it should be, given that “all” modern processor architectures (both CPUs and GPUs) support the operation in hardware.
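As a small sketch of what this looks like in practice (the function and variable names are made up for illustration; the same source compiles for both host and device):

```cpp
#include <cmath>   // fma(), fmaf()

// fma()/fmaf() compute a*x + y with a single rounding and map to a
// single fused multiply-add instruction on the GPU (and on CPUs with
// FMA hardware support).
__host__ __device__ double axpy_term(double a, double x, double y)
{
    return fma(a, x, y);
}

__host__ __device__ float axpy_termf(float a, float x, float y)
{
    // In device-only code, the intrinsic __fmaf_rn(a, x, y) could be
    // used instead to request the round-to-nearest-even FMA explicitly.
    return fmaf(a, x, y);
}
```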
Thanks Norbert for pointing out my typo in style. I don’t think there was a big risk of being misunderstood, as I was already linking to the correctly spelled intrinsic, but I have of course corrected the typo nevertheless.
Your equality check is only really valid for numbers near zero. You’re not taking into account the relative size of the values: i.e. you should be looking at the RELATIVE difference.
You should hence compare the following:
rd = |d1 - d2| / max(|d1|, |d2|) < EPSILON, where rd is the relative difference.
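For example, something along these lines (the function name and the guard for the all-zero case are my own additions; eps is the relative tolerance you pick for your application):

```cpp
#include <cmath>

// Relative-difference comparison: the tolerance scales with the
// magnitude of the larger operand.
bool nearly_equal_rel(double d1, double d2, double eps)
{
    double diff  = fabs(d1 - d2);
    double scale = fmax(fabs(d1), fabs(d2));
    if (scale == 0.0)       // both values are exactly zero;
        return true;        // the relative difference is ill-defined here
    return diff / scale < eps;
}
```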