New here. I am writing code for the GPU. Recently I found that the sum of two float/double numbers with opposite signs is not zero, i.e. X+(-X) != 0. When I use __dadd_rz() or __dadd_rn() instead of the + operator, the sum does come out as zero.
I know that the compiler may optimize the + operator more aggressively than the intrinsics with an explicit rounding mode, and that additions with no rounding modifier may be contracted into fused multiply-add (FMA) instructions on the target device. I thought FMA was more accurate than a separate add, but in this case it seems not to be. Can anyone explain why?
When I do X+(-X) I get zero. You should provide an actual complete example.
In analogy to “Pics or it didn’t happen”: Without buildable and runnable example code that reproduces the issue I claim this never happened as described :-)
In all likelihood the two numbers in question do not have the same magnitude and opposite sign. Mathematics may suggest that they should have identical magnitude, but floating-point computation is not mathematics.
Other than FMA contraction, which can be turned off with -fmad=false, the CUDA compiler does not perform aggressive re-association of floating-point arithmetic expressions. A normal ‘+’ operator can be affected by FMA contraction; __dadd_r*() with any rounding mode cannot.
FMA comprises a multiply and a dependent addition, but it rounds only once, at the end, and uses the full (unrounded) product as the input to the addition, which can guard against certain cases of subtractive cancellation. It will therefore, on average, deliver more accurate results than the equivalent sequence of discrete operations.