I am having an issue with the order of operations in nvfortran when FMAs are enabled.
For example, the following expression
s = a*b + c*d
could be computed in three different ways: with intrinsic add/multiply, or with two different FMAs:
s1 = (a*b) + (c*d)
s2 = FMA(a, b, c*d) ! a*b + (c*d)
s3 = FMA(c, d, a*b) ! (a*b) + c*d
This ambiguity is fine, and is consistent with the Fortran language standard.
However, if the user adds parentheses, then the behavior should be more explicit. For example,
s = (a*b) + (c*d)
should compute both products before computing the sum. s1 is possible, but the FMA results s2 and s3 are not.
Similarly, if the user writes
s = a*b + (c*d)
then either s1 or the FMA result s2 should be permitted. And
s = (a*b) + c*d
may be either s1 or s3, but not s2.
With the Nvidia compiler, this does not appear to be the case. I have test code which computes these expressions for the following values:
a = 1. + epsilon(a)
b = 1. - epsilon(a)
c = 1.
d = -1.
With these values, GNU (using -O2 -mfma) gives the following results:
a*b + c*d : -4.9303806576313238E-32 (B970000000000000)
c*d + a*b : 0.0000000000000000E+00 (0000000000000000)
(a*b) + (c*d): 0.0000000000000000E+00 (0000000000000000)
(c*d) + (a*b): 0.0000000000000000E+00 (0000000000000000)
a*b + (c*d): -4.9303806576313238E-32 (B970000000000000)
c*d + (a*b): 0.0000000000000000E+00 (0000000000000000)
(a*b) + c*d : 0.0000000000000000E+00 (0000000000000000)
(c*d) + a*b : -4.9303806576313238E-32 (B970000000000000)
which are consistent with the discussion above.
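To make the expected values concrete, here is a minimal sketch that reproduces s1, s2, and s3 for these inputs. It is written in Python rather than Fortran so that it does not depend on any compiler's contraction behavior. The helper names fma and bits are my own, and the FMA is emulated with exact rational arithmetic followed by a single rounding, not a hardware instruction. It assumes 64-bit reals, which is what the hex patterns above suggest.

```python
import struct
import sys
from fractions import Fraction


def fma(x, y, z):
    """Emulate a fused multiply-add: x*y + z with a single final rounding."""
    # Fractions hold the product and the sum exactly; float() rounds once.
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def bits(x):
    """Return the IEEE 754 binary64 bit pattern as 16 hex digits."""
    return struct.pack(">d", x).hex().upper()


# Assumption: 64-bit reals, so epsilon(a) corresponds to 2**-52.
eps = sys.float_info.epsilon
a, b, c, d = 1.0 + eps, 1.0 - eps, 1.0, -1.0

s1 = (a * b) + (c * d)  # both products rounded, then added
s2 = fma(a, b, c * d)   # a*b kept exact, c*d rounded
s3 = fma(c, d, a * b)   # c*d kept exact, a*b rounded

for name, value in (("s1", s1), ("s2", s2), ("s3", s3)):
    print(f"{name} = {value!r} ({bits(value)})")
```

The exact product a*b is 1 - 2**-104, which rounds to 1 when computed on its own. So s1 and s3 are both exactly zero, and only s2 keeps the -2**-104 residual (bit pattern B970000000000000). For these inputs, a nonzero result therefore means that a*b was fused into an FMA.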
Nvidia 22.7 with -O0 -Mfma
gives the following:
a*b + c*d : 0.0000000000000000E+00 (0000000000000000)
c*d + a*b : -4.9303806576313238E-32 (B970000000000000)
(a*b) + (c*d): 0.0000000000000000E+00 (0000000000000000)
(c*d) + (a*b): -4.9303806576313238E-32 (B970000000000000)
a*b + (c*d): 0.0000000000000000E+00 (0000000000000000)
c*d + (a*b): -4.9303806576313238E-32 (B970000000000000)
(a*b) + c*d : 0.0000000000000000E+00 (0000000000000000)
(c*d) + a*b : -4.9303806576313238E-32 (B970000000000000)
That is, Nvidia always applies FMA, regardless of parentheses. (The FMA order is reversed, but the rules above would permit this in the first two expressions.)
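As a rough cross-check, the sketch below compares the Nvidia values listed above against the results that the parentheses rules would permit. The permitted sets are my reading of those rules, and the table layout and names (cases, violations) are purely illustrative. For these inputs s1 and s3 are both zero, so a reported zero cannot show which of the two was used. Only a nonzero result where s2 is not permitted can be flagged.

```python
S1 = 0.0             # (a*b) + (c*d): both products rounded first
S2 = -(2.0 ** -104)  # FMA(a, b, c*d): a*b kept exact
S3 = 0.0             # FMA(c, d, a*b): c*d kept exact

# expression -> (values permitted by the parentheses, value reported by Nvidia 22.7)
cases = {
    "a*b + c*d":     ({S1, S2, S3}, 0.0),
    "c*d + a*b":     ({S1, S2, S3}, S2),
    "(a*b) + (c*d)": ({S1}, 0.0),
    "(c*d) + (a*b)": ({S1}, S2),
    "a*b + (c*d)":   ({S1, S2}, 0.0),
    "c*d + (a*b)":   ({S1, S3}, S2),
    "(a*b) + c*d":   ({S1, S3}, 0.0),
    "(c*d) + a*b":   ({S1, S2}, S2),
}

# Keep only the expressions whose reported value is outside the permitted set.
violations = [expr for expr, (allowed, seen) in cases.items() if seen not in allowed]
print(violations)
```

With these inputs, the two expressions that can be shown numerically to break the parentheses are (c*d) + (a*b) and c*d + (a*b). The remaining zeros are consistent with either s1 or a reversed FMA, so other inputs would be needed to tell those cases apart.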
Since Fortran ensures the integrity of parentheses, I would like to believe that this is something which can be resolved on the Nvidia side. Any feedback on your end would be appreciated.