FMA order of operations and parentheses

I am having an issue with the order of operations in nvfortran when FMAs are enabled.

For example, the following expression

s = a*b + c*d

could be computed in three different ways: with intrinsic add/multiply, or with two different FMAs:

s1 = (a*b) + (c*d)
s2 = FMA(a, b, c*d)  !  a*b + (c*d)
s3 = FMA(c, d, a*b)  !  (a*b) + c*d

This ambiguity is fine, and is consistent with the Fortran language standard.

However, if the user adds parentheses, then the behavior should be more explicit. For example,

s = (a*b) + (c*d)

should compute (and round) both products before computing the sum. Only the s1 result is permitted; the FMA results s2 and s3 are not.

Similarly, if the user writes

s = a*b + (c*d)

then either s1 or the FMA s2 result should be permitted. And

s = (a*b) + c*d

may be either s1 or s3, but not s2.

With the Nvidia compiler, this does not appear to be the case. I have a test code which computes these expressions for the following values:

a = 1. + epsilon(a)
b = 1. - epsilon(a)
c = 1.
d = -1.

then GNU (using -O2 -mfma) gives the following results:

 a*b  +  c*d : -4.9303806576313238E-32 (B970000000000000)
 c*d  +  a*b :  0.0000000000000000E+00 (0000000000000000)
(a*b) + (c*d):  0.0000000000000000E+00 (0000000000000000)
(c*d) + (a*b):  0.0000000000000000E+00 (0000000000000000)
 a*b  + (c*d): -4.9303806576313238E-32 (B970000000000000)
 c*d  + (a*b):  0.0000000000000000E+00 (0000000000000000)
(a*b) +  c*d :  0.0000000000000000E+00 (0000000000000000)
(c*d) +  a*b : -4.9303806576313238E-32 (B970000000000000)

which are consistent with the discussion above.
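To see why these inputs separate the cases: a*b is exactly 1 - epsilon(a)**2, which rounds to 1 in double precision, so only an FMA that fuses a*b keeps the -epsilon(a)**2 term. The sketch below spells out s1, s2 and s3 explicitly. It is an illustration under assumptions, not part of the original test: the name c_fma is made up here and binds to the C99 fma() in libm, and the program should be built with FMA contraction disabled (for example gfortran -O0, or -ffp-contract=off) so that s1 really is unfused.

```fortran
program fma_variants
  use, intrinsic :: iso_c_binding, only: c_double
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  interface
    ! Hypothetical binding name; the target is the C99 fma() in libm,
    ! which computes x*y + z with a single rounding.
    function c_fma(x, y, z) bind(C, name="fma") result(r)
      import :: c_double
      real(c_double), value :: x, y, z
      real(c_double) :: r
    end function c_fma
  end interface

  real(c_double) :: a, b, c, d, ab, cd, s1, s2, s3

  a = 1._c_double + epsilon(a)
  b = 1._c_double - epsilon(a)
  c = 1._c_double
  d = -1._c_double

  ! Products rounded on their own, as parentheses would require.
  ab = a*b
  cd = c*d

  s1 = ab + cd            ! (a*b) + (c*d)
  s2 = c_fma(a, b, cd)    !  a*b  + (c*d), a*b fused
  s3 = c_fma(c, d, ab)    ! (a*b) +  c*d , c*d fused

  call show('s1', s1)
  call show('s2', s2)
  call show('s3', s3)

contains

  ! Print a value in decimal and as its raw 64-bit pattern in hex.
  subroutine show(label, s)
    character(*), intent(in) :: label
    real(c_double), intent(in) :: s
    print '(a, " = ", es24.16, " (", z16.16, ")")', label, s, transfer(s, 1_int64)
  end subroutine show

end program fma_variants
```

Under those assumptions, s1 and s3 should come out as zero and s2 as -epsilon(a)**2, which are the two values seen in the listing above.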

Nvidia 22.7 with -O0 -Mfma gives the following:

 a*b  +  c*d :  0.0000000000000000E+00 (0000000000000000)
 c*d  +  a*b : -4.9303806576313238E-32 (B970000000000000)
(a*b) + (c*d):  0.0000000000000000E+00 (0000000000000000)
(c*d) + (a*b): -4.9303806576313238E-32 (B970000000000000)
 a*b  + (c*d):  0.0000000000000000E+00 (0000000000000000)
 c*d  + (a*b): -4.9303806576313238E-32 (B970000000000000)
(a*b) +  c*d :  0.0000000000000000E+00 (0000000000000000)
(c*d) +  a*b : -4.9303806576313238E-32 (B970000000000000)

That is, Nvidia always applies FMA, regardless of parentheses. (The FMA order is reversed, but the rules above would permit this in the first two expressions.)
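A minimal sketch of a test of this kind is below. It is an illustration, not the repository's actual code: the program name fma_parens and the helper show are made up here, and the volatile attribute is an assumption added so that the compiler loads the operands at run time instead of folding the whole expression at compile time.

```fortran
program fma_parens
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none

  ! volatile (an assumption of this sketch) keeps the operands from
  ! being treated as compile-time constants.
  real(real64), volatile :: a, b, c, d

  a = 1._real64 + epsilon(a)
  b = 1._real64 - epsilon(a)
  c = 1._real64
  d = -1._real64

  ! Same eight expressions as in the listings, in the same order.
  call show(' a*b  +  c*d ',  a*b  +  c*d )
  call show(' c*d  +  a*b ',  c*d  +  a*b )
  call show('(a*b) + (c*d)', (a*b) + (c*d))
  call show('(c*d) + (a*b)', (c*d) + (a*b))
  call show(' a*b  + (c*d)',  a*b  + (c*d))
  call show(' c*d  + (a*b)',  c*d  + (a*b))
  call show('(a*b) +  c*d ', (a*b) +  c*d )
  call show('(c*d) +  a*b ', (c*d) +  a*b )

contains

  ! Print a value in decimal and as its raw 64-bit pattern in hex.
  subroutine show(label, s)
    character(*), intent(in) :: label
    real(real64), intent(in) :: s
    print '(a, ":", es24.16, " (", z16.16, ")")', label, s, transfer(s, 1_int64)
  end subroutine show

end program fma_parens
```

Building it once with gfortran -O2 -mfma and once with nvfortran -O0 -Mfma, as in the runs above, allows a line-by-line comparison. What each line prints depends on the compiler and flags, so no expected output is given here.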

Since Fortran ensures the integrity of parentheses, I would like to believe that this is something which can be resolved on the Nvidia side. Any feedback on your end would be appreciated.

I have a repository demonstrating this effect:

I’ve spoken with some others (including one of the Nvidia engineers) and they agreed that the products in parentheses should be evaluated before the addition, even if FMA is enabled. I believe this can be considered a bug in the compiler. Is there a way to report this?

I sent this to another engineer for advice but he hasn’t gotten back to me. Let me ping him again.

Hello, apologies for the repeated pings, but I wanted to check whether you have heard back from any compiler engineers on this issue.

Bit reproducibility is an important requirement in our weather/climate operations, and losing FMA would be a big performance loss.

Hi Marshall,

Sincere apologies for not posting a follow-up. I did end up talking with Dave, and after a bit of research he concluded that we aren’t honoring parentheses correctly here.

I wrote up a report, TPR#34887, and sent it to engineering for investigation.


Thank you Mat, really happy to hear that.

