Is it possible to force the NVFortran compiler to emit an FMA instruction at specific places, as one would with ieee_fma? My use case is tracking the rounding error of a multiplication, like this:
x = a*b
err = FMA(a,b,-x)
I tried to compile a program that uses ieee_fma, but the current compiler (25.1) does not seem to support it on either the host or the device:
program main
use, intrinsic :: ieee_arithmetic, only : ieee_fma
end
For the host, I know one can write an interface to C and call C’s fma, but I suspect that would add a lot of call overhead.
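A minimal sketch of the host-side workaround mentioned above, assuming the C99 fma routine from the C math library is available at link time. The module name fma_interop and the names c_fma and two_prod are hypothetical, chosen only for illustration. Whether the compiler inlines the call or lowers it to a hardware FMA instruction is not something this sketch guarantees.

```fortran
module fma_interop
  use, intrinsic :: iso_c_binding, only: c_double
  implicit none
  private
  public :: c_fma, two_prod

  interface
    ! Binds to the C99 library routine fma(x, y, z), which returns x*y + z
    ! with a single rounding. The Fortran name c_fma is an illustrative choice.
    pure function c_fma(x, y, z) bind(C, name="fma") result(r)
      import :: c_double
      real(c_double), value :: x, y, z
      real(c_double) :: r
    end function c_fma
  end interface

contains

  ! Hypothetical helper: returns the rounded product x = fl(a*b) and the
  ! rounding error err = a*b - x, computed as fma(a, b, -x).
  pure subroutine two_prod(a, b, x, err)
    real(c_double), intent(in)  :: a, b
    real(c_double), intent(out) :: x, err
    x   = a*b
    err = c_fma(a, b, -x)
  end subroutine two_prod

end module fma_interop
```

With this in place, the pattern from the top of the post becomes call two_prod(a, b, x, err) in double precision. A single-precision variant would bind to fmaf with c_float in the same way.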
For the device, based on this post, inline PTX is not available in CUDA Fortran.
Any suggestions would be greatly appreciated!