Workarounds for IEEE_FMA with NVFortran

Hello,

Is it possible to tell the NVFortran compiler that it must use an FMA instruction in certain places, as one would with ieee_fma? My use case is tracking the accumulated rounding error through a multiplication, like this:

x = a*b
err = FMA(a,b,-x)

I tried to compile a program with ieee_fma, but the current compiler (25.1) does not seem to support it for either the host or the device.

program main
    use, intrinsic :: ieee_arithmetic, only : ieee_fma
end

For the host, I know one can create an interface to C and then use C’s fma, but it sounds like that would add a lot of overhead.
For the device, based on this post, inline PTX is not available in CUDA Fortran.
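
For reference, the host-side C interface mentioned above can be sketched as follows. This is a minimal sketch, not a statement about how NVFortran handles the call: the module name host_fma, the wrapper name c_fma, and the test values are made up for illustration, and it assumes the C library's double-precision fma is linked by default. Whether the call is inlined or costs a real function call depends on the compiler and flags.

```fortran
module host_fma
  use, intrinsic :: iso_c_binding, only: c_double
  implicit none
  interface
    ! Hypothetical wrapper name; binds to the C library's fma(x, y, z),
    ! which returns x*y + z with a single rounding.
    pure function c_fma(x, y, z) bind(C, name="fma") result(r)
      import :: c_double
      real(c_double), value :: x, y, z
      real(c_double) :: r
    end function c_fma
  end interface
end module host_fma

program two_product
  use host_fma
  implicit none
  real(c_double) :: a, b, x, err

  ! Illustrative inputs whose exact product is not representable in double.
  a = 1.0_c_double + 2.0_c_double**(-30)
  b = 1.0_c_double - 2.0_c_double**(-30)

  x   = a * b              ! rounded product
  err = c_fma(a, b, -x)    ! rounding error of the product: a*b - x

  print *, 'x   =', x
  print *, 'err =', err
end program two_product
```

Because the exact product of these inputs does not fit in a double, err is expected to be nonzero here.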

If you had any suggestions that would be greatly appreciated!

Thank you,
Josh

https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#fortran-device-modules

In CUDA Fortran you have two options:

  1. Use the cudadevice module to access the low-level CUDA C functions such as __fma_rd.

  2. Use the libm module to access all the functions in libm.

The module cudadevice will offer all the rounding modes available in CUDA C.
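
As a rough sketch of the first option, a kernel computing the product and its rounding error might look like the following. The module and kernel names (twoprod_mod, two_prod) are hypothetical, and the sketch assumes that __fma_rn, the round-to-nearest double-precision counterpart of __fma_rd, is exposed by cudadevice; please check the device-module section of the programming guide linked above for the exact list.

```fortran
module twoprod_mod
  use cudafor
  use cudadevice   ! low-level CUDA device functions (assumed to include __fma_rn)
  implicit none
contains
  ! Hypothetical kernel: x(i) holds the rounded product a(i)*b(i) and
  ! err(i) holds its rounding error, computed with an explicit FMA.
  attributes(global) subroutine two_prod(a, b, x, err, n)
    integer, value :: n
    real(8) :: a(n), b(n), x(n), err(n)
    integer :: i

    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      x(i)   = a(i) * b(i)
      err(i) = __fma_rn(a(i), b(i), -x(i))   ! a*b - x with a single rounding
    end if
  end subroutine two_prod
end module twoprod_mod
```

With device arrays a_d, b_d, x_d and err_d of length n, it could be launched from the host as call two_prod<<<(n + 255)/256, 256>>>(a_d, b_d, x_d, err_d, n).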

Thank you for your reply.

It looks like the routines in the cudadevice module will solve my problem on the device.

Is there anything that I could use on the host? Or would you say the best workaround is to use the FMA from C’s math library?

Thanks again!

I noticed that ieee_fma is supported in flang (the new compiler infrastructure). Let me ask around about the plans for NVFortran.


You can use libm for both the host and the GPU:

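program fma_demo
  use libm               ! NVFortran module exposing the libm functions
  implicit none
  real :: a, b, c, d
  a = 1.234567
  b = 2.345678
  c = -4.0
  d = fmaf(a, b, c)      ! single-precision fused multiply-add: a*b + c
  ! Compare with the plain expression; this can print 0 if the compiler
  ! contracts a*b + c into an FMA on its own.
  print *, d, d - (a*b + c)
end program fma_demo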

I am going to file an RFE to add the ieee_arithmetic module.
