fmaf() is one of the standard C99 math functions: a single-precision fused multiply-add. At this time the CUDA documentation does not describe the standard C99 math functions, but online man pages for them are easy to find with an internet search engine.

fmaf(a,b,c) computes a*b+c with a single rounding. The full, unrounded (double-width) product of a and b enters the addition with c, and only the final sum is rounded, using the IEEE-754 rounding mode round-to-nearest-even.

CUDA also offers device functions (intrinsics) that perform the single-precision fused multiply-add under each of the four IEEE-754 rounding modes: __fmaf_rn() (round to nearest even), __fmaf_rz() (round toward zero), __fmaf_ru() (round up, toward +infinity), and __fmaf_rd() (round down, toward -infinity).

For sm_1x platforms, fmaf() and the corresponding device functions are implemented via software emulation. For sm_2x they are supported natively by the hardware.