Looking for information about fmaf?

I've tried to find information on the fmaf math function, but it seems the CUDA Toolkit documentation doesn't explain how to use it; it gives only the name of the function and its parameters. What does each parameter mean? Pretty strange, no?

Maybe this is a basic question, but the information is not easy to find in the documentation. Any ideas?

It’s actually a standard C math function representing a fused multiply-add for single-precision floating-point numbers. You can find a man page for it on almost any Mac or Linux system, and Google can also find you a man page on the web.

fmaf() is one of the C99 standard math functions: a single-precision fused multiply-add. The CUDA documentation does not cover the standard C99 math functions at this time. Online man pages for these functions can be located with an internet search engine.

fmaf(a,b,c) computes a*b+c with a single rounding, i.e. the unrounded, double-wide product of a and b participates in the addition with c, and the result of the addition is rounded according to the IEEE rounding mode round-to-nearest-or-even.
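
To make the single-rounding point concrete, here is a minimal sketch (the kernel and variable names are just made up for illustration): the fused form rounds once, while the separate multiply and add may each round their result.

    __global__ void fma_demo(const float *a, const float *b, const float *c,
                             float *fused, float *separate, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            fused[i]    = fmaf(a[i], b[i], c[i]);  // a*b+c rounded exactly once
            separate[i] = a[i] * b[i] + c[i];      // multiply rounds, then the add rounds
                                                   // (unless the compiler fuses it)
        }
    }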

CUDA also offers device functions (i.e. intrinsics) that apply one of the four IEEE-754 rounding modes to the single-precision fused multiply-add operation. They are: __fmaf_rn(), __fmaf_rz(), __fmaf_ru(), __fmaf_rd().
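
As one illustration of why the directed-rounding variants are useful (this device function is hypothetical, not from the CUDA headers): rounding toward minus and plus infinity brackets the exact value of a*b+c, which is the basic building block of interval arithmetic.

    __device__ void fmaf_bounds(float a, float b, float c, float *lo, float *hi)
    {
        *lo = __fmaf_rd(a, b, c);  // round toward -infinity: lower bound on the exact a*b+c
        *hi = __fmaf_ru(a, b, c);  // round toward +infinity: upper bound on the exact a*b+c
    }

__fmaf_rn() matches fmaf() (round to nearest even), and __fmaf_rz() rounds toward zero.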

For sm_1x platforms, fmaf() and the corresponding device functions are implemented via software emulation. For sm_2x they are supported natively by the hardware.

OK, thanks guys for your help, I appreciate it.

So I understand that in my device code, on sm_1x, fmaf() uses emulation, so it's going to be slower than just doing x += a*b; (that's what my results show: it's 3 times slower).

But on sm_2x, is it going to be faster to use fmaf() or __fmaf_rn() than just doing x += a*b;?

Am I right about this?

The software emulation for fmaf() on sm_1x platforms is quite a bit slower than the code generated for a*b+c. On sm_2x both idioms have the same speed, provided a*b+c gets optimized by the compiler into an FFMA (single-precision fused multiply-add) instruction. This happens frequently, but not always. If you need to be sure (for example, if your algorithm depends on the numerical properties of a fused multiply-add), call fmaf() or the equivalent device function __fmaf_rn() directly wherever the presence of an FMA is required.
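
A classic example of depending on those numerical properties (sketched here with a made-up function name): recovering the exact rounding error of a product, which only works when a true FMA instruction is issued.

    __device__ void two_prod(float a, float b, float *prod, float *err)
    {
        *prod = a * b;               // rounded single-precision product
        *err  = fmaf(a, b, -*prod);  // exact residual a*b - *prod, thanks to the single rounding
    }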

Thanks.

It’s all clear now.
