MAD or FMAD

How is c = a*b implemented in toolkit 3.1 (compute capability 2.0)?

Is it multiply-add (MAD) or fused multiply-add (FMAD)?

Y.

Devices of compute capability 2.0 (Fermi) have fused multiply-add (FMA) in hardware.

The compiler may use it when you write a*b+c, but this is not guaranteed.

If you want to be sure an FMA is generated, you can call the fmaf() (single-precision) / fma() (double-precision) functions explicitly.

fmaf() also works on compute 1.x devices, but it is dreadfully slow in single precision there due to the lack of hardware support.
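
To make the difference concrete, here is a minimal sketch (not taken from any toolkit sample) that evaluates a*b+c three ways in one kernel: with fmaf(), with the __fmul_rn() / __fadd_rn() intrinsics (which the compiler never merges into an FMA), and as a plain expression. The kernel name mad_vs_fma, the helper run_mad_vs_fma and the input values are my own choices for illustration. The inputs are picked so that the exact product 1 + 2^-11 + 2^-24 does not fit in a float: rounding it before the add gives 0, while the fused version keeps the 2^-24.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Evaluates a*b + c three ways and stores the results in out[0..2].
__global__ void mad_vs_fma(float a, float b, float c, float *out)
{
    // Explicit FMA: the product is not rounded before the addition.
    out[0] = fmaf(a, b, c);
    // Explicit separate multiply and add, each rounded to nearest.
    // These intrinsics are never contracted into an FMA.
    out[1] = __fadd_rn(__fmul_rn(a, b), c);
    // Plain expression: the compiler decides whether to fuse it.
    out[2] = a * b + c;
}

// Hypothetical helper: runs the kernel once and copies the three
// results back to the host. Returns false on any CUDA error.
bool run_mad_vs_fma(float a, float b, float c, float result[3])
{
    float *d_out = NULL;
    if (cudaMalloc((void **)&d_out, 3 * sizeof(float)) != cudaSuccess)
        return false;
    mad_vs_fma<<<1, 1>>>(a, b, c, d_out);
    // cudaMemcpy waits for the kernel and reports its errors, if any.
    cudaError_t err = cudaMemcpy(result, d_out, 3 * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    const float eps = ldexpf(1.0f, -12);      // 2^-12
    const float a = 1.0f + eps;
    const float b = 1.0f + eps;
    const float c = -(1.0f + 2.0f * eps);     // -(1 + 2^-11)

    float r[3];
    if (!run_mad_vs_fma(a, b, c, r)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("fmaf(a, b, c)            = %g\n", r[0]);
    printf("__fadd_rn(__fmul_rn ...) = %g\n", r[1]);
    printf("a * b + c                = %g\n", r[2]);
    return 0;
}
```

The first line should show 2^-24 (about 5.96e-08) and the second 0. The third line tells you what the compiler actually did with the plain expression for your target: if it matches the first, the expression was fused. Recent nvcc versions also accept -fmad=false to turn the fusing off for plain expressions; I have not checked whether toolkit 3.1 already has that flag.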
