Which implementation of c = a*b is used in toolkit 3.1 (compute capability 2.0)?
Is it multiply-add (MAD) or fused multiply-add (FMA)?
Y.
Devices of compute capability 2.0 (Fermi) have fused multiply-add (FMA) in hardware.
The compiler may use it when you write a*b+c, but this is not guaranteed.
If you want to force an FMA, call fmaf() (single precision) or fma() (double precision).
These functions also work on compute 1.x devices, but fmaf() is dreadfully slow there because single-precision FMA has no hardware support.
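To see the difference yourself, here is a minimal sketch that computes a*b+c three ways: with the product and the sum rounded separately, with an explicit fmaf(), and as a plain expression that the compiler is free to contract. The kernel name mad_vs_fma and the input values are my own, picked so that the rounding of the product matters. It uses the __fmul_rn() and __fadd_rn() intrinsics, which the compiler does not merge into an FMA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: evaluates a*b+c three different ways.
__global__ void mad_vs_fma(float a, float b, float c, float *out)
{
    // Product rounded to float, then sum rounded again (two roundings).
    // These intrinsics are never contracted into a single FMA.
    out[0] = __fadd_rn(__fmul_rn(a, b), c);

    // Explicit fused multiply-add: exact product, one rounding at the end.
    out[1] = fmaf(a, b, c);

    // Plain expression: the compiler may or may not emit an FMA here.
    out[2] = a * b + c;
}

int main()
{
    // a*b = 1 + 2^-11 + 2^-24 exactly; the 2^-24 term is lost when the
    // product is rounded to float on its own.
    const float a = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
    const float b = a;
    const float c = -(1.0f + 1.0f / 2048.0f);  // -(1 + 2^-11)

    float h_out[3] = {0.0f, 0.0f, 0.0f};
    float *d_out = NULL;

    if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Inputs are passed as kernel arguments so they are not constant-folded.
    mad_vs_fma<<<1, 1>>>(a, b, c, d_out);

    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("separate mul + add : %g\n", h_out[0]);
    printf("fmaf(a, b, c)      : %g\n", h_out[1]);
    printf("a * b + c          : %g\n", h_out[2]);
    return 0;
}
```

With these inputs the separately rounded version should give 0 and fmaf() should give 2^-24 (about 5.96e-08). Whichever of the two the third line matches tells you whether the compiler fused the plain expression for your target and settings.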