Hello.
NVIDIA’s GPUs have a single-precision floating-point arithmetic unit and an FMAD unit, don’t they?
So, for example,
a single-precision floating-point arithmetic unit does a+b,
and an FMAD unit does c*d.
Do these execute at the same time?

Also, does an FMAD unit execute only a multiplication or only an addition?

An FMAD unit executes a*b+c, so it performs both the multiplication and the addition. Each “core” in current NVIDIA terminology corresponds to one FMAD unit and has a throughput of one FMAD per cycle (it can also do just a*b or just a+b, again at one operation per cycle). Compute capability 1.x devices could, in addition to this, execute another multiplication per cycle in the SFU (special function unit), provided the core is not starved for register bandwidth (which seems to be the case quite often).
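
To put numbers on that, here is a rough sketch of the peak rate those per-core figures imply. It assumes the usual counting convention that one FMAD counts as two floating-point operations; the function names and the device figures (240 cores at 1.3 GHz) are made up for illustration and do not describe any particular card.

```python
# Peak arithmetic rate implied by the per-core throughput figures above.
# Assumption of this sketch: one FMAD (a*b+c) counts as 2 floating-point
# operations, and the optional SFU multiply counts as 1.

def flops_per_cycle_per_core(extra_sfu_mul=False):
    """Floating-point operations one core can complete per cycle at peak."""
    fmad = 2                             # one multiply plus one add
    sfu_mul = 1 if extra_sfu_mul else 0  # compute capability 1.x only
    return fmad + sfu_mul


def peak_gflops(cores, clock_ghz, extra_sfu_mul=False):
    """Theoretical peak in GFLOP/s for a device with the given core count and clock."""
    return cores * clock_ghz * flops_per_cycle_per_core(extra_sfu_mul)


if __name__ == "__main__":
    # Hypothetical device: 240 cores at 1.3 GHz.
    print("FMAD only:     ", peak_gflops(240, 1.3))
    print("FMAD + SFU mul:", peak_gflops(240, 1.3, extra_sfu_mul=True))
```

The second figure is only reachable when the extra SFU multiply can actually be issued every cycle, which, as noted above, is often not the case.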

Note that the core has a throughput of one FMAD (a*b+c) per cycle, but a latency of 16 to 24 cycles.
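
The difference between throughput and latency matters most for chains of dependent operations. The sketch below uses a deliberately simplified model, which is an assumption on my part: a single pipeline that accepts one operation per cycle and delivers each result a fixed number of cycles later. The function names are hypothetical, and real hardware hides the latency by interleaving many threads rather than by reordering one thread's instructions.

```python
# Simplified pipeline model: one operation can be issued per cycle, and each
# result becomes available `latency` cycles after issue.

def cycles_dependent(n_ops, latency):
    """Each operation needs the previous result, so nothing overlaps."""
    return n_ops * latency


def cycles_independent(n_ops, latency):
    """Operations are independent, so one is issued every cycle."""
    if n_ops == 0:
        return 0
    # The first result arrives after `latency` cycles, then one more per cycle.
    return latency + (n_ops - 1)


def ops_in_flight_to_saturate(latency, ops_per_cycle=1):
    """Independent operations needed in flight to keep the unit busy."""
    return latency * ops_per_cycle


if __name__ == "__main__":
    for latency in (16, 24):
        print("latency", latency,
              "dependent:", cycles_dependent(100, latency),
              "independent:", cycles_independent(100, latency),
              "in flight needed:", ops_in_flight_to_saturate(latency))
```

In this model, a core needs 16 to 24 independent FMADs in flight at any time to sustain the one-per-cycle rate.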