How many float operations per cycle?

In the CUDA 2.1 FAQ, the peak computation rate of the NVIDIA Tesla C1060 accessible from CUDA is calculated simply as (240 * 3 * 1.296) GFLOPS.
Here 240 is the number of processor cores and 1.296 is the clock rate of each core in GHz, so does 3 mean the number of float operations per clock?
If so, is 3 float operations per clock the same for all processor types in the CUDA architecture, e.g. the Quadro FX 1700 as well?

The 3 comes from the fact that G200 can do a MAD and a MUL in the same clock. The MAD counts as 2 flops (a multiply plus an add) and the MUL as 1, thus 3 flops per clock.

Hardware generations earlier than G200 can technically perform the same 3 ops per clock, but for some (debatable) reason are unable to do so in practice (see the many previous threads on this topic for the discussion; there is no need to repeat it here). They can, however, easily perform a MAD every clock, so the factor is 2 for compute 1.0 and 1.1 hardware.…st&p=428607
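
To make the arithmetic concrete, here is a rough sketch of the peak-rate formula discussed in this thread. The function name `peak_gflops` and its parameter names are made up for illustration; only the C1060 figures (240 cores, 1.296 GHz) and the factors 3 and 2 come from the posts above. The second call reuses the C1060 core count and clock purely to show the effect of the factor; it does not describe any real compute 1.0/1.1 card.

```python
def peak_gflops(cores, flops_per_clock, clock_ghz):
    """Theoretical peak in GFLOPS: cores * flops per clock * clock (GHz).

    Hypothetical helper for illustration; not part of any CUDA API.
    """
    return cores * flops_per_clock * clock_ghz


# Tesla C1060 as in the CUDA 2.1 FAQ: MAD (2 flops) + MUL (1 flop) per clock.
c1060_peak = peak_gflops(cores=240, flops_per_clock=3, clock_ghz=1.296)
print(f"{c1060_peak:.2f} GFLOPS")  # prints "933.12 GFLOPS"

# Same core count and clock, but counting only a MAD per clock (factor 2),
# as suggested above for compute 1.0 and 1.1 hardware. Hypothetical numbers.
mad_only_peak = peak_gflops(cores=240, flops_per_clock=2, clock_ghz=1.296)
print(f"{mad_only_peak:.2f} GFLOPS")  # prints "622.08 GFLOPS"
```

For another card, plug in its own core count and shader clock along with the factor that applies to its compute capability. These are theoretical peaks; real kernels will normally land below them.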

Can the MUL unit do anything else? Can G200 do 2 ADDs per cycle, or 2 MOVs?