Calculating theoretically possible FLOPS and architecture differences: G80/GT200/Fermi

I want to calculate the theoretically possible GFLOPS of different Nvidia cards based on their architectures.

G92 - GeForce 9800 GTX:

128 (cores) * 2 (MADD) * 1.688 (GHz) = 432 GFLOPS

I also read that G80/G92/GT200 are able to do a MADD and an ADD in one cycle, which would result in:

128 (cores) * 3 (MADD/ADD) * 1.688 (GHz) = 648 GFLOPS

Which of the two values is correct?
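To make the two variants concrete, here is a minimal sketch in plain C-style host code (the numbers are the ones from above; whether the per-cycle factor is 2 or 3 is exactly what I am unsure about):

    #include <stdio.h>

    /* Theoretical peak = cores * FLOPs per core per cycle * shader clock (GHz).
       The factor is 2 for a MADD alone, 3 if a MADD and an ADD really issue
       together every cycle. */
    static double peak_gflops(int cores, int flops_per_cycle, double shader_ghz)
    {
        return cores * flops_per_cycle * shader_ghz;
    }

    int main(void)
    {
        /* G92 / GeForce 9800 GTX: 128 SPs at a 1.688 GHz shader clock */
        printf("MADD only : %.1f GFLOPS\n", peak_gflops(128, 2, 1.688));
        printf("MADD + ADD: %.1f GFLOPS\n", peak_gflops(128, 3, 1.688));
        return 0;
    }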

What is the difference between G80 and G92?
I know that G92 is 65 nm while G80 is 90 nm, but are there any conceptual differences (like between G80 and GT200)?

GT200 - GTX285:
30 (one dedicated DP unit per SM) * 2 (MADD) * 1.476 (GHz) = 88 GFLOPS (DP)

Fermi - GTX480:
480 / 2 (two cores combine for one DP result) * 2 (MADD) * 1.401 (GHz) = 672 GFLOPS (DP)
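The same arithmetic for both DP cases, as a small sketch (values as above; the Fermi figure assumes the full one-DP-result-per-two-cores rate, before any product-level reduction):

    #include <stdio.h>

    /* Theoretical DP peak = DP units * 2 (a MADD counts as 2 FLOPs) * shader clock (GHz) */
    static double peak_dp_gflops(int dp_units, double shader_ghz)
    {
        return dp_units * 2 * shader_ghz;
    }

    int main(void)
    {
        /* GT200 / GTX 285: one dedicated DP unit per SM, 30 SMs, 1.476 GHz */
        printf("GTX 285 DP: %.1f GFLOPS\n", peak_dp_gflops(30, 1.476));
        /* Fermi / GTX 480: two cores combine for one DP result, 1.401 GHz */
        printf("GTX 480 DP: %.1f GFLOPS\n", peak_dp_gflops(480 / 2, 1.401));
        return 0;
    }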

Are these calculations right, or am I misunderstanding something?

What percentage of the theoretical GFLOPS can actually be achieved? (I know this depends on the algorithm, but I guess there is a practical upper bound on what can really be reached.)
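For what it's worth, here is roughly how I would try to measure it for pure arithmetic (a CUDA sketch, not a rigorous benchmark; the kernel and the constants are made up for illustration, and the dependent MADD chain needs enough resident warps to hide pipeline latency):

    #include <cstdio>
    #include <cuda_runtime.h>

    #define ITERS 4096

    /* Each thread runs a long chain of dependent MADDs; with enough warps in
       flight the measured rate should approach the arithmetic peak. */
    __global__ void madd_burn(float *out, float a, float b)
    {
        float x = threadIdx.x * 0.001f;
        #pragma unroll 16
        for (int i = 0; i < ITERS; ++i)
            x = x * a + b;                                   /* 1 MADD = 2 FLOPs */
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;      /* defeat dead-code elimination */
    }

    int main(void)
    {
        const int blocks = 1024, threads = 256;
        float *d_out;
        cudaMalloc(&d_out, blocks * threads * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        madd_burn<<<blocks, threads>>>(d_out, 1.0001f, 0.0001f);   /* warm-up */
        cudaEventRecord(start);
        madd_burn<<<blocks, threads>>>(d_out, 1.0001f, 0.0001f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double flops = 2.0 * ITERS * blocks * threads;             /* 2 FLOPs per MADD */
        printf("measured: %.1f GFLOPS\n", flops / (ms * 1e6));

        cudaFree(d_out);
        return 0;
    }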

I think the OpenCL forum is not the most appropriate place for such a question. Fermi DP performance was reduced on GeForce cards, by the way. And why do you use different frequencies for the single and double precision calculations? You could also calculate the theoretical global and local memory bandwidth.
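For example, theoretical global memory bandwidth is just the effective transfer rate times the bus width. A sketch with GTX 285 values (1242 MHz GDDR3, i.e. 2484 MT/s effective, on a 512-bit bus; double-check these against the spec sheet):

    #include <stdio.h>

    int main(void)
    {
        /* GTX 285 (values assumed): GDDR3 is double data rate, so the
           effective transfer rate is 2 * memory clock. */
        double mem_clock_hz   = 1242e6;   /* memory clock */
        int    bus_width_bits = 512;      /* memory interface width */
        double bytes_per_sec  = 2.0 * mem_clock_hz * (bus_width_bits / 8);
        printf("GTX 285 theoretical bandwidth: %.0f GB/s\n", bytes_per_sec / 1e9);
        return 0;
    }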