Instruction throughput table

I'm currently wondering about the instruction throughput table in Section 5.4.1 of the CUDA Programming Guide, version 4.0.

Are devices of CC 2.0 really supposed to have 4x higher double-precision throughput than devices of CC 2.1?

The GTX 480 has a specified DP throughput of 168.1 GFlops.

15 SM * 1.401 GHz * 2 (FMA) * X = 168.1 GFlops, so X = 3.999 → ~4

The GTX 580 has a specified DP throughput of 197.6 GFlops.

16 SM * 1.544 GHz * 2 (FMA) * X = 197.6 GFlops, so X = 3.999 → ~4
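
For anyone who wants to redo the arithmetic, here is a small Python sketch that solves both equations for X. The function name is just something I made up for illustration, and reading X as "DP FMA operations per SM per clock" is my own interpretation. The SM counts, clocks and GFlops figures are the ones quoted in this post, not independently checked.

```python
def dp_fma_per_sm_per_clock(gflops, sm_count, clock_ghz, flops_per_fma=2):
    """Solve gflops = sm_count * clock_ghz * flops_per_fma * X for X.

    X is interpreted here (an assumption) as the number of
    double-precision FMA operations one SM issues per clock.
    """
    return gflops / (sm_count * clock_ghz * flops_per_fma)


# Figures as quoted in this post: (specified DP GFlops, SM count, clock in GHz).
cards = {
    "GTX 480": (168.1, 15, 1.401),
    "GTX 580": (197.6, 16, 1.544),
}

for name, (gflops, sms, clock) in cards.items():
    x = dp_fma_per_sm_per_clock(gflops, sms, clock)
    print(f"{name}: X = {x:.4f}")
```

Both cards come out within rounding distance of 4, which is what the hand calculation above suggests.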

Maybe DP throughput is the same for both CC 2.0 and CC 2.1?

Are there more errors like this in the manual? Sorry, but I have to rely on these values to calculate my performance metrics. Is there a more reliable and/or more extensive table available?