I'm currently wondering about the instruction throughput table in Section 5.4.1 of the CUDA Programming Guide v4.0.
According to that table, devices of CC 2.0 are supposed to have a 4x higher double-precision throughput than devices of CC 2.1?
The GTX 480 has a specified DP throughput of 168.1 GFlops.
15 SMs * 1.401 GHz * 2 (FMA) * X = 168.1 GFlops, so X = 3.999 ≈ 4, where X is the number of DP FMAs per SM per clock.
The GTX 580 has a specified DP throughput of 197.6 GFlops.
16 SMs * 1.544 GHz * 2 (FMA) * X = 197.6 GFlops, so X = 3.999 ≈ 4
So maybe the DP throughput is the same for both CC 2.0 and CC 2.1?
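
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch of the two calculations above. The function name and the `cards` dictionary are made up for illustration; the SM counts, shader clocks, and GFlops figures are simply the ones plugged into the two equations, and the factor of 2 assumes one FMA counts as two floating-point operations.

```python
def dp_fma_per_sm_per_clock(gflops, num_sms, shader_clock_ghz, flops_per_fma=2):
    """Solve gflops = num_sms * clock * flops_per_fma * X for X.

    Hypothetical helper: X is the number of double-precision FMAs
    one SM would have to issue per clock to reach the quoted peak.
    """
    return gflops / (num_sms * shader_clock_ghz * flops_per_fma)


# (quoted DP GFlops, number of SMs, shader clock in GHz), as used above
cards = {
    "GTX 480": (168.1, 15, 1.401),
    "GTX 580": (197.6, 16, 1.544),
}

results = {
    name: dp_fma_per_sm_per_clock(gflops, sms, clock)
    for name, (gflops, sms, clock) in cards.items()
}

for name, x in results.items():
    print(f"{name}: X = {x:.4f}")
```

Both values come out just under 4, which is what the hand calculation gives as well.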
Are there more errors like this in the manual? Sorry, but I have to rely on these values to calculate my performance metrics! Is there a more reliable and/or more extensive table available?