CUDA and double-precision floating-point arithmetic

This is what I found: "Cores perform only single-precision floating-point arithmetics. There is 1 double-precision floating-point unit."

Is this true for all compute capabilities (versions)?

The NVIDIA CUDA C Programming Guide 4.1, section F.4.1 (p. 144), says "32 CUDA cores for integer and floating-point arithmetic operations". By "floating-point", do they mean both single and double precision?

In the CUDA 4.2 Programming Guide, the very sentence you cited contains a pointer to section 5.4.1. That section also exists in the 4.1 guide and lists the throughput of specific arithmetic instructions, so you can see, for example, that on compute capability 2.0 double-precision operations have half the throughput of single-precision operations, and so on.

Great :)

Thank you

The ratio between SP and DP throughput varies widely depending on the GPU generation and on whether you are using a professional Tesla GPU or a consumer card. For example, the GTX 680 has a ratio of 1/24 (one DP operation for every 24 SP operations), while you can expect a 1/2 ratio on Fermi-based Tesla cards (one DP operation for every 2 SP operations).