This is what I found: "Cores perform only single-precision floating-point arithmetics. There is 1 double-precision floating-point unit."

Is this true for all compute capabilities (versions)?

The NVIDIA CUDA C Programming Guide 4.1, section F.4.1 (p. 144), says “32 CUDA cores for integer and floating-point arithmetic operations”. By “floating-point”, do they mean both single and double precision?

In the CUDA 4.2 Programming Guide, the sentence you quoted carries a pointer to section 5.4.1. That section is also in the 4.1 guide, and it lists the throughput of specific arithmetic instructions per compute capability. There you can see, for example, that on CC 2.0 double-precision operations have half the throughput of single-precision operations, and so on.
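To make the comparison concrete, here is a minimal sketch that turns per-multiprocessor throughput figures into an SP:DP ratio. The numbers are my transcription of the add/multiply/multiply-add rows of that table for a few compute capabilities, and the names `THROUGHPUT_PER_SM` and `sp_to_dp_ratio` are made up for this example, so check the values against your own copy of the guide.

```python
# Operations per clock cycle per multiprocessor for floating-point
# add / multiply / multiply-add, keyed by compute capability.
# Assumption: values transcribed from the throughput table in the
# programming guide; verify them before relying on them.
THROUGHPUT_PER_SM = {
    "1.3": {"sp": 8, "dp": 1},
    "2.0": {"sp": 32, "dp": 16},
    "2.1": {"sp": 48, "dp": 4},
}


def sp_to_dp_ratio(cc):
    """Return how many SP operations complete in the time of one DP operation."""
    entry = THROUGHPUT_PER_SM[cc]
    return entry["sp"] / entry["dp"]


for cc in sorted(THROUGHPUT_PER_SM):
    print(f"CC {cc}: 1 DP operation per {sp_to_dp_ratio(cc):g} SP operations")
```

The CC 2.0 entry reproduces the factor of two mentioned above, while the other entries show that the ratio is not the same across compute capabilities.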

The ratio between SP and DP throughput varies widely depending on the GPU generation and on whether you are using a professional Tesla GPU or a consumer card. For example, the GTX 680 has a ratio of 1/24 (1 DP operation for every 24 SP operations), while you can expect a ratio of 1/2 on Fermi-based Tesla cards (1 DP operation for every 2 SP operations).
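As a rough illustration of what such a ratio means in practice, the sketch below estimates peak throughput from core count, clock and DP:SP ratio. The helper `peak_gflops`, the card figures (512 cores at 1300 MHz) and the two ratios are example inputs chosen for illustration, and the factor of 2 assumes each core retires one multiply-add per cycle. Substitute the numbers for your own card.

```python
def peak_gflops(cores, clock_mhz, dp_per_sp=1.0):
    """Estimate peak GFLOPS, scaled by the DP:SP throughput ratio.

    Assumes one multiply-add (2 floating-point operations) per core per cycle.
    """
    return cores * clock_mhz * 2 * dp_per_sp / 1000.0


# Hypothetical card used only for illustration.
CORES, CLOCK_MHZ = 512, 1300

sp_peak = peak_gflops(CORES, CLOCK_MHZ)
print(f"SP peak: {sp_peak:.1f} GFLOPS")

# Example DP:SP ratios: a Tesla-like 1/2 and a consumer-like 1/24.
for label, ratio in (("1/2", 1 / 2), ("1/24", 1 / 24)):
    dp_peak = peak_gflops(CORES, CLOCK_MHZ, dp_per_sp=ratio)
    print(f"DP peak at ratio {label}: {dp_peak:.1f} GFLOPS")
```

These are theoretical peaks only; real kernels are often limited by memory bandwidth long before they reach them.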