According to the Programming Guide, the double type gets demoted to float when code is compiled to run on a device with compute capability below 1.3. In emulation mode, code using double is compiled to run on the host, so does it stay double there?
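To make clear what I mean by demotion, here is a small host-only C++ sketch (not CUDA code; the function names are made up for illustration). It assumes demotion behaves like rounding both operands to float and doing the add in single precision, which is my reading of the guide rather than something I have verified on hardware.

```cpp
#include <cassert>
#include <cstdio>

// Sum computed with the operands kept in double precision.
double add_as_double(double b, double c) {
    return b + c;
}

// Host-side imitation of demotion (an assumption, not the real compiler path):
// both operands are rounded to float and the addition is done in float.
float add_as_demoted_float(double b, double c) {
    float fb = static_cast<float>(b);
    float fc = static_cast<float>(c);
    float sum = fb + fc;
    return sum;
}

// Hypothetical helper that shows the difference for one pair of operands.
void run_example() {
    const double b = 1.0;
    const double c = 1e-10;  // smaller than float's spacing near 1.0

    double d = add_as_double(b, c);
    float  f = add_as_demoted_float(b, c);

    // In double the small addend survives; in float it is rounded away.
    assert(d > 1.0);
    assert(f == 1.0f);

    std::printf("double: %.12f\n", d);
    std::printf("float : %.12f\n", static_cast<double>(f));
}
```

Calling `run_example()` from `main` prints both sums. The point is only that a kernel written with double can silently give single-precision results when built for a device below 1.3.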
From what I could gather from scattered sources, each Streaming Multiprocessor contains one double-precision unit, which makes 30 for the GTX 280. I believe the unit can issue one double-precision instruction per clock cycle (correct me if I am wrong, as I am very uncertain about this).
What I am wondering is: What happens in a kernel containing
a = b+c in double precision
when the 8 scalar processors of a multiprocessor reach the instruction that adds the two double operands?