2 Small Questions

LightRaven · August 8, 2008, 8:53pm

According to the Programming Guide, the double type gets demoted to float when code is compiled to run on a device with compute capability below 1.3. In emulation mode the code is compiled to run on the host, so does double stay double there?
From what I could gather from scattered information, each Streaming Multiprocessor contains 1 double precision unit, making it 30 for the GTX 280. The unit is capable of issuing 1 double precision instruction per clock cycle (correct me if I am wrong, as I am very uncertain about this).
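To make the cost of demotion concrete, here is a minimal host-side sketch in Python. It only imitates 32-bit rounding with the standard `struct` module; the helper `to_float32` and the operand values are made up for illustration, and it says nothing about what the CUDA compiler actually emits.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Illustrative operands (assumed, not from the thread).
b, c = 1.0, 1e-10

# a = b + c carried out in double precision.
a_double = b + c

# The same addition with operands and result rounded to single precision,
# which is roughly what a demoted double would experience.
a_float = to_float32(to_float32(b) + to_float32(c))

print(a_double == b)  # False: double keeps the small contribution of c
print(a_float == b)   # True: in float, c is lost entirely
```

If a result like `a_float` would be acceptable for a given variable, that variable is a candidate for staying in 32-bit.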

What I am wondering is: What happens in a kernel containing

…
a = b+c in double precision
…

when the 8 scalar processors reach the instruction adding the two operands of double type?

ColinS · August 8, 2008, 9:27pm

I’m not sure about the first question.

For your second question, you are correct that there is only 1 double precision floating point unit per SM, bringing the total count to 30 for the GTX 280. While this may seem unfortunate, it does make sense, since these units take up a lot of die space. When 8 threads all need the double precision unit, I imagine these operations will be serialized, which may cause a performance hit. Hopefully you have enough threads and other operations in flight that the cost of serialization can be hidden. It might be worth taking a little time to establish what needs to be 64-bit and what can get by in 32-bit.
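A rough way to picture the serialization guess is to count how many cycles it takes to push one warp-wide instruction through the available units. This is only a sketch under assumptions: it takes a warp of 32 threads, assumes each unit accepts one operation per cycle, and ignores latency hiding and scheduling. The function name `issue_cycles` is made up for illustration.

```python
import math

WARP_SIZE = 32        # threads per warp on CUDA hardware
SP_UNITS_PER_SM = 8   # scalar processors per SM, from the thread
DP_UNITS_PER_SM = 1   # double precision units per SM, from the thread

def issue_cycles(units, warp_size=WARP_SIZE):
    """Cycles to issue one instruction for a whole warp through `units`
    lanes, assuming one operation per lane per cycle (an assumption)."""
    return math.ceil(warp_size / units)

sp_cycles = issue_cycles(SP_UNITS_PER_SM)  # single precision path
dp_cycles = issue_cycles(DP_UNITS_PER_SM)  # double precision path
slowdown = dp_cycles / sp_cycles

print(sp_cycles, dp_cycles, slowdown)  # 4 32 8.0
```

Under this simple model a double precision instruction occupies the SM 8 times longer than a single precision one, which is the hit that extra threads and other work would need to hide.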

LightRaven · August 8, 2008, 11:01pm

I had the same gut feeling, but it would be nice to have it confirmed. Assuming a MAD instruction still takes 1 clock cycle in double precision, you get a peak of ~78 gigaflops in double precision. If the operations get serialized, hopefully that does not add any overhead. What’s the going rate on CPUs? I believe the Cell had around 20 gigaflops sustained (from the shaky back of my head).
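The ~78 figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a GTX 280 shader clock of 1.296 GHz, which is not stated in the thread, and counts a MAD as two floating point operations; the function name `peak_gflops` is made up for illustration.

```python
SM_COUNT = 30             # streaming multiprocessors on the GTX 280, from the thread
DP_UNITS_PER_SM = 1       # double precision units per SM, from the thread
FLOPS_PER_MAD = 2         # one multiply plus one add
SHADER_CLOCK_GHZ = 1.296  # assumed shader clock, not stated in the thread

def peak_gflops(units_per_sm, flops_per_cycle):
    """Theoretical peak, assuming every unit retires one instruction per cycle."""
    return SM_COUNT * units_per_sm * flops_per_cycle * SHADER_CLOCK_GHZ

dp_peak = peak_gflops(DP_UNITS_PER_SM, FLOPS_PER_MAD)
print(f"{dp_peak:.2f} GFLOPS")  # 77.76 GFLOPS
```

This is a theoretical ceiling only; it holds when every double precision unit issues a MAD on every cycle.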

LightRaven · August 9, 2008, 7:55pm

Anyone with a 1.3 card who might know the answer(s)?
