GTX2xx double precision support

vanja · October 15, 2009, 3:25am

Hi All,

As far as I understand, the GTX2xx series has 1 double precision FPU and 8 single precision FPUs per multiprocessor. I cannot find the double precision floating point instruction throughput anywhere in the CUDA 2.3 documentation. I am aware of pages 77-79 in the programming guide and pages 43-45 in the best practices guide. These documents list the single precision floating point throughput as 8 operations per clock cycle and state that compute capability 1.3 hardware supports native double precision calculations. The best practices guide alludes to the lower performance of double precision floating point calculations, stating that

However, I cannot find any reference to the actual number of double precision floating point operations per clock cycle.

I have three questions:

[list=1]

[*]Am I correct in assuming that double precision operations are still organised into 32-thread warps and that one thread from each warp is processed in each clock cycle? If so, I fail to see how the architecture benefits from being SIMD in double precision.

[*]Have I missed the place in the documentation that mentions the double precision floating point performance?

[*]Is it possible to use both the 8 SP FPUs and the DP FPU in parallel, capitalising on the extra hardware included?

[/list]
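To make the arithmetic behind the first question concrete, here is a minimal sketch of how many clock cycles one instruction would occupy a multiprocessor for a full warp. It assumes the unit counts stated above (8 SP FPUs and 1 DP FPU per multiprocessor) and that each unit retires one thread's operation per clock. The function name `cycles_per_warp_instruction` is hypothetical, and this is a back-of-the-envelope model rather than an official figure.

```python
import math

WARP_SIZE = 32  # threads per warp on CUDA hardware


def cycles_per_warp_instruction(units_per_multiprocessor: int) -> int:
    """Clock cycles needed to push one instruction through for a full warp.

    Assumption: every functional unit processes one thread's operation
    per clock, so a warp is served in WARP_SIZE / units cycles.
    """
    return math.ceil(WARP_SIZE / units_per_multiprocessor)


sp_cycles = cycles_per_warp_instruction(8)  # 8 single precision FPUs
dp_cycles = cycles_per_warp_instruction(1)  # 1 double precision FPU

print(f"SP: {sp_cycles} cycles per warp instruction")
print(f"DP: {dp_cycles} cycles per warp instruction")
```

Under these assumptions, a double precision instruction occupies the multiprocessor eight times longer per warp than a single precision one.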

At any rate, my application is memory bandwidth limited and as such would not benefit from any more double precision FPUs. In fact, the memory bandwidth and arithmetic throughput are better balanced in double precision than in single precision in my case. I am simply interested in learning more about the hardware. I have found this [topic=“70015”]post[/topic] detailing the specifications; however, I would prefer to have an official reference for use in my thesis!

Thanks in advance…

vanja · October 16, 2009, 2:26am

Hi again,

I’ve found a reference to the double precision floating point throughput. Page 11 of the Fermi Whitepaper includes this information in a table comparing the capabilities of various architectures. It lists the GT200 architecture as being capable of 30 FMA ops/clock. Interestingly, the double precision unit uses fused multiply-adds (FMA) while the single precision units use standard MAD operations. The numbers agree with those listed in the post referenced above.
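For a rough sense of what 30 FMA ops/clock means in peak terms, here is a small sketch that converts ops per clock into GFLOP/s. Each FMA or MAD is counted as 2 floating point operations. The 1.296 GHz shader clock is an assumption (the GTX 280 value), not something stated in this thread, and `peak_gflops` is a hypothetical helper name. The single precision figure uses the 8 SP units per multiprocessor mentioned earlier, with 30 multiprocessors implied by 30 DP FMA ops/clock at 1 DP unit each.

```python
def peak_gflops(ops_per_clock: float, flops_per_op: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOP/s: ops per clock * flops per op * clock in GHz."""
    return ops_per_clock * flops_per_op * clock_ghz


SHADER_CLOCK_GHZ = 1.296  # assumption: GTX 280 shader clock, not from the thread
MULTIPROCESSORS = 30      # implied by 30 DP FMA ops/clock with 1 DP unit each

# Double precision: 30 FMA ops/clock, 2 flops per FMA.
dp_peak = peak_gflops(30, 2, SHADER_CLOCK_GHZ)

# Single precision: 8 MAD ops/clock per multiprocessor, 2 flops per MAD.
sp_peak = peak_gflops(8 * MULTIPROCESSORS, 2, SHADER_CLOCK_GHZ)

print(f"DP peak: {dp_peak:.2f} GFLOP/s")
print(f"SP peak (MAD only): {sp_peak:.2f} GFLOP/s")
print(f"SP/DP ratio: {sp_peak / dp_peak:.0f}x")
```

These are theoretical ceilings only; a memory bandwidth limited kernel like the one described above will sit well below them in either precision.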

V
