Detailed double precision to single precision ratio in nVidia GPUs?

jgpallero · December 25, 2013, 10:14pm

Hello,

I’m a newbie in GPU computing and I’m a bit confused about the theoretical peak computation for nVidia GPUs. There are some listings (nVidia certainly does not give much information about this, at least for non-high-end products) where one can read the theoretical peak performance, always in single precision, for example: List of Nvidia graphics processing units - Wikipedia

First of all, I’m confused because sometimes the processing power is stated as “Processing power (FMA)”. I understand a Fused Multiply-Add (FMA) as the operation at the heart of the classical AXPY (a·x + y), so 1 FMA = 2 FLOPs. So, to compare with classical CPU figures, should I multiply the “Processing power (FMA)” value by 2?

Another question, indeed the most important one for me, concerns the double-precision theoretical peak, because I want to run some benchmarks using CULA and MAGMA and I would like to compare the actual performance with the theoretical peak. I know that a GPU does not have the same number of DP units as SP units, so the DP performance is a fraction (<= 1) of the SP performance. Some information about the number of DP units in a GPU can be found on the web, in forums like this one and similar places, but I have never seen it in a technical nVidia brochure. The information is sometimes given as an actual count of DP units and sometimes as a ratio to the number of SP units. For example, the DP/SP ratio is 1/12 for the GTX 460/560 and 1/8 for the GTX 470, GTX 480, GTX 570 and GTX 580. So the question is simple: does a comprehensive list of DP/SP ratios for nVidia GPUs exist? Where is it?
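A minimal sketch of the relation described here, assuming the DP peak is simply the SP peak scaled by the DP/SP ratio. The ratios in the table are only the ones quoted in this post, not an official NVIDIA list, and the 1200 GFLOPS SP peak in the usage lines is a made-up value for illustration.

```python
from fractions import Fraction

# DP/SP throughput ratios as quoted in this post (not an official NVIDIA list).
DP_SP_RATIO = {
    "GTX 460": Fraction(1, 12),
    "GTX 560": Fraction(1, 12),
    "GTX 470": Fraction(1, 8),
    "GTX 480": Fraction(1, 8),
    "GTX 570": Fraction(1, 8),
    "GTX 580": Fraction(1, 8),
}


def dp_peak_gflops(sp_peak_gflops: float, ratio: Fraction) -> float:
    """Theoretical double-precision peak = single-precision peak x DP/SP ratio."""
    return sp_peak_gflops * ratio.numerator / ratio.denominator


if __name__ == "__main__":
    # Hypothetical SP peak of 1200 GFLOPS, only to show how each ratio scales it.
    for ratio in sorted(set(DP_SP_RATIO.values())):
        print(f"DP/SP = {ratio}: {dp_peak_gflops(1200.0, ratio):.1f} GFLOPS DP")
```

Keeping the ratios as `Fraction` values avoids rounding surprises when they are printed or compared.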

I have a GeForce GTX 550 Ti. Should I assume a DP/SP ratio of 1/12 for this card? I also intend to purchase a mobile workstation equipped with a Quadro K2100M. I’ve looked for information about its DP theoretical peak and DP/SP ratio but have not found anything. Does anyone know the DP specs for this GPU?

Thanks :)

njuffa · December 25, 2013, 10:48pm

The best information I was able to locate is this overview, which however does not list double-precision performance for the K2100M:

Page Not Found | NVIDIA

jgpallero · December 25, 2013, 11:13pm

Thank you very much, njuffa. I already knew that document. It’s a pity there is no information about the K2100M, and I don’t know why: the document comes from nVidia and the theoretical peak should be a simple computation… Also note that it does not give any clock rates.

jgpallero · December 26, 2013, 7:44am

Hello,

On some websites, especially NVIDIA's GeForce GTX 550 Ti: Coming Up Short At $150, one can find technical details about GPUs, such as the DP/SP ratio. Does anyone know where this information comes from?

Using that information, the single-precision performance of the GTX 550 Ti can be computed as 192 stream processors x 2 FMA operations/cycle x 1.8 GHz shader clock = 691.2 GFLOPS (FMA), which is the same value shown in that table and in List of Nvidia graphics processing units - Wikipedia (from this link I took the figure of 2 FMA operations/cycle for the shaders, which the first link I posted calls stream processors). So I think this number should be multiplied by 2 (each FMA is 2 floating-point operations) to get FLOPS in the sense of 1 FLOP = 1 multiplication or 1 addition. Am I right?
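The peak formula used here can be sketched as below, assuming one FMA per core per cycle. One detail bears on the question: as far as I can tell, a Fermi CUDA core issues at most one single-precision FMA per clock, so the factor 2 in 192 x 2 x 1.8 is already the FMA counted as two operations. In other words, 691.2 GFLOPS is already a FLOP count in the 1 FLOP = 1 multiply or 1 add sense (the Wikipedia "(FMA)" column is computed the same way), and multiplying it by 2 again would count each FMA as four operations.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x FLOPs per core per cycle.

    The default of 2 assumes one FMA per core per cycle, counted as two
    floating-point operations (one multiply plus one add).
    """
    return cores * clock_ghz * flops_per_core_per_cycle


if __name__ == "__main__":
    # GTX 550 Ti figures from this post: 192 cores at a 1.8 GHz shader clock.
    sp = peak_gflops(192, 1.8)
    print(f"SP peak: {sp:.1f} GFLOPS")  # prints "SP peak: 691.2 GFLOPS"
```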

About double precision: the DP/SP ratio is 1/12, so the double-precision peak should be 691.2/12 = 57.6 GFLOPS (FMA), or 115.2 GFLOPS counting 1 FMA = 2 FLOPs. Using DGEMM from CUBLAS I’m getting about 45 GFLOPS (DGEMM performs 2·M·N·K floating-point operations), which is a relative performance R/Rpeak of about 40% taking the double-precision peak as 115.2 GFLOPS. This 40% efficiency is normal for this kind of non-high-end GPU (I got this information from the CULA forum).
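A short sketch of the efficiency arithmetic, assuming the standard 2·M·N·K operation count for DGEMM. Since 691.2 GFLOPS already counts each FMA as two operations, the matching DP peak is 691.2/12 = 57.6 GFLOPS, and 45 GFLOPS is then roughly 78% of peak; the 40% figure is what dividing by 115.2 gives. The matrix size and time in the usage lines are hypothetical placeholders, not measurements from this thread.

```python
def dgemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved rate for C = alpha*A*B + beta*C, counting 2*M*N*K operations
    (the usual convention; the O(M*N) alpha/beta scaling is ignored)."""
    return 2.0 * m * n * k / seconds / 1e9


def efficiency(achieved_gflops: float, peak_gflops: float) -> float:
    """R / Rpeak as a fraction."""
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical timing: a 4096^3 DGEMM taking 3.0 s (placeholder, not a measurement).
    print(f"achieved: {dgemm_gflops(4096, 4096, 4096, 3.0):.1f} GFLOPS")
    # The 45 GFLOPS from this post against the two candidate DP peaks.
    for peak in (57.6, 115.2):
        print(f"45 / {peak} GFLOPS = {efficiency(45.0, peak):.0%}")
```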

As for the Quadro K2100M, I could not find any information about its DP/SP ratio. But it is a Quadro, which should have a better DP/SP ratio than a GeForce. Could anyone suggest a DP/SP ratio for this GPU?

Thanks

jgpallero · December 29, 2013, 3:21pm

Hello,

Apparently, the Quadro K2100M is equipped with the GK106 GPU core (Quadro - Wikipedia). Also apparently, the double/single ratio for this core is 1/24 (The NVIDIA GeForce GTX 660 Review: GK106 Fills Out The Kepler Family). This would mean that this Quadro has lower double-precision performance than my GeForce GTX 550 Ti… Is my information correct?
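Whether a 1/24 part is slower in double precision than the GTX 550 Ti depends on its single-precision peak as well as its ratio. A minimal sketch of the break-even point: since DP peak = SP peak x ratio, a GPU with a 1/24 ratio needs twice the SP peak of a 1/12 GPU to match it, i.e. at least 2 x 691.2 = 1382.4 GFLOPS to match the GTX 550 Ti. The K2100M's core count and clock are not given in this thread, so its own SP peak (cores x clock x 2) would have to come from its spec sheet.

```python
from fractions import Fraction


def sp_peak_needed_to_match(ref_sp_gflops: float, ref_ratio: Fraction, ratio: Fraction) -> float:
    """SP peak a GPU with DP/SP `ratio` needs so that its DP peak equals the reference's.

    DP peak = SP peak x ratio, so solve sp * ratio == ref_sp * ref_ratio for sp.
    The ratio of ratios is computed exactly as a Fraction before converting to float.
    """
    return ref_sp_gflops * float(ref_ratio / ratio)


if __name__ == "__main__":
    # Reference from this thread: GTX 550 Ti, 691.2 GFLOPS SP, DP/SP = 1/12.
    needed = sp_peak_needed_to_match(691.2, Fraction(1, 12), Fraction(1, 24))
    print(f"a 1/24 GPU needs at least {needed:.1f} GFLOPS SP to match")
```

If the K2100M's SP peak (from its spec sheet) is below that threshold, its DP peak is below the GTX 550 Ti's.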

Thanks

deathly809 · January 2, 2014, 9:54am

I actually cannot find any information on the GK106 architecture on the NVIDIA website. None of their Kepler whitepapers mention it, and none of the websites that mention it cite any sources, as far as I can tell. But you should be able to check the programming guide for more information. I know Kepler has 192 single-precision cores per SMX and 64 double-precision cores, so it is really a 1/3 ratio.
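The ratio follows from the per-SMX unit counts, assuming each unit retires one FMA per clock. As far as I know, the 192 SP / 64 DP SMX is the GK110 one (e.g. Tesla K20), while the GK104/GK106 SMX pairs the same 192 SP cores with only 8 DP units, which is where the 1/24 figure quoted earlier for GK106 comes from. The unit counts below are assumptions based on NVIDIA's GK104 and GK110 whitepapers and are worth double-checking.

```python
from fractions import Fraction

# Per-SMX unit counts (assumed, from NVIDIA's GK104/GK110 whitepapers).
KEPLER_SMX = {
    "GK110": {"sp": 192, "dp": 64},
    "GK104/GK106": {"sp": 192, "dp": 8},
}


def dp_sp_ratio(sp_units: int, dp_units: int) -> Fraction:
    """DP/SP throughput ratio when every unit retires one FMA per clock."""
    return Fraction(dp_units, sp_units)


if __name__ == "__main__":
    for chip, units in KEPLER_SMX.items():
        print(chip, dp_sp_ratio(units["sp"], units["dp"]))
```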
