Detailed double precision to single precision ratio in nVidia GPUs?


I’m a newbie in GPU computing and I’m a bit confused about the theoretical peak performance of nVidia GPUs. There are some listings (nVidia certainly does not give much information about this, at least for non-high-end products) where one can read the theoretical peak performance, always in single precision, as for example:

First of all, I’m confused because the processing power is sometimes stated as “Processing power (FMA)”. I understand a Fused Multiply-Add (FMA) as the classical AXPY operation (one multiply plus one add per element), so 1 FMA = 2 FLOPs. So, to compare with classical CPU figures, should I multiply the “Processing power (FMA)” figure by 2?
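To make the counting convention concrete, here is a small Python sketch of AXPY (y = a*x + y) that counts operations as it goes; the function name and the data are purely illustrative. Each element needs one multiply and one add, which a GPU can issue as a single FMA, so n elements cost n FMAs or 2n FLOPs. Whether a listing labeled "(FMA)" has already applied that factor of 2 depends on how that particular listing computes its figure.

```python
def axpy(a, x, y):
    """Compute y <- a*x + y in place and return the number of FLOPs performed.

    Each element costs one multiply and one add. On a GPU that pair can be a
    single fused multiply-add (FMA), conventionally counted as 2 FLOPs.
    """
    flops = 0
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]  # one multiply + one add (one FMA on the GPU)
        flops += 2
    return flops


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
n_flops = axpy(2.0, x, y)
print(y, n_flops)  # [12.0, 24.0, 36.0] 6
```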

Another question, and indeed the most important one for me, concerns the double-precision theoretical peak, because I want to run some benchmarks using CULA and MAGMA and I would like to compare the actual performance with the theoretical peak. I know that a GPU does not have the same number of DP units as SP units, so the DP performance is a fraction (<= 1) of the SP performance. Some information about the number of DP units in a GPU can be found on the web, in forums like this one and similar places, but I have never seen this kind of information in a technical nVidia brochure. The information is sometimes given as an actual count of DP units and sometimes as a fraction of the number of SP units. For example, the DP/SP ratio is 1/12 for the GTX 460/560 and 1/8 for the GTX 470, GTX 480, GTX 570 and GTX 580. So the question is simple: does a comprehensive list of DP/SP ratios for nVidia GPUs exist, and where is it?
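A minimal sketch of how such a ratio turns into a DP peak, using only the ratios quoted in this post (forum figures, not NVIDIA data). The 1000 GFLOPS SP peak below is a hypothetical input chosen only to make the scaling easy to see.

```python
from fractions import Fraction

# DP/SP throughput ratios as quoted in this thread (forum-sourced, not from NVIDIA documents).
DP_SP_RATIO = {
    "GTX 460": Fraction(1, 12),
    "GTX 560": Fraction(1, 12),
    "GTX 470": Fraction(1, 8),
    "GTX 480": Fraction(1, 8),
    "GTX 570": Fraction(1, 8),
    "GTX 580": Fraction(1, 8),
}


def dp_peak(sp_peak_gflops, ratio):
    """Theoretical double-precision peak = single-precision peak * (DP/SP ratio)."""
    return float(sp_peak_gflops * ratio)


# Hypothetical SP peak of 1000 GFLOPS, just to show how the ratio scales it.
for card in ("GTX 560", "GTX 580"):
    print(card, dp_peak(1000, DP_SP_RATIO[card]))
```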

I have a GeForce GTX 550 Ti. Should I assume a DP/SP ratio of 1/12 for this card? I also intend to purchase a mobile workstation equipped with a Quadro K2100M. I’ve looked for information about its DP theoretical peak and DP/SP ratio, but I have not found anything. Does anyone know the DP specs for this GPU?

Thanks :)

The best information I was able to locate is this overview, which, however, does not list double-precision performance for the K2100M:

Thank you very much, njuffa. I already knew that document. It’s a pity there is no information about the K2100M, and I don’t know why: the document comes from nVidia, and the theoretical peak should be a simple computation… Also note that it does not give any clock rates.


On some websites one can see technical details about GPUs, such as the DP/SP ratio. Does anyone know where this information is collected?

Using the given information, the single-precision performance of the GTX 550 Ti can be computed as 192 stream processors x 2 FMA operations/cycle x 1.8 GHz shader clock = 691.2 GFLOPS (FMA), which is the same figure shown in the table and in the link (from this link I got the figure of 2 FMA operations/cycle for the shaders, which are called stream processors in the first link I posted). So I think this number should be multiplied by 2 (each FMA is 2 floating-point operations) to get FLOPS in the sense of 1 FLOP = 1 multiply or 1 add. Am I right?
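A sketch of the same calculation, assuming (as Fermi GPUs are usually described) that each CUDA core completes one single-precision FMA per shader clock and that an FMA counts as 2 FLOPs. Under that assumption the factor 2 in the product above is already the two operations of the FMA, so 691.2 would be GFLOPS in the 1 FLOP = 1 multiply or 1 add sense. The function and parameter names are illustrative.

```python
def sp_peak_gflops(cores, shader_clock_mhz, fma_per_core_per_clock=1, flops_per_fma=2):
    """Theoretical single-precision peak in GFLOPS (1 FLOP = one multiply or one add).

    Assumes each core retires fma_per_core_per_clock FMAs per shader clock and
    counts each FMA as flops_per_fma operations.
    """
    mflops = cores * shader_clock_mhz * fma_per_core_per_clock * flops_per_fma
    return mflops / 1000


# GTX 550 Ti figures from the post: 192 cores, 1.8 GHz (1800 MHz) shader clock.
peak = sp_peak_gflops(192, 1800)
print(peak)  # 691.2
```

Passing fma_per_core_per_clock=2 reproduces the doubled figure, which makes explicit which assumption the doubling relies on.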

About double precision: the DP/SP ratio is 1/12, so the double-precision peak should be 691.2/12 = 57.6 GFLOPS (FMA), or 115.2 GFLOPS counting 1 FMA = 2 FLOPs. Using DGEMM from CUBLAS I am getting about 45 GFLOPS (DGEMM performs 2·M·N·K FLOPs), which is a relative performance R/Rpeak of about 40% when the double-precision peak is taken as 115.2 GFLOPS. I was told on the CULA forum that 40% efficiency is normal for this kind of non-high-end GPU.
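A small sketch of the efficiency calculation, using the 2·M·N·K operation count from the post. The 4096-cubed problem size and the 3.0 s timing are hypothetical inputs for illustration, not measurements; the last two lines show how the 45 GFLOPS figure compares with each of the two candidate DP peaks.

```python
def dgemm_flops(m, n, k):
    """FLOPs in an (m x k) by (k x n) matrix product as used for DGEMM: 2*m*n*k."""
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Achieved rate in GFLOPS for one DGEMM call that took `seconds`."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def efficiency(measured_gflops, peak_gflops):
    """Relative performance R / Rpeak."""
    return measured_gflops / peak_gflops


# Hypothetical run: a 4096 x 4096 x 4096 DGEMM timed at 3.0 s (illustrative only).
print(gflops(4096, 4096, 4096, 3.0))

# The 45 GFLOPS from the post against the two candidate DP peaks.
print(efficiency(45, 115.2))  # 0.390625, i.e. about 39%
print(efficiency(45, 57.6))   # 0.78125, i.e. about 78%
```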

As for the Quadro K2100M, I could not find any information about its DP/SP ratio. But it is a Quadro, which should have a better DP/SP ratio than a GeForce. Could anyone suggest a DP/SP ratio for this GPU?



Apparently, the Quadro K2100M is equipped with the GK106 GPU core. Also apparently, the double/single ratio for this core is 1/24. This means that this Quadro has lower double-precision performance than my GeForce GTX 550 Ti… Is my data right?
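To check the comparison without rounding noise, here is a sketch that keeps the numbers exact with `fractions.Fraction`. It assumes the 1/24 ratio quoted above for GK106 and the 1/12 ratio for the GTX 550 Ti. The K2100M's own SP peak is not given in this thread, so the sketch only computes the SP peak a 1/24 part would need to match the GTX 550 Ti in double precision; comparing that number with the K2100M's actual SP peak answers the question.

```python
from fractions import Fraction

# Figures quoted in this thread, kept exact with Fraction.
GTX550TI_SP_PEAK = Fraction("691.2")  # GFLOPS, single precision
GTX550TI_DP_SP = Fraction(1, 12)      # DP/SP ratio for the GTX 550 Ti
GK106_DP_SP = Fraction(1, 24)         # DP/SP ratio reported for GK106


def sp_peak_needed(target_dp_gflops, dp_sp_ratio):
    """SP peak a GPU would need for its DP peak to reach target_dp_gflops."""
    return target_dp_gflops / dp_sp_ratio


gtx550ti_dp = GTX550TI_SP_PEAK * GTX550TI_DP_SP
needed = sp_peak_needed(gtx550ti_dp, GK106_DP_SP)
print(float(gtx550ti_dp), float(needed))  # 57.6 1382.4
```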


I actually cannot find any information on the GK106 architecture on the NVIDIA website. None of their Kepler whitepapers mention it, and none of the websites that mention it cite any sources, as far as I can tell. But you should be able to check the programming guide for more information. I know Kepler has 192 single-precision cores and 64 double-precision cores per SMX, so the ratio is really 1/3.
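The ratio follows directly from the per-multiprocessor unit counts. A sketch, assuming both unit types run at the same clock and each retires one FMA per cycle. One caveat worth double-checking against the programming guide itself: its arithmetic-instruction throughput table lists 64 double-precision operations per clock per multiprocessor for compute capability 3.5 (GK110), but 8 for compute capability 3.0 (GK104/GK106/GK107), which would give the 1/24 quoted earlier in the thread rather than 1/3.

```python
from fractions import Fraction


def dp_sp_ratio(sp_units_per_sm, dp_units_per_sm):
    """DP/SP throughput ratio, assuming every unit retires one FMA per clock."""
    return Fraction(dp_units_per_sm, sp_units_per_sm)


print(dp_sp_ratio(192, 64))  # 1/3  (64 DP units per SMX)
print(dp_sp_ratio(192, 8))   # 1/24 (8 DP units per SMX)
```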