I’m planning on using CUDA to write scientific molecular dynamics simulations in which both speed and precision are important - hence the need for doubles. I need this on a laptop since I take my computer to and from work every day. According to the CUDA documentation, any non-GeForce GPU with a compute capability of 3.5 should be able to perform 64-bit floating point operations at one third the speed of 32-bit floating point operations. However, the Wikipedia page for NVIDIA Quadro contradicts this, saying that the Quadro K510M and K610M perform double-precision operations at only 1/24 the speed of single precision, despite having a compute capability of 3.5.
So, my question is, which source is correct? If Wikipedia is correct, what’s the best laptop GPU I can use for fast double calculations in CUDA? Are my best options really the Quadro 5000M or 5010M - which are roughly seven years old at this point - as Wikipedia suggests?
Sorry - I meant to say one third rather than one half; the post has been edited.
Look at section 5.4.1, “Arithmetic Instructions”. In the “3.5, 3.7” column, “32-bit floating-point add, multiply, multiply-add” gives 192 results per clock cycle per multiprocessor, whereas “64-bit floating-point add, multiply, multiply-add” gives 64 results, which is one third. The footnote says that this is only 8 for GeForce GPUs, but says nothing about Quadro.
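To make the arithmetic explicit, here is a small Python sketch that uses only the three numbers quoted above from the throughput table. The constant and function names are made up for illustration and do not come from the CUDA documentation. Note that the footnote value of 8 works out to 1/24, which is the same ratio Wikipedia lists for the K510M and K610M.

```python
from fractions import Fraction

# Results per clock cycle per multiprocessor for compute capability 3.5/3.7,
# as quoted in this thread from the throughput table (not queried from a device).
FP32_PER_CLOCK = 192           # 32-bit add, multiply, multiply-add
FP64_PER_CLOCK_TABLE = 64      # 64-bit add, multiply, multiply-add (table entry)
FP64_PER_CLOCK_FOOTNOTE = 8    # footnote value given for GeForce GPUs


def dp_to_sp_ratio(fp64_per_clock, fp32_per_clock=FP32_PER_CLOCK):
    """Return double-precision throughput as an exact fraction of single precision."""
    return Fraction(fp64_per_clock, fp32_per_clock)


table_ratio = dp_to_sp_ratio(FP64_PER_CLOCK_TABLE)
footnote_ratio = dp_to_sp_ratio(FP64_PER_CLOCK_FOOTNOTE)

print(f"table entry: {table_ratio}")     # 1/3
print(f"footnote:    {footnote_ratio}")  # 1/24
```

These are per-multiprocessor, per-clock figures, so they only give the ratio between double and single precision on one chip. They say nothing about absolute performance, which also depends on the number of multiprocessors and the clock rate.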
I am reasonably certain that no mobile GPU (that is, “M” type) that is supported by the currently shipping CUDA 9.0 supports high-throughput double precision. In general, low power and high DP performance do not mix.
Authoritative statements from NVIDIA on this issue are more than welcome.
Yes, if you have a cc3.5 or cc3.7 GPU, that statement is correct (with the additional GeForce vs. Tesla footnote disclaimer). The primary cc3.5 GeForce exception I am aware of is devices built around the GK208 GPU, which is sold under the moniker GT 640, among others.
That is certainly not saying that there is a given ratio for all GeForce and a given ratio for all Tesla. It is nowhere near that simple. But the documentation is correct (AFAIK), if you care to read it carefully and understand it.
I’m not aware of any Quadro GK208 designs, and anyway Kepler (cc3.x = Kepler) is by now a pretty old GPU architecture. I would not recommend buying any Kepler device today. There are better Maxwell, Pascal, and (for non-mobile) Volta choices, regardless of desired features, price point, or performance.
There are no non-Tesla cc3.7 GPUs. That particular chip variant exists only in Tesla K80 clothing.