Double precision and CUDA

Hi all,

On the NVIDIA web site it is repeatedly stated that double precision is supported as of compute capability 1.3. However, I can’t figure out what exactly “supported” means. I have read on other forums that only a fraction of the GPU’s resources support double-precision operations. Can someone explain this to me?

To put this question into context: I’m in the market for a graphics card that will speed up the experiments I’m running as part of my graduate studies. The experiments involve a lot of matrix operations where all the values are of type double. Right now it takes about 2 days to run a single experiment and I’m hoping to trim that down. It’s not something that would be running constantly, more like once or twice a week. I was looking at the GTX 660 Ti and heard it’s not very good for double-precision operations, but again I don’t know what “not very good” means.


Because GPUs were originally created for gaming, they did not really need 64-bit floating-point capability. GPGPU computing took off after 2008 and double precision became more in demand, so NVIDIA created the ‘Tesla’ line, which has much better double-precision performance.

The 660 is not a great choice for double precision, as it is a gaming GPU. My GTX 680’s 32-bit performance is about 2.2 teraflops, while its double-precision performance is about 170 gigaflops.

On the Tesla K20 (or the Titan, which is cheaper and has similar performance), the 32-bit performance is about 2.6 teraflops, while the double-precision performance is about 1.4 teraflops.

So if you are stuck with the 660, try using 32-bit floats; otherwise, get a Titan or Tesla K20.
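These figures line up with a back-of-the-envelope peak-FLOPS calculation. The sketch below uses the published core counts and clocks for the GK104 (GTX 680) and GK110 (Titan) chips and the usual 1/24 and 1/3 double-to-single throughput ratios for those chips; measured numbers will differ somewhat (boost clocks, kernel efficiency), so treat it as a rough estimate, not a benchmark:

```python
def peak_gflops(cores, clock_ghz, dp_ratio):
    """Theoretical peak: cores x clock x 2 FLOPs per cycle (fused multiply-add)."""
    sp = cores * clock_ghz * 2.0
    return sp, sp * dp_ratio

# GTX 680 (GK104): 1536 cores at ~1.006 GHz, double precision at 1/24 of single.
sp680, dp680 = peak_gflops(1536, 1.006, 1.0 / 24.0)    # ~3090 SP, ~129 DP GFLOPS

# GTX Titan (GK110): 2688 cores at ~0.837 GHz, double precision at 1/3 of single.
sp_titan, dp_titan = peak_gflops(2688, 0.837, 1.0 / 3.0)  # ~4500 SP, ~1500 DP GFLOPS
```

The exact numbers aside, the order-of-magnitude gap is the point: the gaming Kepler chips run double precision at roughly 1/24 of their single-precision rate, versus 1/3 on the Titan and K20.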

Thanks for the response. I haven’t purchased it yet so I’m not stuck just yet. Unfortunately, the Titan and Tesla cards are out of my price range. Where does the Quadro line fit into all this?

Keeping in mind I don’t need the absolute best, I just need something that would speed things up from a two day process to hopefully under a day.

Also, where do you get the performance numbers? I can’t seem to find them in the specs.

The CUDA-Z utility will give you the current performance of any attached CUDA-enabled GPU.

I do believe the CUDA-Z numbers are a bit understated, as other tests I have run give better numbers, but it will be in the ballpark.

Oh, and the Quadros are not really meant for compute workloads, so you would be better off with the 660.

If you do not already have the 660, the 680 numbers I mentioned above will be close. The 660 will be slightly worse.

I guess my only option for the budget is a GTX then. The only question is if it will actually make a difference over a CPU-only solution. Is there any way to figure that out?

It really depends on exactly the type of calculations. cuBLAS or MAGMA are great for dense matrices, and even on a GTX 660 they will be much faster than a single-threaded CPU implementation.
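To get a feel for the comparison, here is a minimal sketch (plain NumPy, nothing CUDA-specific) of the textbook triple loop a hand-rolled single-threaded implementation typically uses, next to the tuned BLAS routine that NumPy’s `@` dispatches to. A library like cuBLAS sits at the far end of the same spectrum: same math, much faster execution:

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook triple loop: roughly what a hand-rolled CPU implementation does."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
B = rng.standard_normal((32, 32))

C_loop = naive_matmul(A, B)   # hand-rolled loop
C_blas = A @ B                # tuned BLAS (the role cuBLAS plays on the GPU)
```

On matrices of any realistic size, the library call is orders of magnitude faster than the loop before a GPU even enters the picture, so the baseline you compare against matters a lot.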

Will you be comparing to MATLAB or your own CPU implementation?

Is casting to floats out of the question?
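As a concrete reference point for that question: float32 carries about 7 significant decimal digits versus about 16 for float64, which you can check directly in NumPy:

```python
import numpy as np

eps32 = np.finfo(np.float32).eps   # ~1.19e-07: about 7 significant decimal digits
eps64 = np.finfo(np.float64).eps   # ~2.22e-16: about 16 significant decimal digits

# A perturbation of 1e-8 survives in double precision but is rounded away in single:
survives_in_double = np.float64(1.0) + np.float64(1e-8) != np.float64(1.0)
vanishes_in_single = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
```

Whether 7 digits is enough depends entirely on the conditioning of the computations involved; long accumulations and ill-conditioned solves are where float32 usually breaks down.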

It’s my own CPU implementation of a machine learning algorithm that involves statistical modeling and inference, so precision is kind of important.

For anyone else who’s interested, there is some information about the double-precision performance of the GTX 700 series GPUs here:

It depends on the problems how much impact it has to change the precision. Some codes are memory bounded which means that changing the precision will only have an impact of 2, while other are compute bounded and they will be slower by a factor bigger than 2. In my computer I have both 660 Ti and a Titan card. The difference between them when running a cufft based code was about 3-4 times, while the theoretical double precision performance is bigger in Titan by a factor of 10 at least.