Double precision and CUDA

dan_the_coder · October 20, 2013, 7:30pm

Hi all,

On the NVIDIA web site it is repeatedly stated that double precision is supported as of compute capability 1.3. However, I can’t figure out what exactly “supported” means. I have read on other forums that only a fraction of the GPU’s resources support double precision operations. Can someone explain this to me?

To put this question into context: I’m in the market for a graphics card that will speed up the experiments I’m running as part of my graduate studies. The experiments involve a lot of matrix operations where all the values are of type double. Right now it takes about 2 days to run a single experiment and I’m hoping to trim that down. It’s not something that would be running constantly, more like once or twice a week. I was looking at the GTX 660 Ti and heard it’s not very good for double precision operations, but again I don’t know what “not very good” means.

Thanks,
Dan

CudaaduC · October 20, 2013, 8:00pm

Because GPUs were originally created for gaming, they really did not need 64-bit capability. GPGPU computing took off after 2008 and double precision became more in demand, so NVIDIA created the ‘Tesla’ line, which has better double precision performance.

The 660 is not a great choice for double precision, as it is a gaming GPU. My GTX 680 manages about 2.2 teraflops in 32-bit, while its double precision performance is about 170 gigaflops.

On the Tesla K20 (or the Titan, which is cheaper and has similar performance), the 32-bit performance is about 2.6 teraflops, while the double precision performance is about 1.4 teraflops.

So if you are stuck with the 660, try using 32-bit floats. Otherwise, get a Titan or a Tesla K20.
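
To make the gap concrete, here is a minimal Python sketch that turns the approximate figures quoted above into ratios. The dictionary layout and the function name are made up for illustration, and the numbers are the rough values from this post, not official specifications.

```python
# Approximate throughput figures quoted in this post, in GFLOPS.
# They are rough measured values, not official specifications.
QUOTED_GFLOPS = {
    "GTX 680": {"single": 2200.0, "double": 170.0},
    "Tesla K20 / Titan": {"single": 2600.0, "double": 1400.0},
}


def double_to_single_ratio(card):
    """Fraction of single-precision throughput available in double precision."""
    figures = QUOTED_GFLOPS[card]
    return figures["double"] / figures["single"]


if __name__ == "__main__":
    for card in QUOTED_GFLOPS:
        print(f"{card}: double precision runs at "
              f"{double_to_single_ratio(card):.0%} of single precision")
```

With these figures the GTX 680 comes out at roughly 8% and the K20/Titan at a bit over 50%.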

dan_the_coder · October 20, 2013, 8:05pm

Thanks for the response. I haven’t purchased it yet so I’m not stuck just yet. Unfortunately, the Titan and Tesla cards are out of my price range. Where does the Quadro line fit into all this?

Keeping in mind I don’t need the absolute best, I just need something that would speed things up from a two day process to hopefully under a day.

dan_the_coder · October 20, 2013, 8:09pm

Also, where do you get the performance numbers? I can’t seem to find them in the specs.

CudaaduC · October 20, 2013, 8:21pm

http://cuda-z.sourceforge.net/

That utility will give you the current performance of any attached CUDA enabled GPU.

I do believe the CUDA-Z numbers are a bit understated, as other tests I have run give better numbers, but it will be in the ballpark.

Oh, and the Quadro(s) are not really meant for calculations, so you would be better off with the 660.

If you do not already have the 660, the 680 numbers I mentioned above will be close. The 660 will be slightly worse.

dan_the_coder · October 20, 2013, 8:27pm

I guess my only option for the budget is a GTX then. The only question is if it will actually make a difference over a CPU-only solution. Is there any way to figure that out?

CudaaduC · October 20, 2013, 8:44pm

It really depends on the exact type of calculations. cuBLAS or MAGMA are great for dense matrices, and even on a GTX 660 will be much faster than a single-threaded CPU implementation.

Will you be comparing to MATLAB or your own CPU implementation?

Is casting to floats out of the question?
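
One rough way to approach the "will it make a difference" question is Amdahl's law: only the part of the run time spent in matrix operations can be accelerated. The sketch below is plain Python. The function name, the 90% fraction, and the 10x kernel speedup are assumptions for illustration, so profile the real code to get the actual fraction.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when only part of it gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    baseline_hours = 48.0  # the two-day run mentioned in this thread
    # Hypothetical: 90% of the time is matrix work and the GPU does it 10x faster.
    speedup = overall_speedup(0.90, 10.0)
    print(f"overall speedup: {speedup:.2f}x, "
          f"estimated run time: {baseline_hours / speedup:.1f} h")
```

Getting from two days to under one day needs an overall speedup above 2x, which under this model is only possible if more than half of the run time is in the accelerated part.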

dan_the_coder · October 20, 2013, 9:38pm

It’s my own CPU implementation of a machine learning algorithm which involves statistical modeling and inference so precision is kind of important.
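
To get a feel for what casting to floats can cost, here is a small Python sketch that emulates single-precision accumulation by rounding through `struct` after every addition. It is only an illustration with a made-up workload (adding 0.1 repeatedly), not a model of the actual algorithm, and the function names are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    When `single` is True, the step and the running total are rounded to
    single precision after every operation, mimicking a float accumulator.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    count = 100_000
    exact = 10_000.0  # 0.1 * 100000 in exact arithmetic
    for label, single in (("float", True), ("double", False)):
        total = accumulate(0.1, count, single)
        print(f"{label:>6}: total = {total!r}, abs error = {abs(total - exact):.3e}")
```

Whether an error of this kind matters depends on the algorithm. Long running sums and ill-conditioned linear algebra tend to be the sensitive spots.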

dan_the_coder · October 21, 2013, 1:43am

For anyone else that’s interested there is some information about the double-precision performance of the GTX 700 series GPUs here:

pasoleatis · October 21, 2013, 8:39am

How much impact changing the precision has depends on the problem. Some codes are memory bound, which means that changing the precision will only have an impact of about a factor of 2, while others are compute bound and will be slower by a factor bigger than 2. In my computer I have both a 660 Ti and a Titan card. The difference between them when running a cuFFT-based code was about 3-4 times, while the theoretical double precision performance of the Titan is bigger by a factor of at least 10.
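
The memory-bound versus compute-bound distinction can be sketched with a simple roofline estimate: attainable throughput is the smaller of the peak arithmetic rate and memory bandwidth times arithmetic intensity. The Python below is a rough model, not a measurement. The 2200 and 170 GFLOPS figures are the approximate GTX 680 numbers quoted earlier in the thread; the 190 GB/s bandwidth and the flops-per-value counts are assumptions.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: limited by either arithmetic peak or memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def double_precision_slowdown(single_peak, double_peak, bandwidth_gb_s,
                              flops_per_value):
    """Estimated slowdown when a kernel switches from float to double.

    `flops_per_value` is how many floating-point operations the kernel does
    per value moved to or from memory. A float is 4 bytes and a double is
    8 bytes, so the same kernel has half the flops-per-byte in double.
    """
    single_rate = attainable_gflops(single_peak, bandwidth_gb_s,
                                    flops_per_value / 4.0)
    double_rate = attainable_gflops(double_peak, bandwidth_gb_s,
                                    flops_per_value / 8.0)
    return single_rate / double_rate


if __name__ == "__main__":
    # Assumed card: 2200 GFLOPS single, 170 GFLOPS double, 190 GB/s bandwidth.
    for flops_per_value in (1.0, 10.0, 1000.0):
        slowdown = double_precision_slowdown(2200.0, 170.0, 190.0,
                                             flops_per_value)
        print(f"{flops_per_value:7.1f} flops per value: "
              f"double is about {slowdown:.1f}x slower")
```

In this model a kernel with very little arithmetic per value slows down by 2x in double, while one with a lot of arithmetic per value approaches the ratio of the two peak rates.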
