Performance of GTX 980 Ti as a General Purpose GPU

I checked the specs of the GTX 980 Ti here >> http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980-ti/specifications

I was wondering if anyone has tested it for general-purpose computing and can share any insights.

My friend and I plan to purchase a GPU for our lab. We have lab projects on time-series analysis and optimization (including evolutionary algorithms). We might also have other students who want to work on Deep Learning techniques. We require general-purpose computing (with double precision if possible).

We currently have a budget constraint of 700 USD. We are looking for a GPU that we can buy and insert into the expansion slot of a Dell OptiPlex 9020 desktop computer (business-class, Core i7, Win7 x64, 8 GB RAM).

The GTX 980 Ti is the best we could find so far. We hope that it wouldn’t cause any power issues. If you have other suitable suggestions, please share them. I worked briefly on GPGPUs in 2011 and have been out of touch since, so please forgive any mistakes in my understanding of the newer technologies.

I couldn’t find a better section for this question, so I posted it here under ‘Programming and Performance’. Thank you for your understanding and help.

It depends on what you need!
For double precision calculations, GeForces in general are not where you want to go.
If you require ECC protected memory and registers, GeForces are not where you want to go.
If you require more than 6GB of memory, you’ll again have to look somewhere else.

Other than that, the 980 Ti is one of the best (NVIDIA) compute cards out there, in my opinion. You get a GM200 GPU with a 384-bit memory bus, which is something that no Tesla card, several times more expensive, has. The step above that would be a Titan X, which as far as I can tell only adds another 6 GB of video memory and the possibility of using the driver in TCC mode. It would, however, blow your 700 USD budget.

As for the power concerns, you’ll need to figure that one out yourself: the card needs its physical power connectors to be present (I assume one 6-pin and one 8-pin), and the power supply in your workstation needs enough capacity to feed it.

You can do double precision on the 980 Ti, but it has far fewer FP64 units in hardware than a Tesla card, so it will be much slower.
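Before paying that double-precision penalty, it is worth checking whether your algorithms actually need FP64 at all. Here is a minimal, GPU-independent sketch of what float32 cannot represent (float32 arithmetic is emulated here by rounding through `struct`):

```python
import struct

def f32(x):
    # Round a Python float (which is 64-bit) to the nearest float32.
    return struct.unpack('f', struct.pack('f', x))[0]

# float32 keeps ~7 significant decimal digits; at 1e8 the spacing
# between adjacent float32 values is 8, so adding 1 is simply lost.
big = 1e8
print(f32(f32(big + 1.0) - big))   # 0.0 in emulated float32
print((big + 1.0) - big)           # 1.0 in float64
```

If your time-series and optimization codes never accumulate values across that many orders of magnitude, single precision may well be sufficient.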

Regarding single vs. double precision, it is important to consider the following points:

  1. Do you really need double precision?

  2. NVIDIA GPUs use FMA operations for 32-bit computation, which provides faster and more accurate results. I am working on a Geant4 project, and the team concluded that for much of the computation, 32-bit floating point using CUDA was accurate enough, and more accurate than the same 32-bit floating-point computation on a CPU.

  3. If there is not much 64-bit computation and your algorithm pipeline is memory bound, then you should choose the GPU with the most memory bandwidth. At your price level, the GTX 980 Ti and the older GTX 780 Ti have the highest memory bandwidth, at 336 GB/s.

  4. Many people use “mixed precision” on GPUs, like this group:

http://ambermd.org/gpus/benchmarks.htm
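To make point 3 concrete, here is a back-of-the-envelope roofline check. The 336 GB/s bandwidth figure is from the spec sheet; the single-precision peak is an assumed value (2816 CUDA cores × 2 FLOP/cycle via FMA × ~1 GHz boost clock), not a measurement:

```python
# Rough roofline balance point for a GTX 980 Ti (assumed SP peak).
peak_flops = 2816 * 2 * 1.0e9      # assumed SP peak, FLOP/s
bandwidth = 336e9                  # bytes/s, from the spec sheet

balance = peak_flops / bandwidth   # FLOP per byte needed to be compute-bound
print(f"compute-bound above ~{balance:.1f} FLOP/byte")

# SAXPY (y = a*x + y) does 2 FLOP per 12 bytes moved (read x, read y,
# write y), i.e. ~0.17 FLOP/byte: far below the balance point, so it
# is firmly memory bound and bandwidth is what matters.
saxpy_intensity = 2 / 12
print(f"SAXPY intensity: {saxpy_intensity:.2f} FLOP/byte")
```

Any kernel below roughly 17 FLOP per byte of traffic on this card is limited by memory bandwidth, not by arithmetic throughput.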

Thank you all for providing such important pieces of information. We have decided to go ahead with the GTX 980 Ti, based on the assumptions below.

a) We can manage with single precision for the heavy computations in the intended experiments.
b) Power can be managed with a suitable power supply unit and the physical 6/8-pin connectors.
c) The experiments would stay within 6 GB of memory usage.

We also checked the performance tests done here >> http://gpuboss.com/gpus/Graphics-cards-best-PassMark-score-5825384

This statement is incorrect. All GPUs with compute capability >= 2.0 (that is, all GPUs supported by CUDA 7.x) provide hardware support for double-precision computation. GPUs differ in the relative throughput of single-precision and double-precision operations, based on their target market.
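For a sense of scale, here is a sketch of the theoretical peaks. The core count is from the spec sheet; the ~1 GHz clock and the 1/32 FP64:FP32 ratio for GeForce GM200 parts are assumed published values, not measurements:

```python
# Theoretical peak arithmetic for a GTX 980 Ti (assumed spec values).
cores = 2816          # CUDA cores on GM200 (980 Ti configuration)
clock_ghz = 1.0       # assumed ~1 GHz boost clock
sp_gflops = cores * 2 * clock_ghz   # FMA counts as 2 FLOP per cycle
dp_gflops = sp_gflops / 32          # GeForce GM200 runs FP64 at 1/32 rate

print(f"~{sp_gflops:.0f} SP GFLOPS vs ~{dp_gflops:.0f} DP GFLOPS")
```

So the card does execute FP64 in hardware; it is simply provisioned with far less FP64 throughput than a compute-oriented Tesla part.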

In some cases the double-precision performance of a GPU can be lower than that of an x86 CPU (provided optimized software is used on both). For example, the machine here has an sm_50 GPU with ~45 GFLOPS of DP performance, while its quad-core CPU provides ~50 DP GFLOPS.
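The magnitudes in a comparison like that follow from simple peak arithmetic. Every value below is an assumption chosen only to land in the same ballpark as the numbers quoted above, not a spec of the poster's actual machine:

```python
# Hypothetical peak-DP estimates (all inputs assumed).
# Small sm_50 GPU: cores x 2 FLOP/cycle (FMA) x GHz / 32 FP64 ratio.
gpu_dp = 640 * 2 * 1.1 / 32
# Quad-core CPU: cores x GHz x one 4-wide AVX double op per cycle.
cpu_dp = 4 * 3.2 * 4

print(f"GPU ~{gpu_dp:.0f} DP GFLOPS, CPU ~{cpu_dp:.0f} DP GFLOPS")
```

The takeaway is that on consumer Maxwell parts the 1/32 FP64 ratio can cancel out the GPU's core-count advantage entirely, which is why the earlier advice steers double-precision-heavy workloads toward Tesla cards.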