What to buy now for CUDA calculations?

I may be looking to buy a GPU exclusively for CUDA calculations (running Scientific Linux 6.5 on an AMD-based machine). I currently have a GTX 275 (very old, was cheap) that works fine to run a display, but I am not sure whether it will be fast enough or whether it is still supported.

If you were buying a GPU today, what would you get? Between the GTX 970 and GTX 980, I’d lean towards the 970 on budget, but would I be better off getting a 780 Ti? Is anyone else using the 970 or 980 for CUDA? Is it better to have Maxwell for future compatibility?

Regards,
Martin

This list is fairly accurate in terms of performance for CUDA applications:

http://www.videocardbenchmark.net/high_end_gpus.html

If the question is deciding between the GTX 970 and the GTX 780ti, the GTX 780ti is more powerful and versatile.

Choosing between the GTX 980 and the GTX 780ti will depend on your intended CUDA applications. The GTX 780ti has less memory (3GB) but significantly greater memory bandwidth and better 64-bit performance.

Does this also apply for double complex calculations?

Yes, even more so.

If you really need 64-bit performance, the Titan Black or Tesla K40 are the best bets, but for under $600 the GTX 780ti works.

GTX 780 Ti: 2880 CUDA cores x 876 MHz base clock x 2 FP ops/FMA = 5045 single-precision GFLOPS. Double-precision instructions execute at 1/24 the rate of single-precision instructions, resulting in performance of 210 double-precision GFLOPS. Theoretical memory bandwidth is 336 GB/sec.

GTX 980: 2048 CUDA cores x 1126 MHz base clock x 2 FP ops/FMA = 4612 single-precision GFLOPS. Double-precision instructions execute at 1/32 the rate of single-precision instructions, resulting in performance of 144 double-precision GFLOPS. Theoretical memory bandwidth is 224 GB/sec.

GTX Titan Black: 2880 CUDA cores x 889 MHz x 2 FP ops/FMA = 5121 single-precision GFLOPS. In default mode, the Titan Black runs double-precision instructions at 1/24 the throughput of single-precision instructions, i.e. 213 double-precision GFLOPS. It also has a “full double-precision” mode in which double-precision instructions run at 1/3 the throughput of single-precision instructions. However, I seem to recall that the clock rate is reduced when running in “full double-precision” mode. Various internet sources seem to indicate a double-precision throughput of 1300 GFLOPS for the GTX Titan Black. Maybe a forum user with a GTX Titan Black can confirm this (or correct this if necessary). Theoretical memory bandwidth is 336 GB/sec.
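For anyone who wants to redo the arithmetic above for another card, here is a minimal Python sketch. It only reproduces the theoretical peak numbers quoted in this post (cores x clock x 2 FP ops per FMA, scaled by the double-precision ratio). The function names `peak_gflops` and `implied_clock_mhz` are made up for illustration, and real application throughput will be lower than these peaks.

```python
def peak_gflops(cuda_cores, clock_mhz, dp_ratio):
    """Return (single, double) theoretical peak GFLOPS.

    Assumes every CUDA core retires one FMA (2 FP ops) per clock,
    and that double precision runs at dp_ratio times that rate.
    """
    sp = cuda_cores * clock_mhz * 2 / 1000.0  # MFLOPS -> GFLOPS
    return sp, sp * dp_ratio


def implied_clock_mhz(dp_gflops, cuda_cores, dp_ratio):
    """Clock (MHz) that would yield dp_gflops at the given DP ratio."""
    return dp_gflops * 1000.0 / (cuda_cores * 2 * dp_ratio)


# Core counts, base clocks, and DP ratios as quoted in this post.
cards = {
    "GTX 780 Ti": (2880, 876, 1 / 24),
    "GTX 980": (2048, 1126, 1 / 32),
    "GTX Titan Black (default mode)": (2880, 889, 1 / 24),
    "GTX Titan Black (full DP, base clock)": (2880, 889, 1 / 3),
}

for name, (cores, mhz, ratio) in cards.items():
    sp, dp = peak_gflops(cores, mhz, ratio)
    print(f"{name}: {sp:.0f} SP GFLOPS, {dp:.0f} DP GFLOPS")

# Clock implied by the 1300 GFLOPS figure reported for full-DP mode.
print(f"Implied full-DP clock: {implied_clock_mhz(1300, 2880, 1 / 3):.0f} MHz")
```

At the 889 MHz base clock, the 1/3 ratio would give roughly 1707 double-precision GFLOPS, so the reported 1300 GFLOPS is consistent with a clock of roughly 677 MHz in that mode. That is only what the arithmetic implies, not a confirmed clock rate.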

NVIDIA’s official hardware specifications can be found here:

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780-ti/specifications
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980/specifications
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications
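
The theoretical memory bandwidth figures quoted above can be checked the same way. This small sketch assumes the memory interface widths (384-bit for the GTX 780 Ti and Titan Black, 256-bit for the GTX 980) and the 7 Gbps GDDR5 data rate as listed on the specification pages; those inputs are not stated elsewhere in this thread, and `bandwidth_gb_per_s` is a made-up helper name.

```python
def bandwidth_gb_per_s(bus_width_bits, data_rate_gbps):
    """Theoretical bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


# Bus widths and data rate are assumptions taken from the spec pages.
for name, bus_bits in [("GTX 780 Ti", 384), ("GTX 980", 256), ("GTX Titan Black", 384)]:
    print(f"{name}: {bandwidth_gb_per_s(bus_bits, 7.0):.0f} GB/sec")
```

Sustained bandwidth in real kernels is typically noticeably below these theoretical values.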

Can anyone confirm the GTX Titan Black double precision performance?