Critique my HPC Specification

I just got a quote for an HPC system with two Tesla K80s. I will primarily use the computer to develop proprietary software in CUDA/C++/Matlab for reservoir simulation and for AI. The specs are below. What should I change about the current configuration to make the system more versatile?

(2) Intel Xeon E5-2620v3 2.4GHz Hexa-Core CPUs / Hyperthreading / Turbo
128GB DDR4-2133 ECCR DIMM Memory (8x16GB)
(8) Open DIMMs
(2) Mirrored 240GB 6Gb/s SATA Solid State Hard Drives
(2) Mirrored 2TB 6Gb/s SATA 7200RPM Hard Drives
(2) 24GB GDDR5 Tesla K80 GPUs
(1) QLogic QLE3442-RJ Dual RJ-45 10GbE NIC
Internal 24X DVD-RW SATA Optical Drive
Microsoft Windows 8.1 Professional Operating System

This may be of interest:

http://ambermd.org/gpus/benchmarks.htm

Do you really need the double precision? In those benchmarks, two GTX Titan X GPUs often beat two Tesla K80 boards (each K80 contains two GPUs, so one could argue that in some cases two GTX Titan X GPUs beat four Tesla GPUs).

As an even more general question, do you need a K80?

Thanks for the information

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980-ti

If you don’t need 12GB of GDDR5 memory to load very large datasets (over 6GB), the GTX 980 Ti with 6GB of GDDR5 is a cheaper alternative to the TITAN X. Both cards are slower at FP64, though, than the older-generation GK110/GK210 GPUs.

How much slower for FP64?

The ratio of DP units to SP units on GM20x devices is 1:32.

And on “older gen GK110/210 GPUs”?

No software DP?

CUDA has never offered emulation of double-precision arithmetic. Prior to sm_13, there was no support for double-precision computation; the compiler would simply demote the computation to single precision (which was not terribly useful beyond fairly trivial code). Since sm_13, all GPUs have supported double precision in hardware, with throughput being an architecture-dependent fraction of the single-precision throughput.
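To make the sm_13 cutoff concrete, here is a minimal C++ sketch of the check. The function name `has_hardware_double` is made up for illustration; in a real program the two integers would typically come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties`.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUDA): returns true if a device with
// compute capability major.minor supports double precision in hardware,
// i.e. the device is sm_13 or newer.
bool has_hardware_double(int major, int minor) {
    assert(major >= 1 && minor >= 0);
    return major > 1 || (major == 1 && minor >= 3);
}
```

For example, `has_hardware_double(1, 2)` is false, while it is true for sm_37 (the GK210 in the K80) and for sm_52 (the GM200 in the Titan X). Note that the check only says whether double precision is available, not how fast it is.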

I think one would want to consider two aspects here: (1) Many reservoir simulation codes can be run using single-precision computation throughout without noticeable degradation in the quality of the results. (2) Where double-precision computation is required, even a high-end Maxwell part may underperform a hefty Xeon such as the model specified by the OP (this assumes the equivalent CPU code is mostly vectorized for AVX2, and all CPU cores are being used).
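A rough back-of-the-envelope calculation illustrates point (2). The sketch below computes theoretical peak double-precision throughput only; the function names are made up for illustration, and the hardware figures mentioned afterwards are my assumptions, not numbers from this thread. It assumes each GPU double-precision unit retires one fused multiply-add (2 FLOPs) per cycle.

```cpp
#include <cassert>

// Theoretical peak double-precision GFLOPS of a GPU.
//   cores      : number of single-precision CUDA cores
//   clock_ghz  : core clock in GHz
//   sp_per_dp  : SP units per DP unit (32 for a 1:32 DP:SP ratio)
// Assumption: one FMA (2 FLOPs) per DP unit per cycle.
double gpu_peak_dp_gflops(int cores, double clock_ghz, int sp_per_dp) {
    assert(cores > 0 && clock_ghz > 0.0 && sp_per_dp > 0);
    return cores * 2.0 * clock_ghz / sp_per_dp;
}

// Theoretical peak double-precision GFLOPS of a multi-socket CPU system.
//   dp_flops_per_cycle : DP FLOPs per core per cycle
//                        (16 assumed for AVX2 with two FMA units)
double cpu_peak_dp_gflops(int sockets, int cores_per_socket,
                          double clock_ghz, int dp_flops_per_cycle) {
    assert(sockets > 0 && cores_per_socket > 0);
    assert(clock_ghz > 0.0 && dp_flops_per_cycle > 0);
    return sockets * cores_per_socket * clock_ghz * dp_flops_per_cycle;
}
```

Assuming 3072 cores at a 1.0 GHz base clock for a GTX Titan X, `gpu_peak_dp_gflops(3072, 1.0, 32)` gives 192 GFLOPS per card. Assuming 16 double-precision FLOPs per cycle per core at the nominal 2.4 GHz, `cpu_peak_dp_gflops(2, 6, 2.4, 16)` gives about 460.8 GFLOPS for the dual E5-2620v3. These are paper peaks: sustained AVX clocks are typically lower than nominal, and real code rarely approaches either number, so treat the comparison as an order-of-magnitude argument.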

Note that (based on what I learned recently in another thread) Matlab will use double precision data by default, so simply outsourcing computationally intensive kernels to the GPU may provide much less of the desired speedup when Maxwell-class GPUs are used.