Performance of Tesla vs Quadro vs GeForce Titan

Please help me decide on a new GPU based on best runtime performance, as there are so many choices (Tesla, Quadro, GeForce). My current fluid dynamics simulation CUDA app runs on a much older Quadro K5000 under Linux.

I need real-time simulation, double-precision computation, the ability to keep large data structures in GPU memory (upwards of 10 GB), and the ability to scale up to a multi-GPU system without many code changes.

Being a student, I am also working with a small budget. Any suggestions?

Additionally, will a system with many GPUs that each have less memory be able to handle large data structures using Unified Memory, or do I need one GPU with large memory?
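One way to frame the memory question is to size the data first. The following is only back-of-the-envelope arithmetic in Python. It assumes a hypothetical 3D grid stored as several double-precision fields and split into slabs along one axis with one-cell halo layers. The function names, grid dimensions, and field count are illustrative assumptions, not details of the app described above.

```python
BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3


def grid_bytes(nx, ny, nz, fields):
    """Bytes needed for `fields` double-precision arrays on an nx*ny*nz grid."""
    return nx * ny * nz * fields * BYTES_PER_DOUBLE


def per_gpu_bytes(nx, ny, nz, fields, num_gpus, halo=1):
    """Upper bound on bytes per GPU when the grid is split into slabs along x.

    Assumes an even split. Every slab is charged two halo regions, which is
    exact for interior slabs and slightly pessimistic for the two end slabs.
    """
    slab_nx = -(-nx // num_gpus)  # ceiling division
    halo_layers = 2 * halo if num_gpus > 1 else 0
    return grid_bytes(slab_nx + halo_layers, ny, nz, fields)


# Hypothetical problem size: 512^3 cells, 10 fields per cell.
print(grid_bytes(512, 512, 512, 10) / GIB)        # whole grid, in GiB
print(per_gpu_bytes(512, 512, 512, 10, 4) / GIB)  # one of four slabs, in GiB
```

This only says how much memory each GPU would need under an explicit domain decomposition. It says nothing about how Unified Memory would behave, and whether a slab split is practical depends on the solver.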

Can you quantify:

(1) The percentage of double-precision computation (out of all FP computation) required for your use case?
(2) What approximate dollar amount your small budget equates to?

  1. About 80% of total computation is in double precision.

  2. I am looking for a solution within $5,000, although this constraint is flexible if I can get more than a 10x runtime improvement in double precision. That is why I was considering multiple GPUs with smaller memory footprints.
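To see how an 80% double-precision share interacts with a 10x target, here is a rough Python sketch of an Amdahl-style estimate. It assumes the workload is purely compute-bound, which fluid dynamics codes often are not (memory bandwidth frequently dominates), so treat the result as optimistic. The function name and all throughput values are hypothetical placeholders, not vendor specifications.

```python
def estimated_speedup(dp_fraction, old_dp, old_sp, new_dp, new_sp):
    """Ratio of old to new runtime for a purely compute-bound workload.

    dp_fraction: share of floating-point operations done in double precision.
    old_* / new_*: sustained throughput of the old and new GPU, all in the
    same unit (for example TFLOP/s).
    """
    sp_fraction = 1.0 - dp_fraction
    old_time = dp_fraction / old_dp + sp_fraction / old_sp
    new_time = dp_fraction / new_dp + sp_fraction / new_sp
    return old_time / new_time


# Placeholder throughputs for illustration only; substitute measured or
# published figures for the GPUs actually being compared.
print(estimated_speedup(0.8, old_dp=0.1, old_sp=2.0, new_dp=5.0, new_sp=10.0))
```

With these made-up numbers the estimate comes out at roughly 45x. Because the double-precision term dominates the runtime, the ratio of double-precision throughputs matters far more than the single-precision one.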

It seems the GeForce Titan V would fit the bill, based on the following overview:

I have never used a GeForce Titan V, and I am not sure it is still available. Generally speaking, a requirement for high double-precision throughput excludes almost all GeForce-branded GPUs, many Quadro-branded GPUs, and some Tesla-branded GPUs.

Also generally speaking, one large, powerful GPU is preferable to multiple less powerful ones.

Thank you for the link and useful feedback. This is very helpful.

I have not looked into cloud-based GPU farms, but if one large, powerful GPU is preferred, what sort of hardware view does an application get on a GPU farm? Does it reserve a single GPU? And what might the performance difference be if an app designed for a single GPU is then run on a GPU farm? This is something I will have to investigate, but if anyone has suggestions, they would be much appreciated. Thank you.

A quick perusal of available AWS instance types suggests that their P2 (K80-based) and P3 (V100-based) instances are suitable for CUDA-accelerated DP-intensive applications:

I have no personal experience with AWS. No idea why the forum software expanded the link I provided into a mini-ad for AWS.