Advice on which GPU to use for scientific computation in a real-time environment

I am in the early stages of putting together computing requirements for a new system we are building, and I am trying to specify the GPU(s) as part of its overall computing resources. We need GPUs for scientific computing (image and signal processing) on large data sets, millions of pixels each, in a real-time environment where high throughput is required. I am at a loss as to which card to choose. Here are some specific questions:

0- Programmer productivity/learning curve. This is really important; if an older model doesn’t allow us to use the latest simplifications/abstractions then it is not a good choice since developer productivity is a lot more expensive.
1- Compute Capability. Do I really need 5, or is 3.5 sufficient?
2- If I went with 5, would I lose the ability to use any GPU libraries, or would they just run slower?
3- I gather Nvidia makes cards that are designed for GPU computing, like Tesla, and generic ones for gaming (although they are also CUDA-capable). Is there a difference in how these two types are connected to the host CPU, for example in bandwidth? I know keeping the GPU “fed” with data is the biggest difficulty for high-throughput applications (that’s what I heard; I guess I’ll find out, but not the hard way I hope).
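
To put a rough number on the “keeping the GPU fed” concern, here is a minimal back-of-envelope sketch in Python. It only computes the sustained host-to-device bandwidth implied by a frame size and frame rate and compares it to a link figure. The frame dimensions, sample size, frame rate, and the 6.0 GB/s usable link bandwidth are all made-up placeholders, not measurements of any particular card or bus; the function names are hypothetical too.

```python
def required_bandwidth_gbs(width, height, bytes_per_pixel, frames_per_second):
    """Sustained host-to-device bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return width * height * bytes_per_pixel * frames_per_second / 1e9


def link_utilization(required_gbs, link_gbs):
    """Fraction of the host link consumed by the input stream alone."""
    return required_gbs / link_gbs


# Hypothetical workload: 2048x2048 frames, 16-bit samples, 500 frames/s.
need = required_bandwidth_gbs(2048, 2048, 2, 500)

# Assumed usable link bandwidth in GB/s; measure this on the real system.
usable_link = 6.0

print(f"need {need:.2f} GB/s, "
      f"{link_utilization(need, usable_link):.0%} of the assumed link")
```

If results also have to come back to the host, the return traffic needs the same treatment, and a utilization close to 1 leaves no margin for anything else sharing the bus.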

The problem with just getting the latest and greatest is NOT price, but power consumption, which, given other constraints, needs to be kept in check.

Please provide some advice or pointers on how to go about selecting a GPU for my high-throughput scientific computing application.

Many thanks,

I can’t answer your questions directly, but some things come to mind. The most recent CUDA release is 6.5. Very likely one of your first steps is to go to the downloads and check each architecture you are considering here:
https://developer.nvidia.com/resources

Second, I’m very much a Linux developer, and I would suggest that if you have a preference for and experience with either Windows or Linux, you stick to the programming environment you know. If you don’t have a preference, I suggest Linux; you said cost was not an issue, but even so I suggest Linux because so much community information is available for it.

Third, I’ve seen some benchmarks with Jetson and other systems and believe that memory throughput is an issue on GPUs without dedicated GPU memory (the embedded ARMv7 hardware tends to use part of the system memory rather than having dedicated high speed GPU memory the way a desktop graphics card would). You probably need an estimate of how much RAM the CUDA kernels are going to need and pick GPUs with sufficient dedicated RAM. If you don’t know how much memory is needed or whether later expansion is needed, more GPU RAM is better. GPUs with more RAM may in fact be more important than slightly faster GPUs if the situation is right.
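
As a rough illustration of that RAM estimate, here is a minimal Python sketch. It multiplies pixels per frame by bytes per pixel and by the number of buffers the kernels keep resident, then checks the total against a card's dedicated memory with some headroom. The buffer count, the float32 sample size, the 2 GiB card, and the 80% headroom factor are assumptions for illustration only, and the function names are hypothetical.

```python
def working_set_bytes(pixels, bytes_per_pixel, buffers):
    """Bytes of GPU memory needed to hold all resident buffers at once."""
    return pixels * bytes_per_pixel * buffers


def fits(working_set, gpu_ram_bytes, headroom=0.8):
    """True if the working set fits within a fraction of dedicated GPU RAM.

    The headroom factor is an assumed safety margin for allocations
    that this simple estimate does not count.
    """
    return working_set <= gpu_ram_bytes * headroom


# Hypothetical case: 2048x2048 frame, float32 samples, 6 resident buffers
# (for example input, output, and a few intermediates).
ws = working_set_bytes(2048 * 2048, 4, 6)
gpu_ram = 2 * 1024**3  # assumed 2 GiB card

print(f"working set: {ws / 1024**2:.0f} MiB, fits: {fits(ws, gpu_ram)}")
```

Multiply the buffer count by however many frames are in flight at once if transfers and kernels are overlapped, since each in-flight frame needs its own set of buffers.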

FYI, “faster” usually means more power consumption, but sometimes the speedup comes from a newer, smaller manufacturing process, which in turn generates less heat and consumes less power. You might want to take a casual glance at the process node (in nm) of the chip you are considering for the GPU; smaller really is better.

Thank you. Good points.