What is the best NVIDIA GPU card for GPGPU computing?

I have some questions about GPGPU computing on NVIDIA GPUs.

  1. As far as I know, both CUDA and OpenCL can be used for GPGPU computing.
    For a specific GPU, can performance differ depending on which language is used?
  2. Among GPUs that are not CUDA-enabled, is any of them preferable for GPGPU computing?
  3. NVIDIA offers several lines of CUDA-enabled GPUs (Tesla, Quadro, NVS, GeForce, …).
    Which kind of GPU is best for HPC GPGPU work, and why?
  4. As far as I know, the higher the compute capability, the higher the performance. Is that right?
    Then why do some newly developed GPUs have a lower compute capability than earlier ones, yet higher performance?
    Furthermore, all GPU lines other than GeForce have compute capabilities lower than 5.0.
    Does that make GeForce the best choice for HPC?

I am not dealing with gaming performance, only with GPGPU computing for HPC.
Thanks in advance.

What kind of HPC computing are you doing? Do you need a lot of double-precision throughput? Do you need high memory bandwidth? Is host/device communication a foreseeable bottleneck? Are you looking for a workstation configuration or are you equipping an entire server cluster? How serious are the consequences of a silently failing computation in your use case?

“Compute capability” is primarily an indicator of feature set. Each compute capability includes the features of the previous (next lower numerically) compute capability and adds new ones. Whether any of these features are crucial to your application(s), you will have to decide.

The performance differential across all GPUs with a particular compute capability is typically high, often on the order of 1 to 10 between the lowest and the highest performing parts. Average performance increases over time, as does compute capability, thus giving a positive correlation. But you can easily find examples of a particular GPU of generation N being slower than a particular GPU of generation N-1.
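To make that concrete, here is a minimal host-side sketch (assuming the CUDA runtime API; the factor of 2 in the bandwidth estimate assumes double-data-rate memory, which holds for the GDDR used on these cards) that prints each device's compute capability next to its theoretical peak memory bandwidth. The bandwidth figure is usually a better first-order performance indicator than the compute capability number:

[code]
// list_devices.cu: show that compute capability and raw throughput are
// different things by printing both for every CUDA device in the system.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // memoryClockRate is reported in kHz; the factor 2 assumes DDR memory.
        double peakBwGBs = 2.0 * prop.memoryClockRate * 1e3 *
                           (prop.memoryBusWidth / 8.0) / 1e9;
        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability               : %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors                  : %d\n", prop.multiProcessorCount);
        printf("  theoretical peak memory bandwidth: ~%.0f GB/s\n", peakBwGBs);
    }
    return 0;
}
[/code]

For example, a GTX 780 Ti (cc 3.5) comes out at roughly 336 GB/s while a GTX 750 (cc 5.0) comes out at roughly 80 GB/s, so the card with the higher compute capability is not the one with the higher memory bandwidth.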

I need high memory bandwidth. Double-precision throughput is not a big deal for me.
I am building a server cluster.
Let's compare the GeForce GTX 750 and 780 Ti.
Why did NVIDIA give the 780 Ti the lower compute capability 3.5, while the 750 has compute capability 5.0?

What GPUs are you able to fit into your server enclosures? Typical “pizza box” rack-mounted servers require passively cooled GPUs.

Only NVIDIA can answer your question as to why it develops certain SKUs. If you look at the market for discrete GPUs, you will see that it is segmented into numerous price points, and that both of the major vendors try to target each price point with products that (hopefully) maximize their sales and profits.

The GTX 780 Ti is an older high-end consumer card based on the Kepler sm_35 architecture; it cost around $500. The GTX 750 is a newer mid-range consumer card based on the Maxwell sm_50 architecture (i.e. the architecture that follows Kepler); it cost around $150. So these two cards do not occupy the same price point.

NVIDIA’s consumer GPUs are currently being transitioned from the older Kepler architecture to the more energy-efficient Maxwell architecture. I would expect this transition to be completed in 2015. Market forces (even beyond the laptop market where batteries provide limited energy storage) would appear to favor energy-efficient solutions, part of the reason being the relatively high cost of electricity in many locales.

The 780/780Ti was developed prior to the 750/750Ti, and it had a higher performance target that the chip used on the 750/750Ti (GM107) could not satisfy. The 750/750Ti is a lower-end product, but newer. At the time the 750/750Ti was developed, GM107 was available; at the time the 780/780Ti was developed, no member of the GMxxx GPU family was available. Therefore the 780/780Ti used GK110.

You may be making some assumptions or have some expectations about marketing brand names and underlying technology which are simply not true. In general, higher numbers within the same marketing brand family should have higher performance (using a loose definition of “performance”). Other expectations you may have may be unfounded.

This observation about mixed GPU architectures within a marketing brand family is not unique to the GeForce 7xx series of products.
You can find members of the GeForce 6xx series that belong to cc 3.x and other members that belong to cc 2.x.

For a quick specification comparison between GTX 750 and GTX 780 Ti, the following may be helpful:

[url]http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780-ti/specifications[/url]
[url]http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-750/specifications[/url]

Thanks for all the comments.
So members of the GeForce 6xx family must have lower performance than the 7xx family,
and cc 5.x must be superior to cc 2.x and 3.x.
I have to find the best-performing GPU.
NVIDIA says that Tesla is for HPC and GeForce is for entertainment.
How do Tesla and GeForce compare?

NVIDIA’s take on Tesla: [url]http://www.nvidia.com/object/why-choose-tesla.html[/url]

I think you would want to puzzle out what kind of performance you need, what features are important for your use case, and what price point(s) you have to hit. For example, only professional cards offer ECC. Whether that is important is a function of your application. When an undetected single-bit error could have serious consequences (say, in a medical application or industrial control), your conclusion may well be different from when doing a Monte-Carlo based simulation where any local deviation may be ephemeral.
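If ECC matters for your deployment, you can also check it programmatically on each candidate machine; a minimal sketch (assuming the CUDA runtime API; consumer GeForce cards will simply report it as unavailable):

[code]
// ecc_check.cu: report whether ECC is enabled on each CUDA device.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // ECCEnabled is 1 only on parts that support ECC and have it turned on
        // (typically Tesla/Quadro); GeForce cards report 0.
        printf("Device %d (%s): ECC %s\n", dev, prop.name,
               prop.ECCEnabled ? "enabled" : "disabled or not available");
    }
    return 0;
}
[/code]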

The best approach is probably to benchmark your actual application(s) on whatever platform(s) you are considering.
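Since high memory bandwidth was named as the main requirement, even a tiny microbenchmark along these lines can give a first impression of sustained device-memory throughput on a candidate card (a rough sketch only, not a substitute for running the real application; buffer size and repetition count are arbitrary choices):

[code]
// membw.cu: rough sustained device-to-device copy bandwidth measurement.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 256u << 20;   // 256 MiB per buffer (arbitrary choice)
    const int    reps  = 20;

    void *src = 0, *dst = 0;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemset(src, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);   // warm-up
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each copy reads and writes the buffer once, hence the factor 2.
    double gbps = 2.0 * bytes * reps / (ms * 1e-3) / 1e9;
    printf("Sustained device-to-device bandwidth: ~%.1f GB/s\n", gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaFree(dst);
    return 0;
}
[/code]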

The Tesla line also has GPUDirect, which can be really useful for some projects.
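For the GPU-to-GPU part of GPUDirect, a quick way to see whether two boards in a machine can talk to each other directly is the peer-access query in the runtime API. This is only a minimal sketch; whether peer access is actually available also depends on the system topology, and GPUDirect RDMA to network adapters additionally requires Tesla/Quadro-class hardware and suitable drivers:

[code]
// p2p_check.cu: query which device pairs support direct peer-to-peer access.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, a, b);
            printf("Device %d -> device %d: peer access %s\n",
                   a, b, canAccess ? "possible" : "not possible");
            if (canAccess) {
                // Enabling peer access lets cudaMemcpyPeer (and kernels that
                // dereference peer pointers) bypass staging through host memory.
                cudaSetDevice(a);
                cudaDeviceEnablePeerAccess(b, 0);
            }
        }
    }
    return 0;
}
[/code]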

In general the high-end GTX gaming GPUs are a bit faster than the Tesla GPUs (due to higher clock speeds); the lower Tesla clocks are deliberate, in order to improve reliability. I have worked with the Kepler Tesla GPUs; they are very reliable, and I never had any problems.

Also, the Tesla GPUs can use the TCC driver on Windows (which avoids WDDM driver overhead), but this generally only matters when you launch many small kernels repeatedly.
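Whether a given board is actually running under the TCC driver can also be queried at runtime; a small sketch (on Linux, or on a Windows card left in WDDM mode, the flag simply reports 0):

[code]
// tcc_check.cu: report the driver model each CUDA device is running under.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // tccDriver is 1 only for devices running the Windows TCC driver.
        printf("Device %d (%s): %s driver model\n", dev, prop.name,
               prop.tccDriver ? "TCC" : "WDDM (or non-Windows)");
    }
    return 0;
}
[/code]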

Also (someone correct me if I am wrong), Nvidia does not manufacture the GTX line itself; rather, companies such as ASUS or EVGA take a reference design from Nvidia and build their own versions. As I understand it, Nvidia has more direct involvement in the manufacture of the Tesla and Quadro products.

The correct answer is what txbob and njuffa said: the ‘best’ GPU is the one that matches your application's needs to the right hardware.

Get a GTX 980 or GTX 970 with CUDA Compute Capability 5.2 or wait for GM200 if you need FP64 performance.

Thanks all.
But why are Tesla and Quadro so much more expensive than GeForce?

Again, because Nvidia has more direct control over the manufacturing process, and offers features such as the TCC driver, GPUDirect, ECC, and other features needed by those who are using multiple GPUs in a server environment. Nvidia tests the Tesla line to guarantee that the cards can run at full speed 24 hours a day for years without issues. There is no such testing for the gaming GPUs (at least not over a long period of time, though I think they do test running at full speed over a single 24-hour period).

Also, they (I believe) guarantee 10 years of support for the Tesla and Quadro GPUs, so they can be used in medical devices/applications and be supported over that timeline (there are FDA requirements for such things in some circumstances).

It should be noted that most supercomputers do in fact use the Tesla line, because they need the long-term support, the reliability, etc. Also, the Tesla products are designed with such customers in mind, rather than the gaming enthusiast community.

Your answers are very helpful.
I was able to confirm them by searching the internet.
First, direct communication (RDMA) between GPUs seems to be more effective and reliable on Tesla than on the other lines,
which makes Tesla well suited for multi-GPU programming.
Second, Tesla is more robust in terms of usability, as you said.
As a result, it is worth the higher price.
Thanks CudaaduC.