Single dual-GPU card vs. 2x single GPU cards

Hello Colleagues,

As a newcomer to CUDA development, I need your advice on upgrading a system.

In our application, we have two fast sCMOS cameras. After some independent computation (demosaicing, spatial filtering and more), the images from the two cameras are overlaid using OpenGL. All computations are in single precision. The whole thing is supposed to run in real time at a high frame rate, which is why we’re upgrading our system.

Clearly, since the computations on the images from the two cameras are independent (but slightly different), it makes sense to do them in parallel on two GPUs. Since we don’t quite have the budget for the Tesla cards, we’re considering buying two GTX 780 Ti cards or a single GTX Titan Z (dual-GPU). Since the Titan Z is the more expensive option, I would like to hear your opinions on any advantages it may confer versus the two-card solution in our application (other than using less power). One important factor is clearly the speed of peer-to-peer copy on the Titan Z, but I couldn’t find the relevant information.

As a side question, why is the Tesla K10 listed as “server only”?

Thank you in advance!

There are a few pros and cons to consider:

  1. The Titan Z has, I believe, slightly higher performance but is much more expensive. I believe it is also intended to be a lot more reliable (NVIDIA bins its chips and puts the better-functioning ones from a batch in its higher-end cards), whereas the 780 Ti is intended first and foremost for less critical applications.

Peer-to-peer is a significant point.
The Titan Z has a 16-lane PCIe switch on board between its two GPUs, so it’s effectively adding an extra PCIe slot to your motherboard without losing bandwidth. If your motherboard has two spare double-wide PCIe slots with 16 dedicated lanes each and they’re peer-enabled, then the 780 Ti pair is probably the winner. If, on the other hand, they’re not peer-enabled, or one of them has only 4 or 8 lanes, then I would go with the Titan Z. This depends on the application, though, particularly on how much inter-GPU communication is required.
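
To put rough numbers on the lane counts discussed here, the following is a back-of-the-envelope sketch in Python. It assumes PCIe 3.0 signalling (8 GT/s per lane, 128b/130b encoding) and a hypothetical 2048 x 2048 single-precision frame; neither the link generation the slots actually negotiate nor the frame size is stated in this thread, and sustained throughput in practice is lower than these theoretical peaks.

```python
# Theoretical one-direction PCIe bandwidth per link width (a sketch, not a benchmark).
# Assumption: PCIe 3.0, i.e. 8 GT/s per lane with 128b/130b line encoding.
GT_PER_S = 8e9            # transfers per second per lane (PCIe 3.0)
ENCODING = 128 / 130      # usable fraction of raw bits after line encoding

# Hypothetical frame (not from the thread): 2048 x 2048 pixels, 4 bytes (float32) each.
FRAME_BYTES = 2048 * 2048 * 4


def link_bandwidth_bytes(lanes: int) -> float:
    """Theoretical peak bytes per second in one direction for a link with `lanes` lanes."""
    return lanes * GT_PER_S * ENCODING / 8  # 8 bits per byte


def frame_transfer_ms(lanes: int, frame_bytes: int = FRAME_BYTES) -> float:
    """Best-case time in milliseconds to move one frame across the link."""
    return frame_bytes / link_bandwidth_bytes(lanes) * 1e3


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        gb_s = link_bandwidth_bytes(lanes) / 1e9
        print(f"x{lanes:<2}: {gb_s:5.2f} GB/s peak, "
              f"{frame_transfer_ms(lanes):.2f} ms per frame")
```

Under these assumptions, halving the lane count doubles the best-case transfer time per frame, which is why a x4 or x8 slot matters more the more data the two GPUs have to exchange per frame.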

Regarding the “server only” question:
Nearly all K cards (with the exception of the K20, I think) are PASSIVELY cooled: they rely on the airflow of a server chassis rather than on a fan of their own. This is VERY important to note, because desktop cases simply can’t cool the K cards well enough, so they overheat and get damaged (and I’m not talking months here; minutes to hours is the timeframe). Another important difference (one I’ve personally struggled with recently on consumer cards) is that on K cards the power sockets are on the back end of the card, whereas consumer cards have the power socket on the side, which can really clutter the space around the motherboard.

Hi sBc-Random,

Thanks for answering.

Just to make sure I understand right – are you saying that the communication between the two GPUs on the Titan Z goes over its internal PCIe switch, and that this is slower than going through the motherboard? There is no kind of faster direct communication?

BTW, we have the ASUS P9X79E WS motherboard (http://www.asus.com/Motherboards/P9X79E_WS)

The motherboard review on AnandTech (http://www.anandtech.com/show/7613/asus-p9x79e-ws-review) provides a diagram of the PCIe lane layout if that helps.

No, it will be exactly the same speed as a top-of-the-line motherboard connection. There’s no proprietary link with greatly improved transfer speed, but it will certainly be as good as you can get between two GPUs.

That motherboard would do nicely for the 780s: they would run at, let’s say, 95% of the inter-GPU speed for much, much less money, so I would say go with the 780s. Mind you, if the BIOS supports it you could POTENTIALLY fit 4 Titan Zs, which is 8 GPUs, as opposed to 4 780s. It really depends on how many cards you need (I have 16 :P).

Hi sBC-Random,

Much appreciated. We only need 2 cards for the two types of images, and maybe a third one just for display.

Just to confirm – copying from one GPU to the other is staged through CPU memory in either case, right?

No, that’s the point of peer-to-peer.

Enabling peer-to-peer copies means that instead of GPU0>CPU>GPU1 the path becomes GPU0>GPU1. Roughly 50% faster in my experience, but it can vary…
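
One way to see why the direct path helps: a staged copy crosses the bus twice (GPU0 to host memory, then host memory to GPU1), while a peer-to-peer copy crosses it once. The sketch below is an idealized Python model, not a measurement. The bandwidth figure, the buffer size, and the assumption that the two staged hops run back to back without overlapping are all illustrative assumptions, not numbers from this thread.

```python
# Idealized model of a GPU-to-GPU copy: staged through host memory vs. direct peer-to-peer.
# Assumptions (not from the thread): every hop runs at the same bandwidth, and the
# two staged hops happen one after the other with no overlap.


def staged_copy_s(nbytes: int, bandwidth_bytes_s: float) -> float:
    """GPU0 -> host, then host -> GPU1: two sequential bus crossings."""
    return 2 * nbytes / bandwidth_bytes_s


def peer_copy_s(nbytes: int, bandwidth_bytes_s: float) -> float:
    """GPU0 -> GPU1 directly: a single bus crossing."""
    return nbytes / bandwidth_bytes_s


if __name__ == "__main__":
    nbytes = 64 * 1024 * 1024   # hypothetical 64 MiB buffer
    bandwidth = 10e9            # hypothetical sustained 10 GB/s per hop
    staged = staged_copy_s(nbytes, bandwidth)
    direct = peer_copy_s(nbytes, bandwidth)
    print(f"staged: {staged * 1e3:.2f} ms, direct: {direct * 1e3:.2f} ms")
    print(f"time saved by going direct: {100 * (1 - direct / staged):.0f}%")
```

In this model the direct copy takes at best half the time of the staged one; pipelined staging, the PCIe topology and contention on the host can all move the real figure. In the CUDA runtime the relevant calls are `cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess` and `cudaMemcpyPeer`.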

That’s 1500 W just for the GPUs. You’ll probably need a 2-2.5 kW PSU. The most powerful retail PSU is 1.6 kW, and combining them is hard and/or dangerous, I think.
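
The arithmetic behind a power estimate like this can be sketched as follows. This is only an illustration: it assumes the 1500 W figure refers to four Titan Z cards at 375 W board power each (the post does not say so explicitly), and the rest-of-system load and the target PSU utilisation are hypothetical placeholders.

```python
# Rough PSU sizing for a multi-GPU rig (a sketch under stated assumptions).
# Assumptions: 375 W board power per Titan Z card; the 300 W rest-of-system load
# and the 80% maximum PSU utilisation are hypothetical placeholders.


def gpu_power_w(cards: int, watts_per_card: float) -> float:
    """Total board power of all GPU cards in watts."""
    return cards * watts_per_card


def recommended_psu_w(gpu_w: float, rest_of_system_w: float,
                      max_load_fraction: float) -> float:
    """PSU rating such that the total load stays at or below `max_load_fraction` of it."""
    return (gpu_w + rest_of_system_w) / max_load_fraction


if __name__ == "__main__":
    gpus = gpu_power_w(cards=4, watts_per_card=375)
    psu = recommended_psu_w(gpus, rest_of_system_w=300, max_load_fraction=0.8)
    print(f"GPUs alone: {gpus:.0f} W, suggested PSU rating: about {psu:.0f} W")
```

With these placeholder values the result lands inside the 2-2.5 kW range quoted above, well beyond a single 1.6 kW unit.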

I’d like to know more about your setup!

By the way, this person https://devtalk.nvidia.com/default/topic/649542/cuda-setup-and-installation/18-gpus-in-a-single-rig-and-it-works/ reports having 18 GPUs, although it looks like the PCIe bandwidth is pretty bad.