Advice on single vs multi-GPU system

I'm a complete novice so far (I successfully ran the CUDA samples on a GT 240 but haven’t yet written any code) and am already thinking about a modest CUDA development machine.

Can anyone make general statements about multi-GPU vs single GPU?

I assume multi-GPU is pretty much the norm now. I also assume that high compute capability and large memory are critical. From a pure pricing standpoint, it seems to me one is better off getting multiple GPUs (e.g. 3x GTX 750 Ti with 6GB of total memory, 1920 CUDA cores, and 5.0 compute capability for ~$450) vs a single GPU (e.g. GTX 780 with 3GB memory, 2304 CUDA cores, and 3.5 compute capability for ~$530). Is this a sound line of thinking?
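To make the pricing comparison concrete, here is a rough back-of-envelope sketch in Python. The helper name `per_unit_cost` is made up for illustration, the prices are the approximate ones quoted above, and the per-card figures for the GTX 750 Ti (640 cores, 2GB) are an assumption used to build the 3-card totals.

```python
def per_unit_cost(price_usd, cuda_cores, memory_gb):
    """Return (dollars per CUDA core, dollars per GB of device memory)."""
    return price_usd / cuda_cores, price_usd / memory_gb

# Approximate figures from the question.
# Assumption: 640 cores and 2 GB per GTX 750 Ti, so 3 cards give 1920 cores and 6 GB.
multi = per_unit_cost(450, 3 * 640, 3 * 2)
single = per_unit_cost(530, 2304, 3)

print(f"3x GTX 750 Ti: ${multi[0]:.3f}/core, ${multi[1]:.2f}/GB")
print(f"1x GTX 780:    ${single[0]:.3f}/core, ${single[1]:.2f}/GB")
```

These ratios only cover price. They say nothing about memory bandwidth, and the 6GB is spread over three separate cards rather than being one pool.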

Not exactly (in the way you imply). What is increasingly common is one GPU dedicated to the video output and another for computation.

Keep in mind that, from a programming perspective, it is easier to map a larger problem to a high-capacity GPU (for example a Titan, Tesla or GTX 780ti) than to break up the problem among cheap GPUs. With multiple cheap GPUs you have to coordinate the memory transfers, correctly break up the problem, assemble the answers, and possibly deal with device-to-device transfers. Also, you will need a motherboard which will support that many GPUs without cutting the x16 lanes to x8 (which means your bus transfer speed is cut in half, and that matters).
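To give a feel for the bookkeeping involved, here is a minimal host-side sketch in plain Python. It does not touch a GPU at all: `fake_kernel` is a made-up stand-in for the per-device work, and `split_evenly` and `run_on_devices` are hypothetical helper names. It only illustrates the split, compute, and reassemble steps you would have to manage yourself with several cards.

```python
def split_evenly(data, num_devices):
    """Split data into num_devices contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), num_devices)
    chunks, start = [], 0
    for i in range(num_devices):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

def fake_kernel(chunk):
    """Stand-in for the work one GPU would do: square every element."""
    return [x * x for x in chunk]

def run_on_devices(data, num_devices):
    chunks = split_evenly(data, num_devices)        # break up the problem
    partials = [fake_kernel(c) for c in chunks]     # one "launch" per device
    return [y for part in partials for y in part]   # assemble the answers in order

result = run_on_devices(list(range(10)), 3)
print(result)
```

With a single large GPU, all of this collapses to one copy in, one launch, and one copy out.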

For extremely parallel problems (image processing, brute force) you may save $50-100 by using 3 GTX 750s vs one GTX 780ti, but it will be at best a marginal improvement (if any).

Another issue you are not thinking about is memory bandwidth. A single GTX 780ti has an upper bound of 336 GB/s, while the GTX 750 has an upper bound of 80 GB/s. You would be surprised how many problems end up being memory bound, and believe me, 80 GB/s is not going to cut it.
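As a rough illustration of what "memory bound" means, the Python sketch below estimates the minimum time a kernel needs just to move its data at the two peak bandwidths quoted above. The workload (a `y = a*x + y` update on 100 million floats) and the function name `min_stream_time_ms` are assumptions chosen for the example, and real kernels will not reach peak bandwidth.

```python
def min_stream_time_ms(num_bytes, bandwidth_gb_per_s):
    """Lower bound on time (ms) to move num_bytes at the given peak bandwidth."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3

# Assumed workload: y[i] = a * x[i] + y[i] on 100 million 4-byte floats.
n = 100_000_000
bytes_moved = n * 4 * 3  # read x, read y, write y

t_780ti = min_stream_time_ms(bytes_moved, 336)  # GTX 780ti peak from the post
t_750 = min_stream_time_ms(bytes_moved, 80)     # GTX 750 peak from the post

print(f"GTX 780ti: at least {t_780ti:.2f} ms")
print(f"GTX 750:   at least {t_750:.2f} ms")
print(f"ratio:     {t_750 / t_780ti:.1f}x")
```

For a kernel like this the arithmetic is trivial, so the bandwidth ratio is roughly the speed ratio regardless of core count.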

This is a complex topic, and you would be far better off learning with a decent GPU. Maybe someone can compare 3 GTX 750s vs a GTX 780ti (or Titan), but my money is on the GTX 780ti for most tasks.

Thank you for your thoughtful comments. I was thinking more about memory size (6GB for the multiple 750s vs 3GB for the 780ti) than anything else. I was speculating that memory capacity would be the #1 issue, but I guess that’s not the case. Can a general statement be made about how much memory the GPU should have (other than the more the better)?

It depends on what you are going to be doing in CUDA. If a single GPU is being used both for calcs and by the operating system, that will (I believe) reduce the available memory by some amount (I am not sure how much; it depends on the operating system and on what else is using the GPU).

Also, if your data set really is that large, you can break it up and send it to the device in chunks.
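The chunking idea can be sketched on the host side like this. It is plain Python with no GPU calls: the comments mark where the device copy and kernel launch would go, and `chunk_ranges` and `process_in_chunks` are hypothetical names. The reduction (a simple sum) is just a placeholder for whatever the kernel computes.

```python
def chunk_ranges(total_items, max_items_per_chunk):
    """Yield (start, stop) index pairs that together cover range(total_items)."""
    for start in range(0, total_items, max_items_per_chunk):
        yield start, min(start + max_items_per_chunk, total_items)

def process_in_chunks(data, max_items_per_chunk):
    """Process data one chunk at a time so only one chunk must fit on the device."""
    total = 0
    for start, stop in chunk_ranges(len(data), max_items_per_chunk):
        chunk = data[start:stop]   # real code: copy this slice to the device
        total += sum(chunk)        # real code: launch a kernel, copy the result back
    return total

data = list(range(1000))
chunked_total = process_in_chunks(data, 300)
print(chunked_total)
```

In practice you would pick the chunk size from the free device memory, leaving some headroom for outputs and temporaries.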

The Titan has 6GB, I believe, if that is your main concern. If you think about it, 3GB of memory is quite a bit for compute. Games may use more than that, but most data sets I have dealt with are a fraction of that size.

Let’s say I want to make a 2-GPU setup, a GTX 780 for computing only and a separate card for the display. Would I really need a motherboard capable of 2 x16 or would 2 x8 work? I see very few motherboards that are capable of 2 x16 (x16/NA or x8/x8 seems pretty common). In other words, would an x16 slot operating at x8 be a bottleneck for the 780? How can I tell from the graphics card specs if it will operate OK in an x8 slot?

I went to a computer store the other day and was told that no graphics card on the planet could saturate an x16 slot. Though I didn’t ask, I took this to mean an x16 slot operating at x8 should be fine for any graphics card. Is this true?

It depends on what you will be doing with the GPUs. A recent project required that I make multiple large transfers across the bus in real time, and in such cases x16 makes a difference. On the other hand, for applications with few data transfers there will be little difference.

Of course a configuration with two PCIe 3.0 slots at x8 will work, but the device-to-host and host-to-device transfer speeds will be cut in half.
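To put numbers on "cut in half", here is a small Python estimate based on the PCIe 3.0 signalling rate (8 GT/s per lane with 128b/130b encoding). These are theoretical one-direction peaks; measured transfer rates are lower. The function names and the 1 GiB payload are assumptions chosen for the example.

```python
def pcie3_bandwidth_gb_per_s(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s.

    Each lane signals at 8 GT/s with 128b/130b encoding, and 8 bits make a byte.
    """
    return lanes * 8e9 * (128 / 130) / 8 / 1e9

def transfer_time_ms(num_bytes, lanes):
    """Best-case time (ms) to move num_bytes over a PCIe 3.0 link with this many lanes."""
    return num_bytes / (pcie3_bandwidth_gb_per_s(lanes) * 1e9) * 1e3

payload = 1 * 1024**3  # assumed payload: 1 GiB

t_x16 = transfer_time_ms(payload, 16)
t_x8 = transfer_time_ms(payload, 8)

print(f"x16: {pcie3_bandwidth_gb_per_s(16):.2f} GB/s, {t_x16:.1f} ms per GiB")
print(f"x8:  {pcie3_bandwidth_gb_per_s(8):.2f} GB/s, {t_x8:.1f} ms per GiB")
```

Whether the doubled transfer time matters depends on how often you cross the bus: it is negligible for a kernel that copies once and computes for seconds, and significant for one that streams data every frame.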

Also, there is (in my experience) quite a large difference in CUDA performance between the GTX 780 and the GTX 780ti. I would say that the performance difference (at least in Linux) is greater than the price difference. Having said that, they are both great cards, assuming you do not need double precision.

Thanks. Wow, this is complicated!

(1) So, you would say a 780 Ti/3GB is superior to a 780/6GB? I know the real answer depends on what will be done. Right now, I’m just getting started and am trying to understand the key tradeoffs.

(2) If I did want double precision, what would be advisable (other than a Tesla K40)?

The Titan has a ‘DP’ mode which can be switched on and off for better double precision speed.

All recent GPUs can handle 64-bit calcs; it is just that the Tesla line is oriented towards that kind of work.

Doing memory operations on 64-bit words will be fine on a GTX 780/780ti, but if you are doing large FFTs or linear algebra solvers in 64-bit, that is where there will be a big difference.

I do not mean to make this seem complicated; much depends on what you plan to do with CUDA.

For work projects I use cuBLAS, cuSPARSE, cuFFT and Thrust frequently, and for ‘hobby’ projects I tend to write my own kernels (both 32-bit and 64-bit).

Do they make a 6GB 780? I thought they were all 3GB.

The main things I like about the GTX 780ti are the 336 GB/s bandwidth and the fast 32-bit integer operations.

If you asked me which GPU is the best all-round value for everything, I would say the Titan. If you asked which is best all-round (no price limit), I would say either the Tesla K40 or the Quadro K6000.

For a beginner, any newer GPU will be fine, so just get something decent and figure out what projects you are interested in developing.

And if you also play games, then the GTX 780/780ti are going to be awesome in that area as well.

Any person in a computer store is going to evaluate GPUs based on gaming only. It is true that most games will not require the full PCIe 3.0 x16 capability, but if you are writing large image processing applications you can push it further than a typical game.

Thanks so much for your comments! My main takeaway is to make sure to have a motherboard capable of at least 2 x16.

Other than that, it seems the rule of thumb is to get as many cores as you can afford with as much memory as your target projects require.

Re 6GB 780:

I’ve even seen that a 6GB 780 Ti is in the works:

FYI, for basic learning (please don’t laugh), I’m planning to use a 1GB GT 640 GDDR5 w/ 3.5 compute capability.