Using GTX 590 cards for CUDA: SLI cards under CUDA?

NVIDIA has recently introduced the GeForce GTX 590, a dual-GPU board with 512 cores per GPU; see e.g.

http://www.geforce.com/#/Hardware/GPUs/geforce-gtx-590/overview

As far as I know, the two GPUs of this board are connected with SLI. From what I remember reading, one cannot fully utilize multiple boards connected with SLI for running CUDA programs: only one of them would be exposed to a CUDA application.

Hence my questions:

  • Does anyone have experience running CUDA programs on a Fermi-based dual-GPU board, such as GTX 590?

  • Can one use the two GPUs of that board independently of one another, as if one had two slightly under-clocked GTX 580 boards instead?

  • If the GTX 590 can be fully utilized under CUDA, does the application code need to be programmed in any special way to make use of both of its GPUs?

  • Can anyone offer any kind of comparative benchmarks?

Thanks!

Yes, you can utilize multiple boards through CUDA at the same time. The biggest shortcoming of a multi-GPU setup is that GPUs have independent memory spaces, and copying stuff from one to the other has to go through the PCI Express bus (=> very slow, ~5 GB/s, compared to 100+ GB/s bandwidth between the GPU and its own video memory).

Which brings me to a question I have myself. Is the 590 really an SLI card internally? One would think that, having both GPUs on the same board, NVIDIA folks would’ve been able to come up with a way to share all 3 GB of memory between them without involving PCI Express.

Are you sure that you can fully utilize multiple SLI-connected boards through CUDA at the same time?

Hmm. No, I’m not sure.

SLI has been working with CUDA since 2.2 or 2.3 or something like that.

SLI has nothing to do with CUDA. CUDA apps can use multiple GPUs of any type, SLI or not, dual or not. Using multiple GPUs is not transparent (it’s not like you see a single “giant” GPU) but it’s certainly common. (The machine I’m typing on has a GTX295 and 2 GTX480s, all 3 cards (4 GPUs) are running an app right now. With another CUDA-enabled display GPU, too!)
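To make the "not transparent" point concrete, here is a minimal sketch of what driving several GPUs looks like from the host side. It assumes CUDA 4.0 or later, where a single host thread can switch between devices with cudaSetDevice (older releases needed one host thread per GPU). The kernel `scale`, the helper `grid_size`, and the buffer size are made up for illustration and are not taken from any poster's application.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical helper: number of blocks needed to cover n elements.
static int grid_size(int n, int block) { return (n + block - 1) / block; }

// Hypothetical kernel: scale every element of a buffer.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    printf("CUDA devices visible: %d\n", count);

    const int n = 1 << 20;  // arbitrary element count per GPU
    std::vector<float *> d_buf(count, nullptr);

    // Each GPU gets its own buffer and its own kernel launch.
    // Nothing is shared: every device has a separate memory space.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        CHECK(cudaGetDeviceProperties(&prop, dev));
        printf("device %d: %s\n", dev, prop.name);

        CHECK(cudaSetDevice(dev));
        CHECK(cudaMalloc(&d_buf[dev], n * sizeof(float)));
        CHECK(cudaMemset(d_buf[dev], 0, n * sizeof(float)));
        scale<<<grid_size(n, 256), 256>>>(d_buf[dev], n, 2.0f);
        CHECK(cudaGetLastError());
    }

    // Kernel launches are asynchronous, so wait for each GPU in turn.
    for (int dev = 0; dev < count; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaFree(d_buf[dev]));
    }
    return 0;
}
```

If the claims in this thread hold, a single GTX 590 should show up here as two entries, and the work has to be split between them by the application itself.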

Thanks! I missed that.

Let me rephrase it: the fact that some GPUs are connected with SLI has no effect on a CUDA application that runs on these GPUs. Is this accurate?

Correct. More important for CUDA is that these two GPUs are directly connected via an NF200 PCI-Express switch on the card. Once the second release candidate of CUDA 4.0 comes out, the two GPUs should be able to directly copy memory between each other without loading the external PCI-Express bus. This could be useful for large multi-GPU algorithms that need to sync data between devices periodically.
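For anyone who wants to check this on their own hardware, here is a rough sketch using the CUDA 4.0 runtime peer-to-peer calls (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer). It assumes the two GPUs of interest are devices 0 and 1, which may not be true on a given system. The helper `direct_copy_possible` and the 64 MiB buffer size are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical helper: this sketch only treats the pair as peer-capable
// when each device reports it can access the other.
static bool direct_copy_possible(int can01, int can10)
{
    return can01 && can10;
}

int main()
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("Need at least two CUDA devices, found %d\n", count);
        return 0;
    }

    // Assumption: devices 0 and 1 are the pair we care about.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    bool direct = direct_copy_possible(can01, can10);
    printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    const size_t bytes = size_t(64) << 20;  // 64 MiB, arbitrary
    float *src = nullptr, *dst = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));
    if (direct) CHECK(cudaDeviceEnablePeerAccess(1, 0));

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));
    if (direct) CHECK(cudaDeviceEnablePeerAccess(0, 0));

    // cudaMemcpyPeer works either way; without peer access the runtime
    // falls back to staging the data through host memory.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    CHECK(cudaDeviceSynchronize());
    printf("copy path: %s\n", direct ? "peer-to-peer" : "staged via host");

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

Whether the two GPUs of a GTX 590 actually report peer access depends on the driver and toolkit release, so the printed flags are the thing to look at.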

That’s pretty exciting. What do you think that bandwidth could be?

Well, it is just PCI-Express 2.0, so whatever you get now doing host-to-device with pinned memory is probably a good guide. (So probably around 6.5 GB/sec.) The big win here is that in a system with 2 or more GTX 590s, you ought to be able to sustain quite a bit more device-to-device traffic than with single GPU devices, at least when going between the right pairs of cards.
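A simple way to get the host-to-device number for a given machine is to time a pinned-memory copy with CUDA events. The sketch below does that for each visible device, one at a time. The 100 MiB transfer size and the helper `gb_per_s` are arbitrary choices for illustration. It does not measure simultaneous transfers to both GPUs, which would need separate streams or host threads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical helper: convert bytes moved in `ms` milliseconds to GB/s
// (1 GB = 1e9 bytes).
static double gb_per_s(size_t bytes, float ms)
{
    return (bytes / 1e9) / (ms / 1e3);
}

int main()
{
    const size_t bytes = size_t(100) << 20;  // 100 MiB, arbitrary
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));

    for (int dev = 0; dev < count; ++dev) {
        CHECK(cudaSetDevice(dev));

        float *h = nullptr, *d = nullptr;
        CHECK(cudaMallocHost(&h, bytes));  // pinned (page-locked) host memory
        CHECK(cudaMalloc(&d, bytes));
        memset(h, 0, bytes);

        cudaEvent_t start, stop;
        CHECK(cudaEventCreate(&start));
        CHECK(cudaEventCreate(&stop));

        // Time a single host-to-device copy on the default stream.
        CHECK(cudaEventRecord(start, 0));
        CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
        CHECK(cudaEventRecord(stop, 0));
        CHECK(cudaEventSynchronize(stop));

        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        printf("device %d: host-to-device %.2f GB/s\n",
               dev, gb_per_s(bytes, ms));

        CHECK(cudaEventDestroy(start));
        CHECK(cudaEventDestroy(stop));
        CHECK(cudaFree(d));
        CHECK(cudaFreeHost(h));
    }
    return 0;
}
```

A single copy is a noisy measurement; averaging over several repetitions would give a steadier figure.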

GPUDirect 2.0 Peer-to-Peer is only supported on Tesla and Quadro “pro” hardware. Unfortunately, that $700 card won’t support it so memory copies will still use system RAM. :down:

true as of RC1, not true as of RC2

That is great news Tim, thanks for that!

As it often happens on this forum, I got more useful information than expected.

Maybe someone can post some stats on the GTX 590 PCI-E data throughput? Specifically, it would be interesting to know how long it takes to transfer, say, two different 100M data sets from the host to each of a GTX 590's GPUs and back, compared to sending the same two data sets to/from two separate GTX 580s plugged into the same motherboard.

Thanks!

Maybe this is off topic, but it seems it also ships with new laser beam features: http://www.youtube.com/watch?v=sRo-1VFMcbc

Edit: note that it’s very overclocked

According to the various tech reviews, the GTX 590 has the NF200 chip, which is basically a PCI-E 2.0 x16 switch. Device -> Device transfers should be just as fast as through a motherboard, but Host -> Device throughput will be halved if you are trying to load on both devices simultaneously.

Let me get this straight: are you SURE that if, say, I use Adobe Premiere, it will truly utilize both GPUs? I’ve been hesitant to get the 590 after hearing that with a dual-GPU card only a single GPU would be utilized, meaning I’d be better off just getting a 580.

Also, is there any word on whether other 590s will come out, possibly a 595 with 6 + 2 VRMs like on the 580s?

Nvidia dropped the ball with the 4+1 VRMs unfortunately; otherwise it would’ve spanked the 6990, as it would’ve had more overclocking room. :(

Some 590s dying is being blown way out of proportion, especially by the AMD fanboys.

I have no idea what Adobe Premiere will do. CUDA applications will see a GTX 590 as two distinct CUDA devices, and it is up to the developer to decide what to do with that. In very old CUDA releases, enabling SLI would hide one of the GPUs, which was annoying for people doing graphics and GPU computing. The conclusions in this thread relate to development of CUDA applications, not the behavior of any particular existing CUDA application.

Thanks, I’ll do more research then.

I wonder if tmurray can be absolutely clear on the Peer to Peer aspects. I am at the London GPU meeting, and one of the Nvidia speakers thought the new RC2 Peer to Peer functionality might relate to allowing two different consumer cards to talk Peer to Peer in the new model, but that the new Peer to Peer model might not work between the two GPUs embedded on the single 590 card. Can you clarify?