Using GTX 590 cards for CUDA: SLI cards under CUDA?

cudesnick · March 25, 2011, 3:41am

nVidia has recently introduced a dual 512-core board GeForce GTX 590, see e.g.

As far as I know, the two GPUs of this board are connected with SLI. From what I remember reading, one cannot fully utilize multiple boards connected with SLI for running CUDA programs: only one of them would be exposed to a CUDA application.

Hence my questions:

Does anyone have experience running CUDA programs on a Fermi-based dual-GPU board, such as GTX 590?
Can one use the two GPUs of that board independently from one another, as if one had two slightly under-clocked GTX 580 boards instead?
If GTX 590 can be fully utilized under CUDA, does the application code need to be programmed in any special way to make use of both GPUs of the GTX 590?
Can anyone offer any kind of comparative benchmarks?

Thanks!

hamster143 · March 25, 2011, 4:00am

Yes, you can utilize multiple boards through CUDA at the same time. The biggest shortcoming of a multi-GPU setup is that GPUs have independent memory spaces, and copying stuff from one to the other has to go through the PCI Express bus (=> very slow, ~5 GB/s, compared to 100+ GB/s bandwidth between the GPU and its own video memory).
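
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the approximate rates quoted above (about 5 GB/s over PCI Express, about 100 GB/s to on-board memory) and a hypothetical 1 GB buffer, and it ignores latency and driver overhead. `transfer_seconds` is an illustrative helper, not part of any CUDA API.

```python
# Rough transfer-time estimate; rates are the approximate figures quoted above.
GB = 1e9  # bytes in a decimal gigabyte (assumption)

def transfer_seconds(num_bytes, gb_per_s):
    """Idealized time to move num_bytes at a sustained rate, ignoring latency."""
    return num_bytes / (gb_per_s * GB)

buffer_bytes = 1 * GB                                # hypothetical 1 GB buffer
pcie_time = transfer_seconds(buffer_bytes, 5.0)      # GPU to GPU over PCI Express
local_time = transfer_seconds(buffer_bytes, 100.0)   # GPU to its own video memory

print(f"PCIe: {pcie_time:.3f} s, local: {local_time:.3f} s, "
      f"ratio: {pcie_time / local_time:.0f}x")
```

Under these assumptions the inter-GPU copy is roughly 20 times slower than a local memory pass, which is why multi-GPU code usually tries to keep cross-device traffic small.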

Which brings me to a question I have myself. Is the 590 really an SLI card internally? One would think that, having both GPUs on the same board, NVIDIA folks would’ve been able to come up with a way to share all 3 GB of memory between them without involving PCI Express.

cudesnick · March 25, 2011, 4:35am

Are you sure that you can fully utilize multiple SLI-connected boards through CUDA at the same time?

hamster143 · March 25, 2011, 6:07am

Hmm. No, I’m not sure.

tmurray · March 25, 2011, 6:59am

CUDA has been working with SLI enabled since 2.2 or 2.3 or something like that.

SPWorley · March 25, 2011, 7:00am

SLI has nothing to do with CUDA. CUDA apps can use multiple GPUs of any type, SLI or not, dual or not. Using multiple GPUs is not transparent (it’s not like you see a single “giant” GPU) but it’s certainly common. (The machine I’m typing on has a GTX295 and 2 GTX480s, all 3 cards (4 GPUs) are running an app right now. With another CUDA-enabled display GPU, too!)

cudesnick · March 25, 2011, 7:54am

Thanks! I missed that.

Let me rephrase it: the fact that some GPUs are connected with SLI has no effect on a CUDA application that runs on these GPUs. Is this accurate?

seibert · March 25, 2011, 1:35pm

Correct. More important for CUDA is that these two GPUs are directly connected via an NF200 PCI-Express switch on the card. Once the second release candidate of CUDA 4.0 comes out, the two GPUs should be able to directly copy memory between each other without loading the external PCI-Express bus. This could have useful applications for large multi-GPU algorithms that need to sync data between devices periodically.

Jimmy_Pettersson · March 25, 2011, 2:37pm

That’s pretty exciting, what do you think that bandwidth could be ?

seibert · March 25, 2011, 4:40pm

Well, it is just PCI-Express 2.0, so whatever you get now doing host-to-device with pinned memory is probably a good guide. (So probably around 6.5 GB/sec.) The big win here is that in a system with 2 or more GTX 590s, you ought to be able to sustain quite a bit more device-to-device traffic than with single GPU devices, at least when going between the right pairs of cards.

Oxydius · March 25, 2011, 5:39pm

GPUDirect 2.0 Peer-to-Peer is only supported on Tesla and Quadro “pro” hardware. Unfortunately, that $700 card won’t support it, so memory copies will still use system RAM.

tmurray · March 25, 2011, 6:05pm

true as of RC1, not true as of RC2

Oxydius · March 25, 2011, 9:09pm

That is great news Tim, thanks for that!

cudesnick · March 26, 2011, 1:11am

As often happens on this forum, I got more useful information than expected.

Maybe someone can post some stats on the GTX 590 PCI-E data throughput? Specifically, it would be interesting to know how long it takes to transfer, say, two different 100M data sets from the host to each of the GTX 590’s GPUs and back, compared to sending the same two datasets to/from two separate GTX 580s plugged into the same motherboard.

Thanks!

Jimmy_Pettersson · March 26, 2011, 9:39am

Maybe this is off topic, but it seems it also ships with new laser beam features: Geforce GTX 590 burns @ SweClockers.com - YouTube

Edit: note that it’s very overclocked

hocheung20 · March 26, 2011, 6:44pm

According to the various tech reviews, the GTX 590 has the NF200 chip, which is basically a PCI-E 2.0 x16 switch. Device → Device transfers should be just as fast as through a motherboard, but Host → Device throughput will be halved if you are trying to load on both devices simultaneously.
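
As a rough illustration of that halving, the sketch below estimates the host-to-device time for the two datasets asked about earlier in the thread. Everything in it is an assumption for illustration: "100M" is taken to mean 100 MB, the link rate is the roughly 6.5 GB/s pinned-memory figure mentioned above, the two GTX 580s are each assumed to get a full-rate x16 slot, and the shared link is assumed to split evenly between the two GPUs. These are estimates, not measurements.

```python
# Idealized host-to-device estimate; no measured data, all inputs are assumptions.
LINK_GB_PER_S = 6.5     # approximate PCIe 2.0 x16 pinned-memory rate quoted in the thread
DATASET_BYTES = 100e6   # "100M" interpreted as 100 MB (assumption)

def concurrent_upload_seconds(dataset_bytes, link_gb_per_s, gpus_sharing_link):
    """Time for each GPU to receive one dataset when the link is split evenly."""
    per_gpu_rate = link_gb_per_s * 1e9 / gpus_sharing_link
    return dataset_bytes / per_gpu_rate

# GTX 590: both GPUs sit behind one x16 link via the on-board switch.
t_590 = concurrent_upload_seconds(DATASET_BYTES, LINK_GB_PER_S, gpus_sharing_link=2)
# Two GTX 580s: each assumed to have its own full-rate x16 slot.
t_580 = concurrent_upload_seconds(DATASET_BYTES, LINK_GB_PER_S, gpus_sharing_link=1)

print(f"GTX 590 (shared link): {t_590 * 1e3:.1f} ms per dataset")
print(f"2x GTX 580 (separate links): {t_580 * 1e3:.1f} ms per dataset")
```

In this simple model the simultaneous upload takes twice as long on the GTX 590. If the two uploads do not overlap in time, each GPU can still use the full link, so the penalty only applies to concurrent transfers.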

donny25 · March 28, 2011, 2:44am

Let me get this straight: are you SURE that if I use, say, Adobe Premiere, it will truly utilize both GPUs? I’ve been hesitant to get the 590 because I heard that with a dual-GPU card only a single GPU would be utilized, meaning I’d be better off just getting a 580.

Also, is there any word on whether other 590s will come out, possibly a 595 with 6+2 VRMs like on the 580s?

Nvidia dropped the ball with the 4+1 VRMs, unfortunately; otherwise it would’ve spanked the 6990, as it would’ve had more overclocking room. :(

Some 590s dying is being blown way out of proportion, especially by the AMD fanboys.

seibert · March 28, 2011, 12:37pm

I have no idea what Adobe Premiere will do. CUDA applications will see a GTX 590 as two distinct CUDA devices, and it is up to the developer to decide what to do with that. In very old CUDA releases, enabling SLI would hide one of the GPUs, which was annoying for people doing graphics and GPU computing. The conclusions in this thread relate to development of CUDA applications, not the behavior of any particular existing CUDA application.

donny25 · March 30, 2011, 4:25am

Thanks, I’ll do more research then.

MacFan · March 31, 2011, 12:00pm

I wonder if tmurray can be absolutely clear on the Peer to Peer aspects. I am at the London GPU meeting, and one of the Nvidia speakers thought the new RC2 Peer to Peer functionality might allow two different consumer cards to talk Peer to Peer in the new model, but that the new Peer to Peer model might not work between the two GPUs embedded on the single 590 card. Can you clarify?
