Nvidia 9800GX2 Two cuda processors in one slot?

philgarnett · February 20, 2008, 9:00pm

Hello,
I might come up against some sort of NDA asking this question but I will give it a try.

Will the upcoming Nvidia 9800GX2 card, which I believe is two 8800GT cards stuck together, work as two separate CUDA processing cards, each with 512MB of RAM and 128 stream processors, if SLI mode is switched off? This would be excellent if it did: I could cram three of these on my motherboard and have 6 CUDA processors.

Can anyone provide information on this? It would alter my buying decision as I would hold off for these cards instead of buying a trio of 8800GTS cards.

Come on Nvidia, tell us the good news?

Thanks,
Phil

Qazax · February 20, 2008, 9:47pm

I think it is VERY unlikely that you will ever get driver support for three 9800GX2s on one motherboard. I'm pretty sure I read that the max you can have is 2 on one board.

Also, do the cards need to have the SLI connectors for them all to function? If so, you need 3 GTX or Ultra cards, as only those can be used in a set of three in one PC as far as I know.

I don't know much about CUDA, just saying what I think.

philgarnett · February 20, 2008, 10:06pm

The cards will be used purely for cuda and not in sli mode, if this is, as you point out, possible.

wumpus · February 21, 2008, 9:10am

I don’t see why that would not be the case. If the 9800GX2 has two GPUs, then CUDA will see it as two GPUs if you disable SLI in the driver. If you put two of these cards in a machine you’ll have 4.

svd2cn · March 5, 2008, 1:50am

Hi philgarnett,

I am also interested in utilizing multiple GPUs (in fact CUDA devices) in one PC. Could you kindly tell me how much performance gain can be achieved with SLI configuration? Thanks a lot.

jimh · March 5, 2008, 6:49pm

svd2cn -

Remember, you don’t use SLI in a multi-GPU CUDA configuration - you need to write your code to detect the installed CUDA-capable devices and spread the problem across them.
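To make "spread the problem across them" a bit more concrete, here is a minimal sketch of one possible split: cutting the outermost grid axis into contiguous slabs, one per device. This is an illustration only, not the code from my application. The names `Slab` and `split_planes` are made up for this post, and the device count is passed in as a plain parameter so the sketch compiles as ordinary C++. In a real CUDA program you would get that count from `cudaGetDeviceCount()` and select each device with `cudaSetDevice()` before allocating memory and launching kernels for its slab.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slab of the outermost grid axis, assigned to one device.
struct Slab {
    int device;         // device index, as would be passed to cudaSetDevice()
    std::size_t begin;  // first plane owned by this device
    std::size_t end;    // one past the last plane owned by this device
};

// Split `planes` planes as evenly as possible across `num_devices` devices.
// Assumption: num_devices would normally come from cudaGetDeviceCount();
// it is a plain parameter here so no CUDA toolkit is needed to build this.
std::vector<Slab> split_planes(std::size_t planes, int num_devices) {
    assert(num_devices > 0);
    const std::size_t n = static_cast<std::size_t>(num_devices);
    const std::size_t base = planes / n;   // planes every device gets
    const std::size_t extra = planes % n;  // first `extra` devices get one more

    std::vector<Slab> slabs;
    std::size_t begin = 0;
    for (std::size_t d = 0; d < n; ++d) {
        const std::size_t count = base + (d < extra ? 1 : 0);
        slabs.push_back({static_cast<int>(d), begin, begin + count});
        begin += count;
    }
    return slabs;
}
```

Calling `split_planes(172, 4)` hands each of four devices 43 planes. When the plane count does not divide evenly, the leftover planes go to the lowest-numbered devices, so no device ends up with more than one extra plane.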

The speedup you get is very dependent on the problem you’re trying to solve, the size of the problem, and how you split the problem among the GPUs.

For example, my application (an iterative non-linear optimization) shows about a 1.8x speedup when I go from 2 cards to 4 cards for a certain grid size (240x240x172). But if I run a 120x120x86 grid, I get 7.2 s/iter on 4 cards and 8.7 s/iter on 2 cards. (In this case, the smaller problem is dominated by the time it takes to transfer data to the GPUs. I’ll take this opportunity to prod nVidia to get pinned memory working with multiple GPUs, hint, hint… ;) )
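To put the two cases on the same footing: the small-grid timings work out to 8.7 / 7.2, or roughly a 1.2x speedup, against 1.8x for the large grid. A small sketch of that arithmetic follows. The function names `speedup` and `scaling_efficiency` are just illustrative helpers written for this post, not part of any library.

```cpp
#include <cassert>

// Speedup from moving from a slower configuration to a faster one,
// given the time per iteration of each.
double speedup(double slow_seconds, double fast_seconds) {
    assert(slow_seconds > 0.0 && fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

// Fraction of the ideal speedup actually achieved when the device count
// grows from old_devices to new_devices (ideal = new_devices / old_devices).
double scaling_efficiency(double achieved_speedup, int old_devices, int new_devices) {
    assert(old_devices > 0 && new_devices > 0);
    const double ideal = static_cast<double>(new_devices) / old_devices;
    return achieved_speedup / ideal;
}
```

With these, the large grid's 1.8x on a doubling of cards is about 90% of ideal, while the small grid's roughly 1.2x is about 60%, which is consistent with transfer time dominating the smaller problem.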

Splitting a problem between several GPUs is much like splitting a problem to run on a distributed memory cluster. I’ve found many of the same rules-of-thumb apply. Ultimately though, you need to carefully analyze and design your application and test different approaches.

svd2cn · March 6, 2008, 12:31pm

Thanks a lot for your quick and detailed response. The performance gain is quite significant (at least in some cases). I think it is worth a try.

P.S. what do you mean by "I’ll take this opportunity to prod nVidia to get pinned memory working with multiple GPUs, hint, hint… "? My understanding is that currently pinned memory doesn’t work with multiple GPUs… do you mean that?

jimh · March 6, 2008, 8:57pm

You are right - pinned memory cannot be used by more than one GPU. I was using my performance example to make a case for fixing this. I’d get a nice performance boost in my application if I could use the same pinned memory buffer to DMA data to all four GPUs.
