Nvidia 9800GX2: two CUDA processors in one slot?

Hello,
I might run up against some sort of NDA by asking this question, but I will give it a try.

Will the upcoming Nvidia 9800GX2 card, which I believe is two 8800GT cards stuck together, work as two separate CUDA processing cards, each with 512 MB of RAM and 128 stream processors, if SLI mode is switched off? That would be excellent if it did: I could cram three of these onto my motherboard and have six CUDA processors.

Can anyone provide information on this? It would alter my buying decision, as I would hold off for these cards instead of buying a trio of 8800GTS cards.

Come on, Nvidia, tell us the good news!

Thanks,
Phil

I think it is VERY unlikely that you will ever get driver support for three 9800GX2s on one motherboard. I'm pretty sure I read that the maximum you can have is two on one board.

Also, do the cards need the SLI connectors for them all to function? If so, you need three GTX or Ultra cards, as only those can be used in a set of three in one PC, as far as I know.

I don't know much about CUDA; I'm just saying what I think.

The cards will be used purely for CUDA and not in SLI mode, if that is possible, as you point out.

I don’t see why that wouldn’t be the case. If the 9800GX2 has two GPUs, then CUDA will see it as two GPUs once you disable SLI in the driver. If you put two of these cards in a machine, you’ll have four.

Hi philgarnett,

I am also interested in using multiple GPUs (CUDA devices, to be precise) in one PC. Could you tell me how much of a performance gain can be achieved with an SLI configuration? Thanks a lot.

svd2cn -

Remember, you don’t use SLI in a multi-GPU CUDA configuration - you need to write your code to detect the installed CUDA-capable devices and spread the problem across them.
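A minimal sketch of the "spread the problem across them" step, assuming a 3D grid that is split into contiguous z-slabs, one per device. It is plain C++ so it runs without a GPU: the device count is simply a parameter, whereas a real program would get it from `cudaGetDeviceCount()` and would drive each slab from its own host thread that first calls `cudaSetDevice()`. The `Slab` struct and `partitionZ` function are hypothetical names, not part of any CUDA API.

```cpp
#include <cassert>
#include <vector>

// One contiguous range of z-planes handled by a single GPU (hypothetical helper type).
struct Slab {
    int device;  // index that would be passed to cudaSetDevice()
    int zBegin;  // first z-plane, inclusive
    int zEnd;    // one past the last z-plane
};

// Split nz planes as evenly as possible across numDevices GPUs.
// The first (nz % numDevices) devices each receive one extra plane.
// In a real program numDevices would come from cudaGetDeviceCount().
std::vector<Slab> partitionZ(int nz, int numDevices) {
    assert(numDevices > 0 && nz >= 0);
    std::vector<Slab> slabs;
    const int base = nz / numDevices;   // planes every device gets
    const int extra = nz % numDevices;  // leftover planes to hand out
    int z = 0;
    for (int d = 0; d < numDevices; ++d) {
        const int count = base + (d < extra ? 1 : 0);
        slabs.push_back({d, z, z + count});
        z += count;
    }
    return slabs;
}
```

For example, `partitionZ(172, 4)` gives each of four GPUs 43 planes of a 240x240x172 grid. Splitting along z is a choice, not a requirement: assuming the grid is stored with x varying fastest and z slowest, each slab is one contiguous block of host memory and can be copied with a single `cudaMemcpy()`.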

The speedup you get is very dependent on the problem you’re trying to solve, the size of the problem, and how you split the problem among the GPUs.

For example, my application (an iterative non-linear optimization) shows about a 1.8x speedup when I go from 2 cards to 4 cards for a certain grid size (240x240x172). But if I run a 120x120x86 grid, I get 7.2 s/iter on 4 cards and 8.7 s/iter on 2 cards, only about a 1.2x speedup. (In this case, the smaller problem is dominated by the time it takes to transfer data to the GPUs. I’ll take this opportunity to prod nVidia to get pinned memory working with multiple GPUs, hint, hint… ;) )
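The timings above can be turned into a parallel efficiency, taking the ideal speedup for going from 2 to 4 cards as 4/2 = 2. Note that the 120x120x86 grid has one eighth as many cells as the 240x240x172 grid (each dimension is halved), so each GPU has far less computation per iteration:

```latex
\begin{aligned}
S_{120\times120\times86} &= \frac{t_{2\ \text{cards}}}{t_{4\ \text{cards}}}
  = \frac{8.7\ \text{s/iter}}{7.2\ \text{s/iter}} \approx 1.21,
  &\quad E &= \frac{S}{2} \approx 0.60 \\
S_{240\times240\times172} &\approx 1.8,
  &\quad E &= \frac{S}{2} \approx 0.90
\end{aligned}
```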

Splitting a problem between several GPUs is much like splitting a problem to run on a distributed-memory cluster. I’ve found many of the same rules of thumb apply. Ultimately, though, you need to carefully analyze and design your application and test different approaches.
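One cluster rule of thumb that carries over directly is halo (ghost-cell) exchange: if updating a grid point reads its neighbours, a GPU that owns planes [zBegin, zEnd) also needs copies of the adjacent planes owned by other GPUs, refreshed every iteration. The sketch below computes which planes each GPU would need to copy in, assuming a hypothetical stencil that reaches `halo` planes in each z direction; `HaloRange`, `withHalo` and `ghostPlanes` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>

// Range of z-planes a GPU must hold: its owned planes plus ghost planes
// copied from neighbouring slabs (hypothetical helper type).
struct HaloRange {
    int zBegin;  // first plane to copy in, inclusive
    int zEnd;    // one past the last plane to copy in
};

// Given the owned range [ownBegin, ownEnd) and a stencil that reads `halo`
// planes above and below, return the range to transfer, clamped to the
// physical domain [0, nz) so edge slabs get ghosts on one side only.
HaloRange withHalo(int ownBegin, int ownEnd, int halo, int nz) {
    assert(0 <= ownBegin && ownBegin <= ownEnd && ownEnd <= nz && halo >= 0);
    return {std::max(0, ownBegin - halo), std::min(nz, ownEnd + halo)};
}

// Number of ghost planes (owned by neighbours) this slab must receive.
int ghostPlanes(int ownBegin, int ownEnd, int halo, int nz) {
    const HaloRange r = withHalo(ownBegin, ownEnd, halo, nz);
    return (r.zEnd - r.zBegin) - (ownEnd - ownBegin);
}
```

Under the one-plane assumption, an interior 43-plane slab (172 planes split four ways) receives 2 ghost planes, roughly 5% extra data, while a 22-plane slab (86 planes split four ways) receives roughly 9% extra. Ghost planes are a fixed cost per slab, so their relative cost grows as the slabs get thinner.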

Thanks a lot for your quick and detailed response. The performance gain is quite significant (at least in some cases). I think it is worth a try.

P.S. What do you mean by "I’ll take this opportunity to prod nVidia to get pinned memory working with multiple GPUs, hint, hint…"? My understanding is that pinned memory currently doesn’t work with multiple GPUs. Is that what you mean?

You are right - pinned memory cannot be used by more than one GPU. I was using my performance example to make a case for fixing this. I’d get a nice performance boost in my application if I could use the same pinned memory buffer to DMA data to all four GPUs.