Three Ultra cards + P6N Diamond performance issue


I have the new MSI P6N Diamond mainboard with the nVidia nForce 680i SLI chipset.
I have installed three cards. If I run independent parallel computations (no communication via the CPU) on only two of the cards (any combination works), each card completes 50 work units per second, which is fine. If I run computations on all three cards, I get only 105 work units per second in total! The performance of two of the cards drops to roughly 25 and 30 work units, while the third card runs at full performance. This behavior is not static: the performance drop ‘switches’ from card to card. If I kill the computation on any one of the cards, everything is fine again (50+50). Does anybody have this mainboard with three cards? Has anybody installed three cards on one board and experienced something similar?

Thank you very much!

I took a look at your board on newegg and noticed this:

Expansion Slots

PCI Express x16: 4 (the four PCI Express slots operate in either x8+x8+x16+x8 or x16+x16+x8 mode)

Which slots are the affected cards in? Is there a BIOS option to change the PCI-E lane configuration? It could be that the board defaulted to the first mode listed and your first two cards are only running at x8. Whether that has a noticeable impact in CUDA probably depends heavily on your program, but the 8800 GTX and Ultra are practically the only cards that show a noticeable performance change going from x8 to x16 PCI-E bandwidth in 3D games. I wouldn’t be completely shocked to see a performance drop in some CUDA apps if that is indeed the case. :)

That could be; I’ll have to contact nVIDIA and MSI … But I only do computations on the graphics cards and print out just some ASCII text, so there is almost no data transfer over PCI-E. Nevertheless, could the computing performance be lowered automatically when the link drops to x8 bandwidth?
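To put the x8 versus x16 question in numbers, here is a small C++ sketch of the theoretical link arithmetic. It assumes first-generation PCIe signalling (2.5 GT/s per lane with 8b/10b encoding, about 250 MB/s per lane per direction); that this board runs its slots at that rate is an assumption, not something stated in the thread, and the function names are hypothetical.

```cpp
// Assumption: first-generation PCIe, 2.5 GT/s per lane with 8b/10b encoding,
// i.e. about 250 MB/s per lane per direction before protocol overhead.
constexpr double kMBPerLanePerDir = 250.0;

// Theoretical one-direction bandwidth of a link with `lanes` lanes, in MB/s.
double link_bandwidth_mb_s(int lanes) {
    return lanes * kMBPerLanePerDir;
}

// With MB = 10^6 bytes, 1 MB/s is exactly 1 byte per microsecond,
// so bytes divided by MB/s gives microseconds (latency and overhead ignored).
double transfer_time_us(double bytes, int lanes) {
    return bytes / link_bandwidth_mb_s(lanes);
}
```

For a hypothetical 4096-byte chunk of ASCII output, transfer_time_us(4096, 16) gives 1.024 microseconds and transfer_time_us(4096, 8) gives 2.048 microseconds. If the kernels really move this little data, halving the lane count alone would not be expected to explain a drop from 50 to 25 work units per second.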

Try using a quad-core CPU if you don’t have it already.

That’s the solution! It is worth pointing out to all developers that at least one CPU core per GPU is needed to run efficiently!
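The one-core-per-GPU rule of thumb maps onto the usual multi-GPU structure of CUDA releases from that time, where each GPU was driven by its own host thread that called cudaSetDevice() before doing any work. Below is a minimal, GPU-free C++ sketch of that structure under stated assumptions: run_gpu_worker, run_all and enough_cores are hypothetical names, and the worker only records a placeholder result where a real program would select its device and launch kernels.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-GPU worker. In a CUDA program of that era this thread would
// call cudaSetDevice(device) first and then run its kernel loop; here it only
// writes a placeholder count so the threading structure runs without a GPU.
void run_gpu_worker(int device, std::vector<int>& units_done) {
    units_done[device] = 50;  // placeholder value, not a measurement
}

// Rule of thumb from this thread: at least one hardware thread per GPU.
// Pass std::thread::hardware_concurrency() as `cores` (it may return 0 if unknown).
bool enough_cores(unsigned cores, unsigned gpus) {
    return cores >= gpus;
}

// Start one dedicated host thread per GPU, then wait for all of them.
// Each thread writes only its own element, so there is no data race.
std::vector<int> run_all(int gpu_count) {
    std::vector<int> units_done(gpu_count, 0);
    std::vector<std::thread> workers;
    for (int d = 0; d < gpu_count; ++d)
        workers.emplace_back(run_gpu_worker, d, std::ref(units_done));
    for (auto& t : workers) t.join();
    return units_done;
}
```

Calling enough_cores(std::thread::hardware_concurrency(), 3) before run_all(3) gives an early warning when a machine has fewer hardware threads than GPUs.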


I have a general question. In order to use two or more cards for computing, do I have to use a motherboard with SLI support? Also, do I have to disable SLI mode when using multiple cards, as section 3.4 of the Programming Guide says?

Thank you.

CUDA doesn’t use SLI. Thus, you don’t need an SLI capable motherboard to use CUDA.

Can someone please shed some more light on whether this is true? How does having one core per GPU help?

Because the CPU thread busy-waits in a spin loop (with a thread yield in it) while synchronizing the GPU and CPU. If one core is busy-waiting on two GPUs, significant delays are introduced.
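The waiting strategy described in this reply can be sketched in plain C++. This is an illustration, not the driver’s actual code: a host thread polls a completion flag and calls std::this_thread::yield() on every iteration, so it never sleeps. The std::atomic<bool> flag and the fake_gpu helper are hypothetical stand-ins for the GPU finishing its work.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Busy-wait with a yield. `done` is a hypothetical stand-in for
// "the GPU has finished"; the real driver polls device state rather than
// a std::atomic. Returns the number of polls made before completion was seen.
long spin_wait_with_yield(const std::atomic<bool>& done) {
    long polls = 0;
    while (!done.load(std::memory_order_acquire)) {
        ++polls;
        std::this_thread::yield();  // offer the core to others, but stay runnable
    }
    return polls;
}

// Hypothetical simulated GPU: sets the flag after `ms` milliseconds.
void fake_gpu(std::atomic<bool>& done, int ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    done.store(true, std::memory_order_release);
}
```

With two such waiters sharing a single core, both threads stay runnable all the time, so each one only notices that its GPU has finished when the scheduler hands it the core, and its next kernel launch is delayed accordingly. Giving each waiting thread its own core avoids that contention. Later CUDA runtime versions also offer cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync), which makes the host thread block instead of spinning, at the cost of some wake-up latency.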