I have the new P6N Diamond mainboard from MSI, nVidia nForce 680i SLI chipset.
I have installed 3 cards. If I do independent parallel computations (no communication over the CPU) with only two cards (any combination works), each card completes 50 work units per second, which is fine. If I run computations on three cards, I get only 105 work units in total! The performance of two of the cards drops to, say, 25 and 30 work units, while the third card gives full performance. This behavior is not static, meaning the performance drop ‘switches’ from card to card. If I kill the computation on any one of the three cards, everything is fine again (50+50). Does anybody have this mainboard with three cards? Has anybody installed three cards on a board and experienced similar things?
I took a look at your board on newegg and noticed this:
Expansion Slots
PCI Express x16 × 4 (the 4 PCI Express interfaces will operate in either x8+x8+x16+x8 or x16+x16+x8 mode)
Which slots do you have the affected cards in? Is there a BIOS option to modify the PCI-E lane configuration? It could be that it defaulted to the first option listed and your first two cards are only running in x8 mode. Whether that has a noticeable impact in CUDA probably depends greatly on your program, but the 8800 GTX and Ultra are practically the only cards that show a noticeable performance change going from x8 to x16 PCI-E bandwidth in 3D games. I wouldn’t be completely shocked to see a performance drop in some CUDA apps if this is in fact the case. :)
That could be; I will have to contact NVIDIA and MSI… But I only do computations on the graphics card, with just some ASCII text printed out, so there is almost no data transfer over PCI-E. Could there nevertheless be an automatic lowering of compute performance when going down to x8 bandwidth?
Because CPU threads busy-wait in a spin loop (with a thread yield in it) when synchronizing the GPU and CPU. If you have one CPU core busy-waiting on 2 GPUs, significant delays will be introduced.