Problem with the "multiGPU" sample of CUDA SDK 1.0: poor performance with dual GPUs


I have two 8800GTX cards on a Windows machine (Intel Q6600 CPU, 2x1GB DDR2 RAM, Asus Striker Extreme NForce680i SLI motherboard).

I have a speed problem with the "multiGPU" example of the CUDA SDK. When I run it in 2-GPU mode, the processing time is 10x slower than in 1-GPU mode:

2 GPUs found

Processing time: 472.811005 (ms)

Only one GPU found

Processing time: 40.686001 (ms)

I attached a separate 600W PSU to each of the cards, so I don't think there is a power problem.

Has anyone ever tried to run this application in 2-GPU mode? What should I see after running the "multiGPU" application? Any help/suggestion would be useful.

Best Regards,

Ertugrul Dogan


I sent you a separate note on this. I suspect the problem is in your code, based on what I saw in the snippet you sent. More testing is needed…


I ran the original NVIDIA "multiGPU" sample code from SDK 1.0, with no modifications to the code. I think there is a problem with my hardware/software configuration, but I cannot figure it out. Any help will be appreciated…

Best Regards,

Ertugrul Dogan

This seems to have been discussed before. Apparently 8800 cards require a common ground, so you need to power them all from the same supply.

Look Here:

I checked the power supplies and saw that they share the same ground. Are you sure that two 8800 cards need to be powered by a single PSU?

Best Regards,

Ertugrul Dogan

What motherboard and chipset are you using? Are all of the GeForce cards installed in slots that run at full x16 bandwidth?

I am using an Asus Striker Extreme with the NVIDIA nForce 680i SLI chipset. According to the user guide, this motherboard supports SLI technology at full x16/x16 speed.

This motherboard also has one PCI Express x16 slot running at x8 speed and one PCI Express x1 slot. I have installed the 8800 GTX cards in the blue slots, which run at full x16 speed (according to the guide).

I’m seeing the same problem on a Dell XPS H2C (680i SLI chipset) with dual 8800GTX cards. The multiGPU example takes 10x longer than it does with a single GPU, although the Monte Carlo example does run about twice as fast with dual GPUs (why the difference?). When I run a larger multi-GPU application, each thread seems to spend about half its time idling, even though the problem is definitely compute-bound (very little I/O).

Have you discovered the cause of/solution to this problem?

I have an XPS with a 680i motherboard and a Tesla D870 (PCIe x8), and the same problem as above.