Poor performance with dual GPUs: 10x slower?

I’ve got two GeForce 8800GTX cards in a Linux box with 2GB of memory and two dual-core AMD CPUs. Since each card has two 2x2 power connectors, I am using a separate power supply to power one of the cards. Unfortunately I’m seeing very poor performance in multi-GPU applications. In the “multiGPU” example in the SDK, I added a --gpucheck=<1|0> flag, where passing 0 forces it to use only one GPU.
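For reference, a rough sketch of how such a flag could be wired in; the real SDK sample uses its own cutil argument helpers and thread launch code, so the names and structure here are purely illustrative:

#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int useAllGpus = 1;                       /* default: use every GPU found */
    int i;
    for (i = 1; i < argc; ++i)
        if (strcmp(argv[i], "--gpucheck=0") == 0)
            useAllGpus = 0;                   /* force the single-GPU path */

    int gpuCount = 1;
    if (useAllGpus)
        cudaGetDeviceCount(&gpuCount);

    if (gpuCount > 1)
        printf("%d GPUs found\n", gpuCount);
    else
        printf("Only one GPU found\n");

    /* ... the sample would then spawn one host thread per GPU ... */
    return 0;
}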

Here are the results:

$ ../../bin/linux/release/multiGPU --gpucheck=1

2 GPUs found

Processing time: 472.811005 (ms)

Test PASSED

Press ENTER to exit...

$ ../../bin/linux/release/multiGPU --gpucheck=0

Only one GPU found

Processing time: 40.686001 (ms)

Test PASSED

Press ENTER to exit...

Any idea what’s going on? What should I be seeing from this application? Is it likely to be a power problem?

Please generate and attach an nvidia-bug-report.log.

thanks,
Lonni

I’ve attached the bug report file.
This computer has only a 450W power supply, which is powering one of the 8800GTX cards.
The computer next to it has a 1000W power supply and idle CPUs, so I am hijacking its power connectors to power the second 8800GTX card.

I’m afraid that I don’t see any files attached here.

Use nvidia-xconfig to see what speed your PCI Express bus is running at.

When I use one card, I get a single 16x bus; when I use two cards, I get one 8x bus and one 4x bus. :(

Maybe you have the same problem?

You could also modify the bandwidth example and compare the I/O of the two cards.
On some motherboards, even if both slots are rated x16, you can still see a difference in I/O between them (due to chipset implementation details).
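A minimal sketch of such a per-device measurement (not the SDK bandwidthTest itself), assuming a CUDA runtime recent enough to let one host thread switch devices with cudaSetDevice; the 1.x-era samples used one host thread per GPU instead, and a pinned buffer via cudaMallocHost would give more representative numbers than plain malloc:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <cuda_runtime.h>

static double wallTimeMs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(void)
{
    const size_t bytes = 32 << 20;            /* 32 MB transfer per test */
    void *h_buf = malloc(bytes);

    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        void *d_buf = NULL;
        cudaMalloc(&d_buf, bytes);

        double t0 = wallTimeMs();
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);  /* blocking copy */
        double t1 = wallTimeMs();

        printf("GPU %d: host->device %.1f MB/s\n",
               dev, (bytes / (1024.0 * 1024.0)) / ((t1 - t0) / 1000.0));
        cudaFree(d_buf);
    }
    free(h_buf);
    return 0;
}

If one slot is really running at x4, its number should come out well below the other card’s.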

Seems the attachment didn’t work.
In any case, it seems it was a power problem. I installed a larger power supply in the machine and the performance was as expected!

Now the only problem is that the power connectors on these boards are placed in such a way that I can’t close the lid on my server chassis. Are there any G80 boards that don’t have the connectors on the top??

Maybe the cards need a common ground, so it is not a good idea to use two different power supplies?

We have the same problem. I have two 8800GTX cards with a separate 600W power supply attached to each of them. However, when I run the “multiGPU” example I get 350ms with two GPUs and 25ms with one GPU. So it seems it was not a power problem in my situation.

Could someone give me a suggestion please? Thanks…

I am also seeing a result of around 418 ms for the multiGPU example, on a dual-core, dual-socket machine with two 8800s and a 750W power supply.

I’m a bit confused… What is the verdict? Is this h/w or s/w?

Thanks.

I may not be the best person to provide any answer to this issue, but here are some observations…

I don’t think there is anything wrong with the hardware or the power supply; it seems like it may be a result of the new asynchronous capability in the version 1.0 driver. When the “MultiGPU” program goes through the s_gpuCount==1 branch, the “gpuThread” function is called and all of the calls to the device are made asynchronously, meaning control is returned fairly quickly, which produces the 40ms figure. The programming guide suggests using a function called “cudaThreadSynchronize” to force the program to wait for the device to complete all operations before returning control. I inserted this call before stopping the timer (I think this is the way to use it) and ran the program forcing the use of only one GPU. The result is about 590ms. That is to say, the actual time required to perform all the work on a single GPU is about 590ms, not the 40ms as reported.
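Here is a self-contained sketch of the effect, using a trivial stand-in kernel rather than the actual multiGPU workload (on newer toolkits cudaDeviceSynchronize replaces cudaThreadSynchronize):

#include <stdio.h>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        for (int k = 0; k < 10000; ++k)       /* artificial work */
            data[i] = data[i] * 0.999f + 0.001f;
    }
}

static double wallTimeMs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    double t0 = wallTimeMs();
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    double tLaunch = wallTimeMs() - t0;       /* launch returns almost immediately */

    cudaThreadSynchronize();                  /* block until the GPU really finishes */
    double tDone = wallTimeMs() - t0;         /* only this number reflects the real work */

    printf("after launch: %.3f ms, after sync: %.3f ms\n", tLaunch, tDone);
    cudaFree(d_data);
    return 0;
}

Without the synchronize, a timer stopped right after the launch only measures launch overhead, which is why the single-GPU path can look misleadingly fast.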

Hope this helps.

I’m using two Quadro FX 5600s on an nForce Pro 3600/3050 chipset with two dual-core Opterons.