One GPU of four running slowly?

I am experimenting with my first multi-GPU code. My problem is relatively simple. There are thousands (over 65K for the test case) of computationally intensive, completely independent calculations that must be performed on a relatively small amount of input data (~10 MB in the test case). I get a great speedup on a single GPU: 30x on an FX 5600 and 100x on an FX 5800.

I am trying to split the computation among four Quadro FX 5600s. Since the input data is tiny, I simply load all of the input onto each of the cards and split the output between them. I use Windows threads to launch four separate workers, each of which calls cudaSetDevice with the appropriate thread number. I use cudaGetDevice and cudaGetLastError to verify that everything went according to plan.

cudaSetDevice(nDeviceToUse);
cudaGetDevice(&nActualDevice);
cudaError_t err = cudaGetLastError();
if (cudaSuccess != err)
{
	std::cout << "Cuda error: " << cudaGetErrorString(err) << std::endl;
}
std::cout << "Device #" << nDeviceToUse << " requested, Device #" << nActualDevice << " initialized." << std::endl;

Everything prints out as expected and appears to be going well. However, three of the threads complete in approximately 1/4 of the original single-GPU run time, while the final thread takes twice as long as the others (it alone runs for half of the original run time). The answers are OK (at least on a quick visual inspection). I have also verified that the work is being split appropriately.
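For anyone who wants to check the split the same way, here is a minimal host-side sketch of an even partition. It is not my actual code: the names workRange and isBalanced are made up for illustration, and it assumes each worker handles one contiguous block of output items. No CUDA calls are involved; in the real program each worker would call cudaSetDevice and then process only its own range.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open range [begin, end) of output
// items assigned to `worker` out of `numWorkers`. Any remainder is spread
// over the first few workers, so range sizes differ by at most one item.
std::pair<std::size_t, std::size_t> workRange(std::size_t totalItems,
                                              std::size_t numWorkers,
                                              std::size_t worker)
{
    const std::size_t base  = totalItems / numWorkers;  // items every worker gets
    const std::size_t extra = totalItems % numWorkers;  // leftover items
    const std::size_t begin = worker * base + std::min(worker, extra);
    const std::size_t size  = base + (worker < extra ? 1 : 0);
    return {begin, begin + size};
}

// Hypothetical check: true if the ranges are contiguous, cover every item
// exactly once, and no worker has more than one item more than another.
bool isBalanced(std::size_t totalItems, std::size_t numWorkers)
{
    std::size_t expectedBegin = 0;
    std::size_t minSize = totalItems;
    std::size_t maxSize = 0;
    for (std::size_t w = 0; w < numWorkers; ++w) {
        const auto r = workRange(totalItems, numWorkers, w);
        if (r.first != expectedBegin) return false;     // gap or overlap
        const std::size_t size = r.second - r.first;
        minSize = std::min(minSize, size);
        maxSize = std::max(maxSize, size);
        expectedBegin = r.second;
    }
    return expectedBegin == totalItems && maxSize - minSize <= 1;
}
```

With 65536 items and 4 workers, every worker gets exactly 16384 items, so an uneven split of this kind would not explain one device taking twice as long as the others.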

Is there something I might be doing wrong?

PSU problem? PCIe bandwidth disparity? Thermal downclocking?

I don’t think so… I have multi-GPU code from other sources that does not exhibit this problem on the same machine. I will find another system to test on to be sure, but I’m more inclined to guess that I did something wrong.

On second thought, it looks like you were right. :-)

I tried running the multi-GPU code on just two of the four GPUs, then did a second run on the other two. The first run completed in an appropriate time, with both GPUs finishing together. In the second run, one of the GPUs took twice as long as the other.

Thanks!

Hi,

I’ve seen this on Windows; on Linux I had no such problem. My guess is that the GPU that is actually connected to your screen is the slow one, but it’s only a guess.

You might want to make another GPU (one of the four you have) responsible for the screen and see whether that one becomes the slow one. If it does, the guess was right :)

Again, on Linux without X I had no such problem.

eyal