I just started testing on a dual-GPU box (Dell H2C with dual 8800 GTX cards, 680i chipset, running 32-bit Ubuntu), and I can’t seem to get both GPUs working in parallel. I have two threads, each running a compute-bound task on a different device. However, the behavior I see is similar to what I get with a single GPU: each thread is busy about half the time, and appears to be waiting on the GPU for the other half.
I did call cudaSetDevice() first, and I called cudaGetDevice() to confirm that each thread has a different device number. The task involves virtually no I/O off the card, although there are a number of device-to-device copies and many different kernels running in sequence.
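For reference, this is a minimal sketch of the per-thread setup I'm describing, using pthreads; `run_task()` is a stand-in for my actual per-GPU workload, and I'm assuming exactly two devices:

```c
/* Sketch only: requires a CUDA-capable system with two devices.
   run_task() is hypothetical, standing in for the real workload. */
#include <pthread.h>
#include <stdio.h>
#include <cuda_runtime.h>

static void *worker(void *arg)
{
    int want = *(int *)arg;
    int got  = -1;

    /* cudaSetDevice() is called in each thread before any other CUDA
       call, since a context is bound per host thread. */
    cudaSetDevice(want);
    cudaGetDevice(&got);
    printf("thread requested device %d, bound to device %d\n", want, got);

    /* run_task(want);  -- compute-bound kernels and
       device-to-device copies happen here */
    return NULL;
}

int main(void)
{
    pthread_t threads[2];
    int ids[2] = { 0, 1 };

    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
```

With this setup, `cudaGetDevice()` reports a different device number in each thread, yet the timing still looks serialized.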
Is there any way to determine whether the two threads are really executing on two separate GPUs, rather than sharing one?
Are there any calls I might be making that inadvertently cause one thread to wait for the other?
If anyone has suggestions, please let me know.