Multi-GPU question

I have a couple of questions about multi-GPU programming with CUDA. I read some of the previous threads, but I didn’t get a precise answer to my questions.

  1. How can I tell from software whether SLI mode is on or off (that is, without physically inspecting the hardware)? The standard deviceQuery program reports two installed devices. Does that mean the cards are already in SLI mode?

  2. I have parallelized my application for multiple GPUs using multiple host threads, but I am not sure whether it actually runs in parallel across the GPUs. Trying different splits of the problem across the GPUs, I can see that both GPUs are being used, but I don’t think they are being used in parallel. Note that my host has a dual-core processor, and the code that launches the host threads looks like this:

     threadID[0] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan));
     threadID[1] = cutStartThread((CUT_THREADROUTINE)solverThread, (void *)(plan + 1));
     cutWaitForThreads(threadID, 2);

As I understand it, this means the parent thread busy-waits (because of cutWaitForThreads). Since I have a dual-core processor, does the parent thread tie up one of the cores with this busy-wait? In other words, do I actually need a quad-core host processor to run the application in parallel across multiple GPUs?

Any help will be greatly appreciated. Thanks a lot!

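First, I guess you could just call one of the solverThread routines directly from the parent thread instead of starting it with cutStartThread, so that the parent does useful work instead of only waiting. But this leads to my second point: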

Secondly, I think cutStartThread is a really bad idea, since it is not documented and almost no one seems to know what it does or even what exactly it is supposed to do (though I guess the source is around somewhere).

I’d recommend doing it “manually” with e.g. pthreads. That may be more work, but it gives you loads of different ways to wait (certainly not busy-waiting) and to pass information to the other threads (so you don’t have to start a new thread each time but can reuse them), etc.

SLI mode is all about fusing the hardware: it presents the two devices as one. And I don’t think CUDA works in SLI mode.

I don’t think the CPU threads busy-wait; they just sleep on a condition, that’s all. Modern OSes multitask, and one CPU can host thousands of threads without any problem, since a sleeping thread uses no CPU time. So don’t worry about needing a quad-core.

Thanks for your responses.

I have a feeling that the GPUs are not executing their kernels in parallel. Is there a way to sanity-check whether both devices can execute their kernels in parallel? Should I change anything in my compile options? Please note that the two devices are not the same; they are:

Device 0: “Quadro FX 570”
Device 1: “Tesla C1060”