Using multi-threaded programs with multiple GPUs in EXCLUSIVE_PROCESS compute mode

I am using a program which queries the number of GPUs on the system and then creates a separate thread with pthread_create() to run a calculation on each device.

/* Launch one worker thread per detected GPU; each thread receives its own
   device ordinal through args[i]. */
for (i = 0; i < numDevices; ++i) {
    args[i].id  = i;
    args[i].gpu = i;
    pthread_create(&threads[i], NULL, threadRoutine, &args[i]);
}
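
For context, each thread's routine looks roughly like the sketch below. The struct and field names simply mirror the loop above, the error check is illustrative, and the actual computation is omitted.

#include <stdio.h>
#include <pthread.h>
#include <cuda_runtime.h>

/* Illustrative argument struct matching the args[i].id / args[i].gpu usage. */
struct threadArgs {
    int id;   /* logical thread index, used in messages */
    int gpu;  /* CUDA device ordinal this thread should use */
};

/* Simplified per-thread routine: bind to the assigned GPU, then do the work. */
void *threadRoutine(void *p)
{
    struct threadArgs *a = (struct threadArgs *)p;

    cudaError_t err = cudaSetDevice(a->gpu);
    if (err != cudaSuccess) {
        fprintf(stderr, "Thread %d - CUDA Error: %s\n",
                a->id, cudaGetErrorString(err));
        return NULL;
    }

    /* ... allocate device memory, launch kernels, copy results back ... */

    return NULL;
}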

This program runs as expected when the device is in DEFAULT compute mode. However, I would like to run it in EXCLUSIVE_PROCESS compute mode. In that mode, the thread associated with the GPU that the main process is already using produces an error message:
Thread 0 - CUDA Error: exclusive-thread device already in use by a different thread (see Programmer's Guide)

I can run this program properly in EXCLUSIVE_PROCESS mode using the Multi-Process Service (MPS) daemon if it is launched as root. However, I cannot run it if MPS is launched by my own user account.
I can also run it properly if the work on the device the main program is already using is done by the main thread itself rather than by a separate pthread.
I have two questions about this scenario:

  1. The description of EXCLUSIVE_PROCESS compute mode says: "Many threads in one process will be able to use cudaSetDevice() with this device." I therefore expect that new pthreads using the same device as the main process should not conflict with it in this mode. My understanding is that this error message should appear in EXCLUSIVE_THREAD mode, but not in EXCLUSIVE_PROCESS or DEFAULT mode. Why am I getting it in EXCLUSIVE_PROCESS mode?

  2. The MPS documentation states that it can be run in user mode to manage processes created by that user (see section 5.1.2 of https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf). However, I find that the daemon fails to provide access to the spawned threads unless MPS is run as root. What might be causing the MPS daemon to misbehave when run by a single user?

I am happy to provide more information if requested. I cannot share the code I am using, but I may be able to prepare a minimal example if it is difficult to answer my questions without one.

Thanks.

Dan

Are you certain that the device is in exclusive PROCESS mode and not exclusive THREAD mode? Because the error message you quoted says:

“Thread 0 - CUDA Error: exclusive-thread device already in use by a different thread (see Programmer’s Guide)”

That looks to me like the device is in exclusive THREAD mode.

Which CUDA version are you using? Which OS? Did you verify the compute mode using nvidia-smi -a?
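
If it helps, the compute mode can also be checked from inside a program; here is a minimal sketch using the runtime API (the computeMode field of cudaDeviceProp):

#include <stdio.h>
#include <cuda_runtime.h>

/* Print the compute mode the runtime reports for each device. */
int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);

    for (int dev = 0; dev < n; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        const char *mode = "unknown";
        switch (prop.computeMode) {
            case cudaComputeModeDefault:          mode = "DEFAULT";           break;
            case cudaComputeModeExclusive:        mode = "EXCLUSIVE_THREAD";  break;
            case cudaComputeModeProhibited:       mode = "PROHIBITED";        break;
            case cudaComputeModeExclusiveProcess: mode = "EXCLUSIVE_PROCESS"; break;
        }
        printf("Device %d (%s): compute mode %s\n", dev, prop.name, mode);
    }
    return 0;
}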

Note that multiple threads in the same process sharing the same GPU should each call cudaSetDevice before manipulating the device. This causes them to share the context that may already have been created on that device.
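
In other words, the pattern for several threads sharing one device is just this (a minimal sketch, not your code; the worker body is a placeholder):

#include <pthread.h>
#include <cuda_runtime.h>

/* Each worker attaches to device 0 before touching it. In Exclusive-PROCESS
 * mode this is fine: all threads of one process share the device's primary
 * context. In Exclusive-THREAD mode the second thread's cudaSetDevice can
 * fail with cudaErrorDeviceAlreadyInUse, as in the output further down. */
void *worker(void *unused)
{
    (void)unused;
    cudaSetDevice(0);    /* attach this thread to device 0 */
    /* ... cudaMalloc, kernel launches, cudaMemcpy on device 0 ... */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    return 0;
}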

As a quick test, I ran a pthreaded multithreaded Linux application just now using CUDA 6.5 RC on a GPU in Exclusive-PROCESS mode with no trouble. When I switched that device to Exclusive-THREAD mode, I got the following error(s):

========= Program hit cudaErrorDeviceAlreadyInUse (error 54) due to “exclusive-thread device already in use by a different thread” on CUDA API call to cudaSetDevice.

Here is a full sample session, including sample code:

http://pastebin.com/V4aPd2rR

Thanks for your help. You are correct: the cards were in EXCLUSIVE_THREAD mode, to my surprise. That answers both of my questions. I apologize for the confusion.

Dan