Error: "Setting the device when a process is active is not allowed."


I’m having trouble with an application that uses CUDA. The application was written against CUDA 2.0.
Since installing the 2.1 drivers and the 2.1 SDK, the application shows the following message:

“Setting the device when a process is active is not allowed.”

Since I currently only have the executable, I cannot change the code. What could cause this problem, given that it worked before?

I’m using a Windows XP 64-bit system, but it also fails on Windows XP 32-bit.


It could be exactly what the error says: you are trying to change the device with cudaSetDevice() while you still have memory allocated on the current GPU.

That’s a wild guess, but if you don’t have the code, at least provide us with pseudocode as detailed as you can remember.

In 2.0, calling cudaSetDevice after a context was already created returned success even though it did (essentially) nothing. In 2.1, it was changed to explicitly fail. Too many people assumed this did something meaningful and were confused by the 2.0 behavior, so we changed it.

So that means that once a context is created in a thread, it cannot be changed?

Not from the runtime API, unless you call cudaThreadExit() to destroy the context and then recreate it on a different device.
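In code, that runtime-API device switch looks roughly like the sketch below. This is an illustration, not code from the thread; note that cudaThreadExit() was the CUDA 2.x call and was later deprecated in favor of cudaDeviceReset(). Error checking is omitted for brevity.

```cpp
#include <cuda_runtime.h>

// Sketch: moving a host thread's runtime-API context to another device.
// Assumes a single host thread that has already done work on some device.
void switch_device(int newDevice)
{
    // Destroy the current context. All allocations and state on the
    // old device are lost at this point.
    cudaThreadExit();

    // The next runtime call creates a fresh context on the new device.
    cudaSetDevice(newDevice);
}
```

Because everything on the old device is torn down, this is only practical between independent phases of work, not as a way to interleave kernels on two GPUs from one thread.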


Thanks a lot for the answer. In the meantime I have switched back to CUDA 2.0, since I’m not able to recompile the application and change the code.
For the next version I will have full access to the code, so it will be possible for me to change the behaviour.

Thank you

Is it possible to use CUDA on two GPUs in parallel (for different kernels and data sets, of course) within one application?

Otherwise, wouldn’t having more than one GPU be a little useless?

Certainly. You have to use a host threading mechanism of some kind, because CUDA only permits one context per host thread, but it can be done without too much trouble.

Or use the driver API, where you can detach contexts and move them to different threads.
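A rough sketch of that driver-API approach follows. cuCtxPopCurrent() and cuCtxPushCurrent() are real driver-API entry points; the two-thread structure around them is assumed for illustration and the actual thread creation is left out.

```cpp
#include <cuda.h>

CUcontext ctx;  // shared between the two host threads

// Thread A: create a context, then detach it from this thread.
void thread_a(void)
{
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // Pop the context; it is now "floating" and current to no thread.
    cuCtxPopCurrent(&ctx);
}

// Thread B: pick the floating context up and use it.
void thread_b(void)
{
    cuCtxPushCurrent(ctx);   // ctx is now current to this thread
    /* launch kernels, allocate memory, ... */
    cuCtxPopCurrent(&ctx);   // release it again when done
}
```

The key constraint is that a context may be current to at most one thread at a time, so the push/pop calls must be serialized between threads.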

I got this working with OpenMP and CUDA 2.1 by simply storing which OpenMP threads are currently mapped to a GPU.

See code for inspiration:

#include <omp.h>
#include <cuda_runtime.h>

/* Adjust to your machine. */
#define NUM_OPENMP_THREADS 8
#define NUM_CUDA_DEVICES   2

/* Track which OpenMP threads already have a CUDA context. */
bool threadMappedToCuda[NUM_OPENMP_THREADS] = { false };

int large_number = 10000;
int i;

#pragma omp parallel for
for (i = 0; i < large_number; ++i)
{
    unsigned threadId = omp_get_thread_num();
    if (!threadMappedToCuda[threadId])
    {
        /* First visit from this thread: bind it to a device.
           CUDA_SAFE is an error-checking wrapper (e.g. the SDK's
           CUDA_SAFE_CALL from cutil). */
        CUDA_SAFE(cudaSetDevice(threadId % NUM_CUDA_DEVICES));
        threadMappedToCuda[threadId] = true;
    }
    /* do cuda stuff here */
}