Hi!
I would like to know the proper way to exclusively “lock” a GPU device so that no other process can access it.
We have set the Compute Mode of the GPU device to EXCLUSIVE_PROCESS.
We keep running into problems with this on our cluster of Linux machines. Processes from different users interfere with each other, with undesirable side effects like blocked GPUs.
When a program starts, we initialize a lot of state on the GPU (using up most of its memory). We therefore need to make sure that no other process touches the GPU until the current process releases it. We also wait a specified time (a timeout) for the GPU to become available.
I am using cudaDeviceReset() to release the device. However, I am missing a corresponding function to actually “grab” a device. The only candidate is the more general cudaSetDevice(), which (as I understand it) is for switching between different GPUs (something I also do).
Or does any call to cudaSetDevice() implicitly “grab” the device, which is then not released until cudaDeviceReset() is called or the process terminates?
If so, which other functions do this?
regards Rolf