How can a process get exclusive GPU access (EXCLUSIVE_PROCESS)?


I would like to know the proper way to exclusively “lock” a GPU device so that no other process can access it.
We have set the Compute Mode of the GPU device to EXCLUSIVE_PROCESS.
Even so, we repeatedly have problems on our cluster of Linux machines: processes from different users interfere with each other, with undesirable side effects such as blocked GPUs.

When our program starts, it initializes a lot of state on the GPU (using up most of its memory), so we need to make sure the GPU is not touched by another process until the current process actually releases it. We also wait a specified time (a timeout) for the GPU to become available.

I am using cudaDeviceReset() to release the device. However, I am missing a corresponding function to actually “grab” a device, other than the more general cudaSetDevice(), which (as I understand it) is for switching between different GPUs (something I also do).

Or does any call to cudaSetDevice() implicitly “grab” the device, which is then not released until cudaDeviceReset() is called or the process terminates?
If so, which other functions do this?

regards Rolf

Initialization in the CUDA runtime API is lazy, so basically any API call that requires access to the device will grab the lock. If you want things to be more explicit, cudaFree(0) will grab the device with no side effects (except possibly returning an error).
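
To tie this back to the timeout mentioned in the question, here is a minimal sketch of polling for the device until a deadline passes. The names acquire_with_timeout and try_grab_device are hypothetical helpers invented for this illustration and are not part of CUDA; the only CUDA calls used are cudaSetDevice, cudaFree and cudaGetLastError. It also assumes that a later attempt in the same process can succeed after an earlier one failed, which may depend on the CUDA version, and the __has_include check needs C++17.

```cpp
#include <chrono>
#include <thread>

#if __has_include(<cuda_runtime.h>)
#include <cuda_runtime.h>
#define HAVE_CUDA_RUNTIME 1
#endif

// Hypothetical helper: call try_acquire() repeatedly until it returns true
// or the timeout has expired. Returns true if the resource was acquired.
template <typename TryAcquire>
bool acquire_with_timeout(TryAcquire try_acquire,
                          std::chrono::milliseconds timeout,
                          std::chrono::milliseconds poll_interval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (try_acquire()) {
            return true;               // acquired
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;              // gave up
        }
        std::this_thread::sleep_for(poll_interval);
    }
}

#ifdef HAVE_CUDA_RUNTIME
// Hypothetical helper: one attempt to create a context on `device`.
// In EXCLUSIVE_PROCESS mode this is expected to fail while another
// process holds the device.
inline bool try_grab_device(int device) {
    if (cudaSetDevice(device) != cudaSuccess) {
        cudaGetLastError();            // reset the runtime's last-error state
        return false;
    }
    // cudaFree(0) frees nothing; it only forces the lazy initialization.
    if (cudaFree(0) == cudaSuccess) {
        return true;
    }
    cudaGetLastError();                // reset the runtime's last-error state
    return false;
}
#endif
```

A caller could then write something like acquire_with_timeout([] { return try_grab_device(0); }, std::chrono::seconds(30), std::chrono::milliseconds(500)), and call cudaDeviceReset() once it is done with the GPU. Keeping the retry loop separate from the CUDA call is only a design choice made here so the loop can be exercised without a GPU.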