PBS Pro (and probably SGE, LSF, Torque and all the other batch systems) knows squat about allocating GPUs, nor does the CUDA runtime seem to allow exclusive ownership of a device.
Here’s an LD_PRELOAD-able shim that overrides cudaSetDevice() and ensures that GPU-requesting programs will get exclusive use of a GPU, or die trying.
It arbitrates the GPU allocation with lockfiles in /var/lock/cuda (or the location of CUDA_LOCKFILE_DIR).
Use it by setting LD_PRELOAD=/path/to/cuPlayNicely.so. It will tell you what it’s up to if you set CUDA_LOCKFILE_VERBOSE.
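For anyone curious how lockfile arbitration of this kind can work, here is a minimal sketch. It is not the shim's actual source: the function names claim_device() and release_device(), the gpuN.lock naming scheme, and passing the directory and device count as arguments are all assumptions made for illustration. The only real trick is open() with O_CREAT | O_EXCL, which fails atomically if the file already exists, so two processes can never both claim the same device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical naming scheme: one lockfile per device index. */
static void lock_path(char *buf, size_t len, const char *dir, int dev)
{
    snprintf(buf, len, "%s/gpu%d.lock", dir, dev);
}

/*
 * Try to claim one of ndev devices by creating its lockfile in dir.
 * Returns the claimed device index, or -1 if every device is taken
 * or the directory is unusable.
 */
int claim_device(const char *dir, int ndev)
{
    char path[4096];
    char pid[32];

    assert(dir != NULL);
    for (int dev = 0; dev < ndev; dev++) {
        lock_path(path, sizeof path, dir, dev);
        /* O_EXCL makes creation atomic: only one process can win. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            /* Record the owner's PID so stale locks can be inspected. */
            snprintf(pid, sizeof pid, "%ld\n", (long)getpid());
            if (write(fd, pid, strlen(pid)) < 0) {
                /* The lock is still held even if the note failed. */
            }
            close(fd);
            return dev;
        }
        if (errno != EEXIST)
            return -1;  /* unexpected error, e.g. missing directory */
    }
    return -1;  /* all devices are locked */
}

/* Release a previously claimed device by removing its lockfile. */
int release_device(const char *dir, int dev)
{
    char path[4096];

    assert(dir != NULL);
    lock_path(path, sizeof path, dir, dev);
    return unlink(path);
}
```

In a shim like this one, the overridden cudaSetDevice() would presumably call something like claim_device() first and pass the resulting index to the real function. Note that a plain lockfile outlives a process that is killed before it can clean up, so stale files may need removing by hand or by the batch system's epilogue.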
No. If it’s not called explicitly in the user code, device #0 will be used. There’s no obvious way to override that that I can think of.
You could always move the call to the real cudaSetDevice() to the DSO’s constructor, but then any program that preloads the library will acquire a GPU, required or not.
Also, for completeness, you’d probably want to override the driver API’s set device function, in case you should have a user into that sort of self-abuse.
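The constructor variant mentioned above could look roughly like the sketch below. It uses the GCC/Clang constructor and destructor function attributes, which run a function when the DSO is loaded and when it is unloaded at normal exit. The name my_init, the holding_device flag, and the shim_holds_device() accessor are assumptions for illustration, and the actual lockfile and cudaSetDevice() work is left as a placeholder comment so the sketch builds without CUDA.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical state flag: 1 while this process holds a device. */
static int holding_device = 0;

/* Print diagnostics only when CUDA_LOCKFILE_VERBOSE is set. */
static int verbose(void)
{
    return getenv("CUDA_LOCKFILE_VERBOSE") != NULL;
}

/* Runs when the library is loaded, before main() starts. */
__attribute__((constructor))
static void my_init(void)
{
    assert(!holding_device);  /* the constructor should run only once */
    /* Placeholder: claim a lockfile and call the real cudaSetDevice(). */
    holding_device = 1;
    if (verbose())
        fprintf(stderr, "shim: device claimed at load time\n");
}

/* Runs at normal exit (return from main() or exit()). */
__attribute__((destructor))
static void my_fini(void)
{
    if (!holding_device)
        return;
    /* Placeholder: remove the lockfile claimed in my_init(). */
    holding_device = 0;
    if (verbose())
        fprintf(stderr, "shim: device released at exit\n");
}

/* Lets calling code check whether a device is currently held. */
int shim_holds_device(void)
{
    return holding_device;
}
```

As noted, every program that preloads a library built this way claims a device whether it needs one or not. Destructors also run only on a normal exit; a process killed by a signal skips them unless a signal handler calls exit() itself.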
But there is one small question:
how can the function “my_fini” be called automatically to release the device even when the program is interrupted?
Is there a command to “unload” the library after one kills the program?