Hi Users, hi Developers,
I’m having trouble on a CentOS machine with multiple users and multiple CUDA devices (in my case, four C2050s).
What I want to do is dynamically assign CUDA devices to users by changing the file permissions of the /dev/nvidia[0-3] device files.
The problem is that the CUDA device numbering diverges from the device file names.
For instance, a user who is given /dev/nvidia2 still has to submit the job with -device=0, since NVIDIA's kernel module
knows that the user owns just one device. -device=2 would fail, reporting that there is only one CUDA device (from the user's point of view).
This behaviour leads to race conditions: when a job stops for one user (and the permissions get removed) and, independently, another job gets
started, the device numbers interfere with each other.
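To make the renumbering concrete, here is a small Python sketch of the behaviour as I understand it. It only models the numbering and does not talk to the driver. The function name logical_index is made up for illustration, and the assumption that accessible devices are renumbered from 0 in ascending order of their device file number is based only on what I observe on my machine.

```python
def logical_index(physical, accessible):
    """Return the N a user would have to pass as -device=N to reach
    /dev/nvidia<physical>.

    Assumption (from observation only): the devices a user may open are
    renumbered from 0 in ascending order of their device file number.
    """
    visible = sorted(accessible)
    if physical not in visible:
        raise ValueError(f"/dev/nvidia{physical} is not accessible")
    return visible.index(physical)


# The user owns only /dev/nvidia2, so it must be addressed as device 0.
print(logical_index(2, {2}))      # 0

# The same user also holds /dev/nvidia1: now /dev/nvidia2 is device 1.
print(logical_index(2, {1, 2}))   # 1

# If the job on /dev/nvidia1 ends and its permission is removed between
# computing the index and launching the next job, the index is stale.
stale = logical_index(2, {1, 2})  # computed while both were accessible
fresh = logical_index(2, {2})     # after /dev/nvidia1 was taken away
print(stale, fresh)               # 1 0
```

The last case is the race I am worried about: the number handed to the job depends on which other device files happen to be accessible at that moment.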
So my final question is:
Is it possible to disable this dynamic device addressing completely?
I just want a static binding from -device=X to /dev/nvidiaX.
Any other suggestions are also welcome.
Thanks in advance,