Disable CUDA scheduler when having multiple CUDA-capable devices

Hi Users, hi Developers,

I’m having trouble on a CentOS machine with multiple users and multiple CUDA devices (in my case, four C2050s).

What I want to do is dynamically assign CUDA devices to users by changing the file permissions of the /dev/nvidia[0-3] device files.
The problem is that CUDA's device numbering diverges from the device file names.

For instance, if a user is given access only to /dev/nvidia2, that user still has to submit the job with -device=0, since NVIDIA's kernel module internally
knows that the user owns just one device. -device=2 would fail, reporting that there is only one CUDA device (from that user's point of view).
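To illustrate what I mean, here is a minimal sketch (CUDA runtime API, built with nvcc) of the kind of check I run; it only ever sees the devices whose /dev/nvidia* files the process can open, renumbered from 0:

```
// check_devices.cu -- minimal sketch: what a restricted user sees.
// Build with: nvcc -o check_devices check_devices.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // With permission only on /dev/nvidia2, count comes back as 1 and that
    // card is addressed as device 0; selecting device 2 returns an error.
    std::printf("visible CUDA devices: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("  device %d: %s\n", i, prop.name);
    }
    return 0;
}
```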

This behaviour leads to race conditions: when one user's job stops (and that user's permissions get removed) while another job is started independently, the device numbers interfere with each other.

So my final question is:

Is it possible to disable this dynamic device addressing completely?
I just want a static binding of -device=X to /dev/nvidiaX.

Any other suggestions are also welcome.

Thanks in advance,
Gabriel

You might want to take a look at cudawrapper from NCSA. It is an LD_PRELOAD library that can dynamically assign GPUs to jobs in a batch queue system such that every user only needs to request device 0. The SourceForge page is tremendously confusing with all the clutter, but here’s a link directly to the admin README:

You have to generate a mapping yourself with nvidia-smi and the CUDA API. From that point on, you can reorder devices manually using the CUDA_VISIBLE_DEVICES environment variable.
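For example, a small sketch along these lines (CUDA runtime API, built with nvcc) prints each runtime device index together with its PCI location, which you can match against the PCI bus ID that nvidia-smi -q reports for the card behind each /dev/nvidiaX file:

```
// map_devices.cu -- sketch: list CUDA runtime device indices with their
// PCI locations so they can be matched against nvidia-smi output and
// the /dev/nvidia[0-3] device files.
// Build with: nvcc -o map_devices map_devices.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        std::printf("no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // bus:device matches the PCI location reported by nvidia-smi
        std::printf("CUDA device %d -> PCI %02x:%02x (%s)\n",
                    i, prop.pciBusID, prop.pciDeviceID, prop.name);
    }
    return 0;
}
```

Once the mapping is known, launching a job with e.g. CUDA_VISIBLE_DEVICES=2 ./my_job exposes only the card that the unrestricted enumeration calls device 2, and inside the job it appears as device 0.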

Thank you very very much.

That was almost exactly the solution I was looking for.

So far it works perfectly for me,

Gabriel