Administering a CUDA device on a multi-user machine

I’ve set up the NVIDIA CUDA SDK on a RHEL 5.1 box, the appropriate display driver is installed (NVIDIA-Linux-x86-169.12-pkg1.run), and the example directory builds successfully. The machine boots into runlevel 3.

The machine is used by a few different users, and access to the graphics card seems erratic. For example, I sometimes get the error:

[nmoore@buff release]$ pwd
/home/nmoore/NVIDIA_CUDA_SDK/bin/linux/release

[nmoore@buff release]$ ./MonteCarlo
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
There is no device supporting CUDA.

This error sometimes goes away when I restart the machine, but there doesn’t seem to be any regularity to this behavior. Are there CUDA system/admin tools that I should learn about?
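In case it’s relevant, this is what I check when the error shows up (standard shell commands, nothing CUDA-specific):

  # Is the nvidia kernel module loaded?
  /sbin/lsmod | grep nvidia

  # Do the device entries exist, and what are their permissions?
  ls -l /dev/nvidia*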

I’ve been wondering whether booting into runlevel 3 causes problems implicitly. I actually have two graphics cards installed in the system (the other is an integrated motherboard card), but configuring both to work simultaneously (one for CUDA computation, the second for rendering an X11 window) seemed like more of a challenge than I wanted to tackle.

Any comments or pointers would be appreciated!

OK, so I read the two READMEs and found something new. One of them suggests the following init script:

o In order to run CUDA applications, the CUDA module must be
  loaded and the entries in /dev created. This may be achieved
  by initializing X Windows, or by creating a script to load the
  kernel module and create the entries.

  An example script (to be run at boot time):

  #!/bin/bash

  modprobe nvidia

  if [ "$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`

    for i in `seq 0 $N`; do
      mknod -m 666 /dev/nvidia$i c 195 $i
    done

    mknod -m 666 /dev/nvidiactl c 195 255
  else
    exit 1
  fi

Should this be in /etc/bashrc?

I’m still confused about the proper runlevel.

The /dev/nvidia* entries get created either when X starts, or they have to be created manually (using the script you found, or via some other mechanism of your choosing). If you’re not booting into runlevel 5, X is not going to start, and you’ll need to ensure that the /dev/nvidia* entries are created from an init script or from /etc/rc.local. You definitely cannot do this from /etc/bashrc, as non-root users cannot create these entries.
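One way to wire that up, as a sketch (the script path here is arbitrary, pick whatever fits your layout):

  # Save the device-node script from the README somewhere root-owned,
  # make it executable, and call it from /etc/rc.local so it runs at boot.
  chmod 755 /usr/local/sbin/cuda-devices.sh
  echo "/usr/local/sbin/cuda-devices.sh" >> /etc/rc.local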

Fantastic!

It works now, thanks very much.

Do I have to do anything special on a multi-user machine to have the card de-allocated and re-allocated by subsequent users?

I’m interested in this as well. If you’ve got multiple Tesla GPUs in a single machine and users don’t explicitly select a device with cudaSetDevice, will they all end up time-sharing device 0? Is there a way from the system side to guarantee exclusive access to a GPU, or will I need to do something custom on my end to tell users which GPU they should use to get “dedicated” access?

This was ages ago, but just in case you’re still listening, check out “nvidia-smi --help” from the command line.

I don’t remember when it was added (CUDA 2.3?), but I did notice the new compute-mode-rules option, and I now set all of our GPUs to compute-exclusive mode at boot time.
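In case it’s useful, this is roughly what that boot-time step looks like on our machines, as a sketch only; the exact flags differ between nvidia-smi versions (newer builds take -i and named compute modes, older ones used -g and numeric modes), so check “nvidia-smi --help” on your driver first:

  #!/bin/bash
  # Put every NVIDIA GPU into exclusive compute mode so that only one
  # process can hold a context on a device at a time. Run as root at boot.
  NGPUS=`nvidia-smi -L | wc -l`
  for i in `seq 0 $((NGPUS - 1))`; do
    # Older nvidia-smi builds: "nvidia-smi -g $i -c 1" instead.
    nvidia-smi -i $i -c EXCLUSIVE_PROCESS
  done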