administering CUDA device on multiuser machine

I’ve set up the NVIDIA SDK on a RHEL 5.1 box, I’ve got the appropriate display driver installed, and the example directory makes successfully. The machine is set to boot into runlevel 3.

The machine is used by a few different users, and access to the graphics card seems erratic. For example, I sometimes get the error:

[nmoore@buff release]$ pwd

[nmoore@buff release]$ ./MonteCarlo
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
There is no device supporting CUDA.

This error sometimes goes away when I restart the machine, but there doesn’t seem to be any regularity to this behavior. Are there CUDA system/admin tools that I should learn about?
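For anyone hitting the same error, a couple of quick checks narrow down whether the problem is the kernel module or the device files. This is just a diagnostic sketch; the `check_nvidia_dev` function name is my own invention, but the paths are the standard ones the driver uses:

```shell
# Report whether the CUDA device files exist and what their
# permissions are.  "Permission denied" on /dev/nvidiactl usually
# means the nodes were created with restrictive modes (or not at all).
check_nvidia_dev() {
    if [ -c /dev/nvidiactl ]; then
        # Show the modes; they should be world read/write (crw-rw-rw-).
        ls -l /dev/nvidiactl /dev/nvidia* 2>/dev/null || true
        echo "/dev/nvidiactl present"
    else
        echo "missing: /dev/nvidiactl -- load the nvidia module and create the nodes"
    fi
}

check_nvidia_dev
```

If the nodes exist but only appear after someone starts X, that points at the runlevel issue discussed below: X creates them on startup, and nothing else does.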

I’ve been wondering if there would be implicit problems with booting into runlevel 3. I actually have two graphics cards installed on the system (the other one an integrated motherboard card), but configuring them both to work simultaneously (one for CUDA computation, the second for rendering an X11 window) seemed like more of a challenge than I wanted to tackle.

Any comments or pointers would be appreciated!

OK, so I read the two READMEs and found something new. One of them suggests the following init script:

o In order to run CUDA applications, the CUDA module must be
  loaded and the entries in /dev created.  This may be achieved
  by initializing X Windows, or by creating a script to load the
  kernel module and create the entries.

  An example script (to be run at boot time):

  modprobe nvidia

  if [ "$?" -eq 0 ]; then

    # Count the number of NVIDIA controllers found.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
      mknod -m 666 /dev/nvidia$i c 195 $i
    done

    mknod -m 666 /dev/nvidiactl c 195 255

  else
    exit 1
  fi


Should this be in /etc/bashrc?

I’m still confused about the proper runlevel.

The /dev/nvidia* entries get created either when starting X, or they have to be created manually (using the script you found, or via some other mechanism of your choosing). If you’re not booting into runlevel 5, then X is not going to start, and you’ll need to ensure that the /dev/nvidia* entries are created via an initscript, or from /etc/rc.local. You definitely should not (and cannot) do this via /etc/bashrc, since non-root users cannot create these entries.
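One way to wire that up is to save the README’s script somewhere and have rc.local invoke it. A minimal sketch, assuming you saved the script as /usr/local/sbin/cuda-devices.sh (a name I made up) and that you’re on RHEL, where /etc/rc.d/rc.local runs last in runlevels 2-5:

```shell
# Install a device-node script so it runs at boot (run this as root).
# The helper name cuda_boot_setup is mine; the mechanism is just
# "append one line to rc.local, once".
cuda_boot_setup() {
    script="$1"      # e.g. /usr/local/sbin/cuda-devices.sh
    rclocal="$2"     # e.g. /etc/rc.d/rc.local
    chmod 755 "$script"
    # Append the invocation only if it is not already there,
    # so re-running this helper stays idempotent.
    grep -qs "$script" "$rclocal" || echo "$script" >> "$rclocal"
}
```

Usage would be `cuda_boot_setup /usr/local/sbin/cuda-devices.sh /etc/rc.d/rc.local`; an entry in /etc/init.d with chkconfig would also work, this is just the least ceremony.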


It works now, thanks very much.

Do I have to do anything special on a multi-user machine to have the card de-allocated and re-allocated by subsequent users?

I’m interested in this as well. If you’ve got multiple Tesla GPUs in a single machine and users don’t explicitly select a device with cudaSetDevice, will they all default to time-sharing device 0? Is there a way from the system side to guarantee exclusive access to a GPU, or will I need to do some custom work on my end to let users know which GPU to use for “dedicated” access?

This was ages ago, but just in case you’re still listening you should check out “nvidia-smi --help” from the command line.

I don’t remember when it was added (CUDA 2.3?), but I did catch the new compute-mode-rules option, and I now set all of our GPUs to compute-exclusive mode at boot time.
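For reference, setting that at boot can be a small loop in the same script that creates the device nodes. A sketch, with the caveat that nvidia-smi’s syntax has changed across driver generations (the `-i <gpu> -c EXCLUSIVE_PROCESS` form below is from recent drivers; older releases used numeric modes, so check `nvidia-smi --help` on your system). The function name and the `$SMI` indirection are mine:

```shell
# Set every GPU in the box to compute-exclusive mode, so each device
# accepts work from only one process at a time and extra jobs fail
# over to cudaSetDevice'ing a free GPU instead of piling onto device 0.
set_exclusive_all() {
    SMI="${SMI:-nvidia-smi}"   # override for testing
    n="$1"                     # number of GPUs in the machine
    i=0
    while [ "$i" -lt "$n" ]; do
        "$SMI" -i "$i" -c EXCLUSIVE_PROCESS
        i=$((i + 1))
    done
}
```

At boot you would call it with your GPU count, e.g. `set_exclusive_all 4` from rc.local, after the device nodes exist. Note the setting does not persist across reboots on its own, which is why it belongs in a boot script.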