Setting group/mode on /dev/nvidia*

All,

I’m trying to change the group on the /dev/nvidia* device files to control access to the GPUs. The GPUs are P100s, and we are running the latest drivers. Unfortunately, I’m having a hard time with this, and none of the tips I’ve found online have worked. I’ve tried using udev rules and modprobe configuration options, but the group is never updated.
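For reference, the kind of udev rule I tried looks like this (the group name is from my setup; treat this as a sketch of my attempt, not a known-good rule):

```
# /etc/udev/rules.d/70-nvidia.rules -- my attempt; did not stick
KERNEL=="nvidia*", GROUP="nvidia", MODE="0660"
```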

Does anyone have any experience getting this working?

Thanks!

This blog may be of interest:

http://www.resultsovercoffee.com/2011/01/cuda-in-runlevel-3.html

Thanks for the article, but unfortunately it doesn’t work for me. I can manually create the device files at startup using mknod, with the correct mode and group (verified with ls /dev | grep nvidia). Unfortunately, as soon as nvidia-smi runs, the devices are recreated: their mode is reset to 0666 (I want 0660) and their group reverts to root.
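For completeness, here is a dry-run sketch of the startup script I mean. It echoes the mknod/chown commands instead of executing them, since the real thing needs root; the group name ("nvidia") and GPU count are specific to my machine:

```shell
#!/bin/sh
# Print the commands that would create the NVIDIA device nodes
# with mode 0660 and the given group (dry run: echo, not execute).
make_nvidia_nodes() {
    ngpus=$1
    group=$2
    i=0
    while [ "$i" -lt "$ngpus" ]; do
        # NVIDIA character devices use major number 195; minor = GPU index
        echo "mknod -m 0660 /dev/nvidia$i c 195 $i"
        echo "chown root:$group /dev/nvidia$i"
        i=$((i + 1))
    done
    # nvidiactl is always major 195, minor 255
    echo "mknod -m 0660 /dev/nvidiactl c 195 255"
    echo "chown root:$group /dev/nvidiactl"
}

make_nvidia_nodes 2 nvidia
```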

Before nvidia-smi (correct):

$ ls -al /dev/ | grep nvidia
crw-rw----  1 root nvidia  195,   0 Apr 19 15:38 nvidia0
crw-rw----  1 root nvidia  195,   1 Apr 19 15:38 nvidia1
crw-rw----  1 root nvidia  195, 255 Apr 19 15:38 nvidiactl

After nvidia-smi:

$ ls -al /dev/ | grep nvidia
crw-rw-rw-  1 root root    195,   0 Apr 19 15:38 nvidia0
crw-rw-rw-  1 root root    195,   1 Apr 19 15:38 nvidia1
crw-rw-rw-  1 root root    195, 255 Apr 19 15:38 nvidiactl

Do you have persistence mode set at machine startup? Note that the proper method for setting persistence mode has changed since that blog article was written.

According to /lib/udev/rules.d/71-nvidia.rules, it looks like NVIDIA is already managing its own persistence daemon:

# Tag the device as master-of-seat so that logind is happy
# (see LP: #1365336)
SUBSYSTEM=="pci", ATTRS{vendor}=="0x10de", DRIVERS=="nvidia", TAG+="seat", TAG+="master-of-seat"

# Start and stop nvidia-persistenced on power on and power off
# respectively
ACTION=="add", DEVPATH=="/bus/acpi/drivers/NVIDIA ACPI Video Driver", SUBSYSTEM=="drivers", RUN+="/usr/bin/start-nvidia-persistenced"
ACTION=="remove", DEVPATH=="/bus/acpi/drivers/NVIDIA ACPI Video Driver", SUBSYSTEM=="drivers", RUN+="/usr/bin/stop-nvidia-persistenced"

# Start and stop nvidia-persistenced when loading and unloading
# the driver
ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="/usr/bin/start-nvidia-persistenced"
ACTION=="remove", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="/usr/bin/stop-nvidia-persistenced"

Should I disable that?

No, don’t disable that. But simply having the persistenced daemon running is not necessarily enough to confirm that persistence mode is enabled.

The persistenced daemon control/behavior is documented here:

http://docs.nvidia.com/deploy/driver-persistence/index.html#background
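To check whether persistence mode is actually enabled (as opposed to the daemon merely running), query each GPU; the legacy per-GPU toggle is also shown for comparison (I haven't verified either on your exact driver version):

```
# Per-GPU persistence status (one "Persistence Mode" line per GPU)
$ nvidia-smi -q | grep "Persistence Mode"

# Legacy method of enabling it (superseded by nvidia-persistenced)
$ sudo nvidia-smi -pm 1
```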

I really appreciate your help. I’m not sure what I’m missing though. I’ve tried manually running the nvidia-persistenced daemon with

nvidia-persistenced --user nvidia-persistenced --persistence-mode

Then, nvidia-smi -q shows that persistence mode is enabled for both GPUs. However, even though I have created the /dev/nvidia* files with mknod and started nvidia-persistenced, running nvidia-smi still resets the mode and group of the /dev/nvidia* files.

This is an Ubuntu 16.04.1 LTS machine by the way, in case it matters.
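For reference, the modprobe route I mentioned earlier looked like this. The option names are from the driver README's kernel-parameter list; the GID value is just an example from my system, and whether these behave as expected on this driver version is my assumption:

```
# /etc/modprobe.d/nvidia.conf -- my attempt
# 1001 is the GID of my "nvidia" group (example value)
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1001 NVreg_DeviceFileMode=0660
```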

Are you running nvidia-smi as a root user?

No, I’m running as a non-root user, and it’s still replacing the /dev/nvidia* files.