nvidia-persistenced fails to start if the user option is set to a non-root user

My configuration: openSUSE Leap 15.2 with NVIDIA driver version 460.67, installed via YaST from the NVIDIA graphics drivers repository.

After installing CUDA Toolkit v11.2.2, I tried to install the NVIDIA persistence daemon as recommended in

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions

under item 9.2.1. Following the installation procedure outlined in

https://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-daemon

the NVIDIA-supplied installer creates the required systemd service file and a dedicated user “nvidia-persistenced” for the daemon process to run as. However, the install script ends with the error message

Error: 'systemctl start nvidia-persistenced.service' failed with
'Job for nvidia-persistenced.service failed because the control process exited with error code.'

I traced the error to the following problem: when one starts the daemon using

systemctl start nvidia-persistenced.service

the actual command executed by the nvidia-persistenced.service unit is

/usr/bin/nvidia-persistenced --user nvidia-persistenced

The problem is that the user nvidia-persistenced created by the NVIDIA installer does not have permission to access the device files /dev/nvidia*.

Starting by hand using

sudo /usr/bin/nvidia-persistenced --verbose --user nvidia-persistenced

produces the following messages in the syslog:

2021-04-08T12:43:24.630123+02:00 localhost nvidia-persistenced: Verbose syslog connection opened
2021-04-08T12:43:24.630214+02:00 localhost nvidia-persistenced: Now running with user ID 460 and group ID 2001
2021-04-08T12:43:24.630272+02:00 localhost nvidia-persistenced: Started (5356)
2021-04-08T12:43:24.630341+02:00 localhost nvidia-persistenced: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 460 has read and write permissions for those files.
2021-04-08T12:43:24.630403+02:00 localhost nvidia-persistenced: PID file unlocked.
2021-04-08T12:43:24.630454+02:00 localhost nvidia-persistenced: PID file closed.
2021-04-08T12:43:24.630496+02:00 localhost nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
2021-04-08T12:43:24.630522+02:00 localhost nvidia-persistenced: Shutdown (5356)

whereas if I start the daemon and let it keep running as root (not recommended by NVIDIA), things work just fine:

sudo /usr/bin/nvidia-persistenced --verbose

with the following syslog messages:

2021-04-08T12:48:22.200184+02:00 localhost nvidia-persistenced: Verbose syslog connection opened
2021-04-08T12:48:22.200260+02:00 localhost nvidia-persistenced: Started (5422)
2021-04-08T12:48:22.200519+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - registered
2021-04-08T12:48:22.200805+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - persistence mode enabled.
2021-04-08T12:48:22.200874+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - NUMA memory onlined.
2021-04-08T12:48:22.200931+02:00 localhost nvidia-persistenced: Local RPC services initialized
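
The failing start reports only numeric IDs (user ID 460, group ID 2001). A minimal sketch for pulling those numbers out of syslog lines so they can be resolved to names; the function name log_ids is made up for illustration, and the getent lookups are left as comments because their results depend on the local system:

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print "UID GID"
# for every "Now running with user ID ... and group ID ..." message.
log_ids() {
    sed -n 's/.*Now running with user ID \([0-9][0-9]*\) and group ID \([0-9][0-9]*\).*/\1 \2/p'
}

# Example with the message from the failing start:
printf '%s\n' 'localhost nvidia-persistenced: Now running with user ID 460 and group ID 2001' | log_ids

# Resolve the numbers to names on the affected machine:
#   getent passwd 460
#   getent group 2001
```

Lines that do not contain the message produce no output, so the whole syslog can be piped through the function.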

Apparently the nvidia-persistenced process cannot be invoked as intended by NVIDIA. I wonder whether I missed some step in the installation, but I could not find any relevant information in the documentation linked above.

Any idea how to proceed?

Please check which group has access to the NVIDIA device files with
ls -l /dev/nvidia*
and add the nvidia-persistenced user to it.

Sorry, I forgot to mention: I had already had the same idea and checked this before submitting my initial question.

On my system I get

localhost [534] $ ls -l /dev/nvidia*
crw-rw----+ 1 root video 195,   0  8. Apr 13:06 /dev/nvidia0
crw-rw----+ 1 root video 195, 255  8. Apr 13:06 /dev/nvidiactl
crw-rw----+ 1 root video 195, 254  8. Apr 13:06 /dev/nvidia-modeset
crw-rw----+ 1 root video 238,   0  8. Apr 13:06 /dev/nvidia-uvm
crw-rw----+ 1 root video 238,   1  8. Apr 13:06 /dev/nvidia-uvm-tools
localhost [535] $

and for user nvidia-persistenced as created by the NVIDIA install script:

localhost [537] $ id nvidia-persistenced
uid=460(nvidia-persistenced) gid=2001(nvidia-persistenced) groups=482(video),2001(nvidia-persistenced)
localhost [538] $
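
The listings above show the device files owned by group video (482), while the primary group of nvidia-persistenced is 2001, the same group ID the daemon reported in the syslog. A small sketch that makes this comparison directly; the helper name primary_group_owns is hypothetical, and it only compares numeric group IDs (it does not look at ACLs or supplementary groups):

```shell
#!/bin/sh
# Hypothetical helper: succeed if USER's primary group is the group that
# owns FILE. Compares numeric gids, like the IDs shown in the daemon's log.
primary_group_owns() {
    user=$1
    file=$2
    user_gid=$(id -g "$user") || return 1
    # Field 4 of `ls -ldn` is the numeric gid of the file.
    file_gid=$(ls -ldn -- "$file" 2>/dev/null | awk '{ print $4 }')
    [ -n "$file_gid" ] && [ "$user_gid" = "$file_gid" ]
}

# Report on the control device if it exists on this machine.
if [ -e /dev/nvidiactl ]; then
    if primary_group_owns nvidia-persistenced /dev/nvidiactl; then
        echo "primary group of nvidia-persistenced owns /dev/nvidiactl"
    else
        echo "primary group of nvidia-persistenced does NOT own /dev/nvidiactl"
    fi
fi
```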

So the user is already a member of group “video”, yet the daemon still fails to start. However, the syslog indicates that the daemon, when started for user “nvidia-persistenced”, sets its group ID to that user's primary group “nvidia-persistenced”, which is created by default by the NVIDIA installer script distributed in the tarball

/usr/share/doc/packages/x11-video-nvidiaG05/nvidia-persistenced-init.tar.bz2

I then made “video” the primary group of user nvidia-persistenced, and this time it worked. Consequently, the installer script distributed by NVIDIA is buggy and should be modified accordingly.
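
For reference, a sketch of how the primary group change described in this post could be carried out. It assumes usermod and systemctl are available and that the service is named nvidia-persistenced.service as above; the run and set_primary_group helpers and the DRY_RUN variable are made up for illustration. By default the commands are only printed, since both require root:

```shell
#!/bin/sh
# Commands are only printed unless DRY_RUN=0 is set, because both need root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical helper: make GROUP the primary group of USER, then restart
# the service so the daemon starts with the new group ID.
set_primary_group() {
    run usermod -g "$2" "$1"
    run systemctl restart nvidia-persistenced.service
}

set_primary_group nvidia-persistenced video
```

Running the script with DRY_RUN=0 as root executes the two commands instead of printing them.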

Not really. By default, the driver module creates its device files as root:root with mode 0666, so the default install of the nvidia-persistenced init files works correctly. If your OS’s repo changes the permissions, the installer has to be run with the -g option:
https://download.nvidia.com/XFree86/Linux-x86_64/430.50/README/faq.html
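
To tell which of the two situations applies on a given machine, one can look at the "other" permission bits of the device files: with the driver default of 0666 they are rw, while the crw-rw---- listing earlier in this thread has them cleared. A minimal sketch, with a hypothetical helper name world_rw:

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE is readable and writable by "other"
# (for example mode 0666), judged from the mode string printed by `ls -ld`.
world_rw() {
    mode=$(ls -ld -- "$1" 2>/dev/null | awk '{ print $1 }')
    case $mode in
        # 1 type character + 3 user bits + 3 group bits, then "rw" for other
        ???????rw*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on every NVIDIA device file present on this machine.
for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue
    if world_rw "$dev"; then
        echo "$dev: world read/write (driver default)"
    else
        echo "$dev: restricted, the daemon user or group needs explicit access"
    fi
done
```

If the files are reported as restricted, the group shown by ls -l /dev/nvidia* is the one to pass to the installer's -g option.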