My configuration: openSUSE Leap 15.2 with NVIDIA driver version 460.67, installed via YaST from the NVIDIA graphics drivers repository.
After installing CUDA Toolkit v11.2.2, I tried to install the NVIDIA persistence daemon as recommended in
under item 9.2.1. Following the installation procedure outlined in
https://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-daemon
the NVIDIA-supplied installer creates the required systemd service file and a dedicated user “nvidia-persistenced” for the daemon process to run as. However, the install script ends with the error message:
Error: 'systemctl start nvidia-persistenced.service' failed with
'Job for nvidia-persistenced.service failed because the control process exited with error code.'
I traced the error to the following problem: when the daemon is started with
systemctl start nvidia-persistenced.service
the actual command executed inside the nvidia-persistenced.service script is
/usr/bin/nvidia-persistenced --user nvidia-persistenced
The problem is that the user nvidia-persistenced created by the NVIDIA installer does not have permission to read and write the device files /dev/nvidia*.
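A rough way to double-check this from the shell is sketched below. It is only a sketch under a few assumptions: GNU stat is available (for the -c option), the helper name user_can_rw is made up for illustration, and only the classic owner/group/other mode bits are considered (ACLs and root's override are ignored).

```shell
#!/bin/sh
# user_can_rw USER FILE  (hypothetical helper, not part of any NVIDIA tool)
# Returns 0 if the plain mode bits grant USER both read and write on FILE,
# 1 if they do not, and 2 if the file or the user cannot be looked up.
# Assumes GNU stat; ignores ACLs, capabilities and the root override.
user_can_rw() {
    user=$1
    file=$2
    info=$(stat -c '%a %u %g' "$file") || return 2   # octal mode, owner uid, group gid
    uid=$(id -u "$user") || return 2
    gids=$(id -G "$user") || return 2                # all numeric group ids of USER
    set -- $info
    mode=$((0$1)); f_uid=$2; f_gid=$3                # leading 0 makes the mode octal
    if [ "$uid" -eq "$f_uid" ]; then
        bits=$(( (mode >> 6) & 6 ))                  # owner class applies
    else
        bits=$(( mode & 6 ))                         # default: "other" class
        for g in $gids; do
            if [ "$g" -eq "$f_gid" ]; then
                bits=$(( (mode >> 3) & 6 ))          # group class applies
                break
            fi
        done
    fi
    [ "$bits" -eq 6 ]                                # 6 = read (4) + write (2)
}

# Report the result for every NVIDIA device node that exists.
for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue                        # glob matched nothing
    if user_can_rw nvidia-persistenced "$dev"; then
        echo "$dev: read/write allowed by mode bits"
    else
        echo "$dev: read/write not allowed (or lookup failed)"
    fi
done
```

Comparing the output with a plain ls -l /dev/nvidia* and id nvidia-persistenced should show which owner, group, or mode is getting in the way.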
Starting the daemon by hand using
sudo /usr/bin/nvidia-persistenced --verbose --user nvidia-persistenced
produces the following messages in the syslog:
2021-04-08T12:43:24.630123+02:00 localhost nvidia-persistenced: Verbose syslog connection opened
2021-04-08T12:43:24.630214+02:00 localhost nvidia-persistenced: Now running with user ID 460 and group ID 2001
2021-04-08T12:43:24.630272+02:00 localhost nvidia-persistenced: Started (5356)
2021-04-08T12:43:24.630341+02:00 localhost nvidia-persistenced: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 460 has read and write permissions for those files.
2021-04-08T12:43:24.630403+02:00 localhost nvidia-persistenced: PID file unlocked.
2021-04-08T12:43:24.630454+02:00 localhost nvidia-persistenced: PID file closed.
2021-04-08T12:43:24.630496+02:00 localhost nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
2021-04-08T12:43:24.630522+02:00 localhost nvidia-persistenced: Shutdown (5356)
By contrast, if I start the daemon and let it keep running as root (not recommended by NVIDIA), things work just fine:
sudo /usr/bin/nvidia-persistenced --verbose
with the following syslog messages:
2021-04-08T12:48:22.200184+02:00 localhost nvidia-persistenced: Verbose syslog connection opened
2021-04-08T12:48:22.200260+02:00 localhost nvidia-persistenced: Started (5422)
2021-04-08T12:48:22.200519+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - registered
2021-04-08T12:48:22.200805+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - persistence mode enabled.
2021-04-08T12:48:22.200874+02:00 localhost nvidia-persistenced: device 0000:01:00.0 - NUMA memory onlined.
2021-04-08T12:48:22.200931+02:00 localhost nvidia-persistenced: Local RPC services initialized
Apparently the nvidia-persistenced daemon cannot be started the way NVIDIA intends. I wonder whether I missed a step in the installation, but I could not find any relevant information in the documentation linked above.
Any idea how to proceed?