nvidia-persistenced causing 60 second reboot delays

tompretto · November 21, 2018, 9:01pm

Hi,

I am running Ubuntu 16.04 with a K5000 graphics card.
I have an automated build process which uses DKMS to re-compile the Nvidia drivers into a new OS build every night.

Recently (some time since Sept 26th), upon issuing a reboot command the system ends the session, sends me to the login screen, and then takes around 60 seconds to initiate the reboot. If I try to type during that time, the system does not echo any input to the screen, and it also does not allow login via ssh.

The system takes around 60 seconds to finally reboot when issued the reboot command but thereafter reboots at a normal pace.

After a 60-second delay, I receive the message that nvidia-persistenced has stopped. If I press Ctrl-Alt-Delete before this message, I get a timer stating:
“A STOP job is running for NVIDIA Persistence Daemon (XXs / 1min 28s)”
where XX is a second counter.

dpkg --list outputs the following packages:

ii nvidia-375 384.130-0ubuntu0.16.04.1 amd64 Transitional package for nvidia-384
ii nvidia-384 384.130-0ubuntu0.16.04.1 amd64 NVIDIA binary driver - version 384.130

Is it possible to remove or lower this delay?

generix · November 21, 2018, 9:09pm

That’s systemd’s kill timeout.
In /etc/systemd/system.conf, set

DefaultTimeoutStopSec=10s

to lower it to 10 seconds.
Though it would be interesting to know why persistenced fails to shut down and has to be killed.
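
Note that DefaultTimeoutStopSec= in system.conf is only a default: a unit that sets its own TimeoutStopSec= is not affected by it. A per-unit drop-in is one way to target just this service. What follows is a minimal sketch, assuming the unit is named nvidia-persistenced.service (not confirmed in this thread); the helper name write_stop_timeout, the file name stop-timeout.conf, and the optional third argument are made up for illustration.

```shell
# Hypothetical helper: write a drop-in that caps the stop timeout of one unit.
#   $1 = unit name (assumed: nvidia-persistenced.service)
#   $2 = timeout, e.g. 10s
#   $3 = unit directory (optional, defaults to /etc/systemd/system)
write_stop_timeout() {
    unit="$1"
    timeout="$2"
    unit_dir="${3:-/etc/systemd/system}"
    dropin_dir="$unit_dir/$unit.d"

    mkdir -p "$dropin_dir" || return 1

    # TimeoutStopSec= is how long systemd waits for the service to stop
    # before it forcibly kills it.
    printf '[Service]\nTimeoutStopSec=%s\n' "$timeout" \
        > "$dropin_dir/stop-timeout.conf"
}
```

Run as root, this would be called as `write_stop_timeout nvidia-persistenced.service 10s`, followed by `systemctl daemon-reload` so that systemd picks up the new drop-in.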

tompretto · November 26, 2018, 5:56pm

Thank you @generix

Changing that /etc/systemd/system.conf value does NOT seem to decrease the reboot delay that I have been seeing recently. I find it odd, though, that all my previous OS builds have the 90-second default commented out in the same configuration file and did not exhibit this issue.

Is there another setting which would cause persistenced to fail to shut down in a timely manner?

-Thomas

Updated comment to reflect the value does NOT decrease the delay
nvidia-bug-report.log.gz (217 KB)

generix · November 26, 2018, 6:06pm

Persistenced does not really have any settings besides run-as-user, verbose and no-persistence-mode, all of which should be irrelevant in your case. To find out why persistenced is failing to stop, please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

tompretto · November 26, 2018, 6:40pm

Thank you once again @generix
I have uploaded the requested results file for your perusal.

tompretto · December 4, 2018, 4:23pm

At first I thought that this might be the correct answer, but after rigorous testing it appears that changing this setting did not speed up my system's reboot process; I am still incurring a 60-second delay before the server reboots after sending a reboot command, @generix.

Would you happen to be able to point me at any other configurations I can check?

tompretto · December 4, 2018, 6:00pm

OK, I have been testing and am observing inconsistent results.

I have modified my /etc/systemd/system.conf so that it has a shorter timeout value (DefaultTimeoutStopSec=4s) but that setting does not seem to work 100% of the time. Some reboots appear to reboot after 4 seconds, while other reboots are still taking up to a minute. I have updated my post (Posted 11/26/2018 05:56 PM) with my observation that the set value does NOT appear to be working but now it seems as though it DOES work sometimes.

Sorry for the confusion, but this is a very odd issue to me.

generix · December 4, 2018, 10:27pm

I suspect a timeout of 4 seconds is too short to work properly. If you just stop persistenced manually, is anything noticeable in the journal or dmesg?
You could probably also just disable it, since you have only one GPU and are running X on it.
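
The manual test suggested above could look roughly like this. It is only a sketch: the unit name nvidia-persistenced.service is an assumption, and the function name stop_and_inspect is made up for illustration.

```shell
# Assumed unit name; override by setting UNIT before sourcing this snippet.
UNIT="${UNIT:-nvidia-persistenced.service}"

# Stop the daemon by hand, report how long the stop took, then show what
# was logged. Needs root.
stop_and_inspect() {
    start=$(date +%s)
    systemctl stop "$UNIT"
    end=$(date +%s)
    echo "stop took $((end - start))s"

    # Messages logged for this unit during the current boot.
    journalctl -b -u "$UNIT" --no-pager | tail -n 30

    # Recent kernel messages from the NVIDIA module (logged with the NVRM prefix).
    dmesg | grep -i nvrm | tail -n 30
}
```

If the stop itself takes close to the configured timeout, the journal output for the unit is the first place to look for what the daemon was waiting on.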

tompretto · December 6, 2018, 3:58pm

Thank you for your timely assistance @generix

We were not sure if it was OK to disable the nvidia-persistenced daemon. I have disabled and masked the nvidia-persistenced daemon and added some network introspection to my startup scripts to wait until the network is available. The issue is remediated at this time.
Thanks again!

-Thomas

generix · December 6, 2018, 8:53pm

The persistence daemon is needed if no X is running or more than one GPU is present. So in your case you can safely disable it.
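
For reference, disabling and masking the daemon as described in this thread might be scripted as follows. This is a sketch under assumptions: the unit name nvidia-persistenced.service is not confirmed here, and the run and disable_persistenced helpers and the DRY_RUN variable are invented for illustration.

```shell
# Assumed unit name; override by setting UNIT before sourcing this snippet.
UNIT="${UNIT:-nvidia-persistenced.service}"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the daemon, keep it from starting at boot, and mask it so that
# nothing else can start it either. A real run needs root.
disable_persistenced() {
    run systemctl stop "$UNIT"
    run systemctl disable "$UNIT"
    run systemctl mask "$UNIT"
}
```

Setting DRY_RUN=1 before calling disable_persistenced prints the three systemctl commands without running them. The change can be reverted later with `systemctl unmask` followed by `systemctl enable` on the same unit.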

mikelojkovic · June 19, 2019, 6:37am

Is the persistence daemon needed for SLI, or only when using two separate GPUs without SLI?

generix · June 19, 2019, 8:47am

When using separate GPUs without X+SLI, or when running headless with one GPU.