systems-logind and X crash on resuming from suspend ; goes into failsafe mode

Sporadically, my session will crash after resuming from suspend. According to journalctl, first systemd-logind crashes

systemd-logind.service: Main process exited, code=dumped, status=6/ABRT

then, when opening a new session, X fails to start which causes sddm to crash

systemd-logind[17674]: New session 3 of user yann.
sddm[1250]: Adding new display on vt 1 ...
sddm[1250]: Loading theme configuration from ""
sddm[1250]: Display server starting...
sddm[1250]: Running: /usr/bin/X -nolisten tcp -auth /var/run/sddm/{76c955c5-7732-4e88-8e7f-13344b89817b} -background none -noreset -displayfd 19 -seat seat0 vt1
sddm[1250]: Display server failed to start. Exiting
systemd[1]: sddm.service: Main process exited, code=dumped, status=6/ABRT
org.kde.KScreen[1442]: The X11 connection broke (error 1). Did the X11 server die?
org.kde.kglobalaccel[1442]: The X11 connection broke (error 1). Did the X11 server die?

Then, X failsafe troubleshooting mode is activated

systemd[1]: Started X.org diagnosis failsafe.

nvidia-bug-report.log.gz (1.02 MB)
journalctl.log (855 KB)

During resume, your time is fluctuating, jumping back and forth. When going back in time, logind gets killed by watchdog. Might be connected with your RTC. Don’t know how to fix that, though.

Thanks a lot for your insight. I was assuming that these discrepancies were happening because events were not flushed in order. It seems that this is related to known problems with systemd services and is related to a kernel bug where the monotonic clock is not behaving as documented (ie. “moving” during suspend when it should not).

Following this comment from a systemd dev, I could confirm that my kernel (4.17.0-996-generic) does exhibit this anomalous behaviour, using the following Python script

import time
import sched
import datetime

s = sched.scheduler()

def f(sc, oldm) :
    newm = time.monotonic()
    print(datetime.datetime.now(), newm, newm-oldm)
    s.enter(1, 1, f, (sc,newm))
    
s.enter(1, 1, f, (s,time.monotonic()))
s.run()

This gives the following output

2018-08-10 13:38:48.589790 52173.236259955 1.0012640730055864
2018-08-10 13:38:49.591200 52174.237669549 1.0014095939986873
2018-08-10 13:38:50.592567 52175.239039243 1.0013696940004593
2018-08-10 13:39:48.812431 52233.458903577 58.21986433399434   # should not be, as I understand
2018-08-10 13:39:49.813574 52234.460033817 1.0011302400016575
2018-08-10 13:39:50.814721 52235.461193401 1.0011595840041991
2018-08-10 13:39:51.815394 52236.461865038 1.00067163699714

This seems to have been corrected in 4.17.14-041714-generic

2018-08-10 13:58:33.088250 188.802942035 1.0012386180000021
2018-08-10 13:58:34.089645 189.804337831 1.001395795999997
2018-08-10 13:58:35.090780 190.805475312 1.0011374810000007
2018-08-10 13:59:56.874722 194.244714143 3.4392388310000115 # due to script being delayed while resuming ?
2018-08-10 13:59:57.875806 195.245799462 1.0010853189999978
2018-08-10 13:59:58.876867 196.24686063 1.0010611679999784
2018-08-10 13:59:59.877985 197.247977342 1.0011167120000266

I will see if crashes happen again.

Nice find, thanks for reporting back.