On Red Hat Enterprise Linux 7 (Workstation), we have observed an intermittent problem, with consistent symptoms, affecting the "Switch User" functionality of the GNOME Display Manager (GDM).
Consider the following procedure:
- Boot into RHEL7
- Log in using a smart card (specifically, a US Department of Defense Common Access Card, or CAC)
- Wait for the desktop environment to finish loading (we have typically seen this with GNOME Classic, but we also have reports of it occurring for KDE users)
- Remove the smart card, which (thanks to the smart card support provided by Centrify Infrastructure Services) causes the screen to lock
- Choose the "Log in as another user" option in the GDM login window
- Log back in as the same user, or as a different user, again using a CAC
Sporadically, this procedure causes the X server to crash (as recorded in Xorg.N.log) and the screen to go black. GDM does not restart, the system does not drop to a tty, and the keyboard and mouse stop responding at the local console. This is not a full system crash, however: it is still possible to SSH into the machine.
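For anyone trying to reproduce this, a minimal Python sketch like the one below can check which Xorg.N.log files record a crash; since SSH still works after the crash, it can be run remotely. It was not part of our original investigation. It assumes a Python 3 interpreter is available, that the logs live in /var/log, and that a crash leaves lines containing "(EE) Backtrace:", "Caught signal", or "Fatal server error"; the directory and marker strings are assumptions to adjust to your own logs.

```python
import glob
import os
import sys

# Strings the X server commonly writes to its log when it crashes.
# These are assumptions; adjust them to match your own Xorg.N.log files.
CRASH_MARKERS = ("(EE) Backtrace:", "Caught signal", "Fatal server error")


def find_crashed_logs(log_dir="/var/log"):
    """Return {log_path: [matching lines]} for Xorg.*.log files containing crash markers."""
    results = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "Xorg.*.log"))):
        try:
            with open(path, errors="replace") as f:
                hits = [line.rstrip("\n") for line in f
                        if any(marker in line for marker in CRASH_MARKERS)]
        except OSError:
            # Unreadable log (e.g. permissions); skip it rather than abort.
            continue
        if hits:
            results[path] = hits
    return results


if __name__ == "__main__":
    # Optional first argument overrides the (assumed) log directory.
    directory = sys.argv[1] if len(sys.argv) > 1 else "/var/log"
    for log_path, lines in find_crashed_logs(directory).items():
        print(log_path)
        for text in lines:
            print("    " + text)
```

Saved under a hypothetical name such as find_xorg_crashes.py, it could be run over SSH with `python3 find_xorg_crashes.py /var/log`. Only the matching lines are printed, so the full backtrace still has to be read from the log itself.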
This has happened numerous times; I have attached nvidia-bug-report.log.gz files from two different occurrences in subsequent replies to this topic.
It may help to consider the timeline of this problem:
- The problem is first observed on machines running the 384.98 driver
- 2017-Nov: After a few occurrences, the 2017-11-23 bug report is captured (along with a core dump of the gsettings-data daemon)
- 2017-Dec: A few system changes are made, including the installation of the acpid software (for reasons unrelated to the X server crash)
- 2017-Dec: Shortly before or after the above, a concerted effort is made to reproduce the bug: the procedure is repeated 50 times in a row with no crash, whereas previous reproduction efforts needed fewer than 10 attempts
- 2018-Jan: NVIDIA driver updated to 384.111; the bug remains hard to reproduce
- 2018-Feb: NVIDIA driver updated to 390.25
- 2018-Feb (late): The bug is reproduced again and the 2018-02-22 bug report is captured
As you can see, there appears to be little correlation between the driver version and the bug's reproducibility. Even the two nvidia-bug-report.log.gz files differ: the first (2017-Nov) implicates the NVIDIA driver in the stack trace, while the second (2018-Feb) does not. Can anybody point us in the direction we should look next?
2017-11-23_nvidia-bug-report.log.gz (222 KB)
2018-02-22_nvidia-bug-report.log.gz (149 KB)