System freeze (but still reliably accessible via ssh) after resuming from hibernate with Xid 56

Hello. I started seeing this issue around driver 384.90, and it still happens with 387.22.
About half the time, if I hibernate while logged into GNOME with a game running, I see Xid 56 errors on resume.

I searched Google for Xid 56 and found https://www.nvidia.com/Download/driverResults.aspx/118290/en-us which says “Fixed a bug that caused the system to become unresponsive after resuming from power management suspend/hibernate. Additional symptoms of this bug included display flickering and “Xid 56” errors in the kernel log.”
Perhaps this is still an issue on the GTX 1050 Ti?

If I do not hibernate, the system is very stable even under heavy load, and I only need to reboot about once a month for kernel updates.
nvidia-bug-report.log-hibernate-xid-issue.gz (168 KB)

I ran the Unigine Heaven 4.0 benchmark for over an hour. It brought the GPU temperature from the mid-30s °C up to ~59 °C, where it stayed. I didn’t see any stability issues, so it doesn’t look like a power draw issue.
Does anyone have any idea?

Any Linux kernel experts here? I may have found a clue. If less than ~3.8 GB of the 7.7 GB of system RAM is in use, the hibernate issue doesn’t happen. It only happens when system memory usage is at 50% or higher.
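
In case anyone wants to check the same thing before hibernating, here is a rough sketch that reads /proc/meminfo and reports how much RAM is in use. It assumes "used" means MemTotal minus MemAvailable, which may not match what a given system monitor shows, and the 0.5 threshold is just my observation above, not a known limit. The function names are made up for this example.

```python
#!/usr/bin/env python3
"""Report what fraction of system RAM is in use, from /proc/meminfo (Linux)."""
import os

MEMINFO_PATH = "/proc/meminfo"


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # Most fields are in kB; a few (HugePages_*) are plain counts.
            fields[name.strip()] = int(parts[0])
    return fields


def used_fraction(fields):
    """Fraction of RAM in use (assumption: used = MemTotal - MemAvailable)."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


def main(threshold=0.5):
    if not os.path.exists(MEMINFO_PATH):
        print(f"{MEMINFO_PATH} not found; this check is Linux-only.")
        return
    with open(MEMINFO_PATH) as f:
        fields = parse_meminfo(f.read())
    frac = used_fraction(fields)
    print(f"RAM in use: {frac:.0%}")
    if frac >= threshold:
        # Threshold taken from the observation in this thread, nothing more.
        print("At or above the level where I have seen Xid 56 after resume.")


if __name__ == "__main__":
    main()
```

Running it right before hibernating and noting the result next to whether the resume worked should make it easier to confirm or rule out the memory-usage pattern.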

I got a tip on IRC that turning off zswap and loading nvidia-uvm on boot may help. I just did that and rebooted. Hibernating twice in a row at high memory usage didn’t cause any issues, but to be on the safe side I’ll wait a week or so before declaring this fixed.
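
For anyone trying the same two changes, here is a small sketch to confirm after a reboot that they actually took effect. It does not change any settings. It assumes the kernel exposes zswap's state at /sys/module/zswap/parameters/enabled and that the module appears as nvidia_uvm (with an underscore) in /proc/modules. The helper names are made up for this example.

```python
#!/usr/bin/env python3
"""Check that zswap is off and the nvidia_uvm module is loaded (Linux)."""
from pathlib import Path

ZSWAP_ENABLED = Path("/sys/module/zswap/parameters/enabled")
PROC_MODULES = Path("/proc/modules")


def zswap_is_enabled(text):
    """Interpret the contents of the zswap 'enabled' parameter file."""
    return text.strip().upper() in ("Y", "1")


def module_is_loaded(proc_modules_text, name):
    """True if `name` is the first field of some line in /proc/modules."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def main():
    if ZSWAP_ENABLED.exists():
        state = "on" if zswap_is_enabled(ZSWAP_ENABLED.read_text()) else "off"
    else:
        state = "not available on this kernel"
    print(f"zswap: {state}")

    if PROC_MODULES.exists():
        loaded = module_is_loaded(PROC_MODULES.read_text(), "nvidia_uvm")
        print(f"nvidia_uvm loaded: {loaded}")
    else:
        print(f"{PROC_MODULES} not found; this check is Linux-only.")


if __name__ == "__main__":
    main()
```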

Edit: It happened again. But it now seems to happen less than 50% of the time instead of every time. I think I will just exit Xorg before hibernating from now on, until a developer can take a look.

What I know so far:

  • Turning on nvidia-persistenced causes Xid 56 reliably after the second or third resume from hibernate. Turning nvidia-persistenced off helped.
  • Resuming is not always stable if system memory usage was more than (system memory - VRAM), even though video memory is dedicated (PCIe card).
  • Switching TTYs or even restarting Xorg after resuming from hibernate breaks the TTYs (this started with GNOME 3.26). Even fast user switching (multiple Xorg sessions) breaks the TTYs. I am using a text console (set gfxpayload=text).