GTX 970 hangs after hibernate/resume on KDE Neon (Ubuntu 20.04 based)

Hi,

My Linux desktop has a Ryzen 1700X, a GTX 970 (connected to a Dell P2715Q via DisplayPort), and a GTX 1080 (not connected to anything). I installed KDE Neon 5.23 a few days ago, and everything seems to work except hibernation. When I run “sudo pm-hibernate”, the machine powers down successfully, and after I power it back on it goes through what looks like a successful Linux resume sequence, but then the screen goes blank and it just hangs. I can’t even ssh in from another machine.

(I’m reasonably sure it’s related to the graphics cards, because if I run “pm-hibernate” from a single-user boot (text mode), it resumes successfully.)

Right now I have KDE Neon 5.23, kernel “5.11.0-41-generic” and driver version 495.29.05, but I also tried other driver versions (460, 470, 495.44) and they all show the same issue. I also tried different kernel versions and slightly different distros (Kubuntu 20.04 and Linux Mint), still with no luck.

The funny thing is, I was using vanilla Ubuntu 20.04 (installed last year) until recently and hibernation was working with it! Not sure what changed…

Attaching nvidia-bug-report.log.gz, though I’m not sure if it contains the relevant info. (Couldn’t run it after the problem because the machine hangs completely.)

nvidia-bug-report.log.gz (531.4 KB)

Please try disabling nvidia-suspend, nvidia-resume and nvidia-hibernate in systemd to check if this relates to vidmem restoration added in recent drivers.
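For reference, a rough sketch of how one might check for those units, assuming the driver package installed them under the names nvidia-suspend.service, nvidia-hibernate.service and nvidia-resume.service. Plain "systemctl --type=service" only lists units that are currently loaded, so installed but inactive units will not show up there, while "systemctl list-unit-files" does list them. The helper name nvidia_sleep_units is made up for this example.

```shell
#!/bin/sh
# Sketch only: filters `systemctl list-unit-files` output (read from stdin)
# down to the NVIDIA sleep-related units, if any are installed.
# nvidia_sleep_units is a hypothetical helper name, not part of any package.
nvidia_sleep_units() {
    awk '$1 ~ /^nvidia-(suspend|hibernate|resume)\.service$/ { print $1 }'
}

# Typical use on the affected machine:
#   systemctl list-unit-files 'nvidia-*' | nvidia_sleep_units
#
# If the units are listed, they can be disabled for a test with:
#   sudo systemctl disable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
#
# The related module parameter, if the loaded driver exposes it, can be read with:
#   grep PreserveVideoMemoryAllocations /proc/driver/nvidia/params
```

If the filter prints nothing, the units are not installed at all, which is a different situation from being installed and disabled.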

Hi generix, thanks for the suggestion. Unfortunately, I think they’re already disabled - couldn’t find these names anywhere. I could only find nvidia-persistenced, as follows:

$ systemctl --type=service | grep nvidia
  nvidia-persistenced.service   loaded active running NVIDIA Persistence Daemon

$ systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-12-18 20:54:48 PST; 8min ago
   Main PID: 1062 (nvidia-persiste)
      Tasks: 1 (limit: 38339)
     Memory: 744.0K
     CGroup: /system.slice/nvidia-persistenced.service
             └─1062 /usr/bin/nvidia-persistenced --verbose

Dec 18 20:54:46 myhost nvidia-persistenced[1062]: Verbose syslog connection opened
Dec 18 20:54:46 myhost nvidia-persistenced[1062]: Started (1062)
Dec 18 20:54:46 myhost nvidia-persistenced[1062]: device 0000:27:00.0 - registered
Dec 18 20:54:47 myhost nvidia-persistenced[1062]: device 0000:27:00.0 - persistence mode enabled.
Dec 18 20:54:47 myhost nvidia-persistenced[1062]: device 0000:27:00.0 - NUMA memory onlined.
Dec 18 20:54:47 myhost nvidia-persistenced[1062]: device 0000:28:00.0 - registered
Dec 18 20:54:48 myhost nvidia-persistenced[1062]: device 0000:28:00.0 - persistence mode enabled.
Dec 18 20:54:48 myhost nvidia-persistenced[1062]: device 0000:28:00.0 - NUMA memory onlined.
Dec 18 20:54:48 myhost nvidia-persistenced[1062]: Local RPC services initialized
Dec 18 20:54:48 myhost systemd[1]: Started NVIDIA Persistence Daemon.

I tried disabling this with “systemctl mask nvidia-persistenced.service”, but it still didn’t help. :/

Can you ssh into the system while it’s hanging after resume?

Sorry for the late response - I managed to run nvidia-bug-report.sh by manually editing the pm-hibernate script to run it after resume.

It seems like the system enters a very unstable state after resume - the screen goes blank, and it may successfully run a few commands (that I inserted into pm-hibernate), but the system hangs shortly afterwards. I tried half a dozen times but managed to generate the bug report file only once.
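An alternative to editing pm-hibernate itself would be a pm-utils sleep hook, which pm-utils calls with "hibernate" before hibernating and "thaw" after resuming from hibernation. The sketch below is only an illustration of that idea: the hook file name (for example /etc/pm/sleep.d/99_nvidia_debug), the log directory and the function names are assumptions, and the file would need to be executable. It saves the kernel log first and syncs right away, since the machine may hang before the full bug report finishes.

```shell
#!/bin/sh
# Hypothetical pm-utils hook, e.g. /etc/pm/sleep.d/99_nvidia_debug (name is an assumption).
# LOG_DIR is an assumed location; any directory on a persistent disk would do.
LOG_DIR="${LOG_DIR:-/var/tmp/nvidia-resume-debug}"

# Runs in a subshell so the `cd` does not affect the caller.
collect_logs() (
    mkdir -p "$LOG_DIR" || exit 1
    cd "$LOG_DIR" || exit 1
    # Cheapest data first: kernel messages, flushed to disk immediately.
    dmesg > dmesg-after-thaw.log 2>&1 || true
    sync
    # The full report is slow, so it goes last; it writes into the current directory.
    if command -v nvidia-bug-report.sh >/dev/null 2>&1; then
        nvidia-bug-report.sh >/dev/null 2>&1 || true
    fi
    sync
)

# pm-utils passes the event name as the first argument.
handle_event() {
    case "$1" in
        thaw) collect_logs ;;   # resumed from hibernation
        *)    : ;;              # hibernate, suspend, resume: nothing to do
    esac
}

handle_event "${1:-}"
```

Even when the system hangs partway through, dmesg-after-thaw.log should survive in the log directory and can be read after the next normal boot.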

When the file was generated I also ran nvidia-smi and got this output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:27:00.0 Off |                  N/A |
|ERR!   46C    P0   ERR! / 170W |    588MiB /  4039MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:28:00.0 Off |                  N/A |
|  0%   37C    P8     9W / 215W |      6MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1125      G   /usr/lib/xorg/Xorg                211MiB |
|    0   N/A  N/A      1613      G   /usr/bin/kwin_x11                 217MiB |
|    0   N/A  N/A      1696      G   /usr/bin/plasmashell               56MiB |
|    0   N/A  N/A      1725      G   /usr/bin/nvidia-settings            0MiB |
|    0   N/A  N/A      2102      G   ...AAAAAAAAA= --shared-files       93MiB |
|    1   N/A  N/A      1125      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
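In the output above, the ERR! fields (fan and power draw) appear only on GPU 0 at 00000000:27:00.0, the 4 GiB card that Xorg and kwin are running on. A small sketch for picking such rows out of a saved nvidia-smi table, in case the output has to be captured blindly from a script; the function name gpus_with_errors is made up for this example, and it assumes the table layout shown above.

```shell
#!/bin/sh
# Sketch only: reads nvidia-smi's default table from stdin and reports
# which GPU index has ERR! in its status row. Assumes the layout shown above.
gpus_with_errors() {
    awk '
        # GPU name rows carry the index and a PCI bus id such as 00000000:27:00.0
        $1 == "|" && $2 ~ /^[0-9]+$/ && /[0-9A-Fa-f]+:[0-9A-Fa-f]+:[0-9A-Fa-f]+\.[0-9A-Fa-f]/ { gpu = $2; next }
        # The status row below it holds the fan, temperature and power readings
        /ERR!/ { print "GPU " gpu " reports ERR!" }
    '
}

# Typical use:
#   nvidia-smi > smi-after-thaw.txt 2>&1; sync
#   gpus_with_errors < smi-after-thaw.txt
```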

nvidia-bug-report.log.192808.gz (727.8 KB)