[FIXED] Suspend / Resume issues with the driver version 470

humblebee · August 20, 2021, 11:01am

It seems that driver version 470 causes a kernel panic on resume from suspend: the display gets no input signal, and the keyboard stops responding a few seconds after resuming (pressing NumLock no longer changes the status lights on the keyboard).

I have temporarily downgraded to version 460 until this gets fixed. Logs for both the 470 and 460 (fully working) versions are attached below.

nvidia-bug-report.log.470.gz (203.8 KB)

nvidia-bug-report.log.460.gz (234.5 KB)

Machine specs:
OS: Kubuntu 21.04
Kernel: Mainline 5.13.12 (also tried with stock 5.11.xx from Ubuntu)
GPU: GeForce GT 710

YAFU · August 20, 2021, 1:59pm

Hi. You are not the only one having this problem. I hope developers can solve the problem soon:

humblebee · August 21, 2021, 9:25am

Good news for affected users! I found a fix!

A LITTLE BACKGROUND
You may already know that the NVIDIA drivers on Linux rely on one of two methods for power management (as described here):

Kernel Driver Callback: Works out of the box with no configuration required, but lacks advanced power management features and preserves only a portion of the video memory.
systemd (/proc/driver/nvidia/suspend): Provides advanced power management features and preserves complete video memory, but requires configuration and setup.
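
To see which of the two paths a machine is set up for, something like the following may help. It is only a sketch: the function name nvidia_preserve_vram is made up for illustration, and it assumes the loaded driver lists a PreserveVideoMemoryAllocations entry in /proc/driver/nvidia/params (check that the file exists on your driver version). As far as I understand NVIDIA's documentation, a value of 1 means complete video memory preservation was requested, which goes together with the systemd / /proc method, while 0 is the default partial preservation.

```shell
#!/bin/sh
# Sketch: report whether full video memory preservation is requested.
# Assumption: the loaded driver lists its module parameters in a file such
# as /proc/driver/nvidia/params, one "Name: value" pair per line.
# Usage: nvidia_preserve_vram [params-file]
nvidia_preserve_vram() {
    params_file="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params_file" ]; then
        echo "unknown"
        return 1
    fi
    # Print only the value of PreserveVideoMemoryAllocations (0 or 1).
    value=$(sed -n 's/^PreserveVideoMemoryAllocations:[[:space:]]*//p' "$params_file")
    echo "${value:-unknown}"
}
```

After sourcing the snippet, run nvidia_preserve_vram with no arguments. To see whether the systemd units from the workaround below are enabled, systemctl is-enabled nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service prints one state per unit.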

THE CAUSE
On further inspection, I found that the 470 driver migrated to the systemd method, while previous versions relied on the Kernel Driver Callback. Apparently the systemd method is broken on some setups and kernels.

THE WORKAROUND
Since the systemd method is currently broken, we have to revert to the Kernel Driver Callback method for now. Here’s how you can do that:

Disable NVIDIA systemd services

sudo systemctl stop nvidia-suspend.service
sudo systemctl stop nvidia-hibernate.service
sudo systemctl stop nvidia-resume.service

sudo systemctl disable nvidia-suspend.service
sudo systemctl disable nvidia-hibernate.service
sudo systemctl disable nvidia-resume.service

Remove NVIDIA systemd script

sudo rm /lib/systemd/system-sleep/nvidia

Reboot and you should be able to suspend and resume properly with driver version 470.xx.

NOTE: Back up your configuration just in case, or downgrade the driver if this does not work on your setup. This was tested on Kubuntu 21.04 with a GeForce GT 710.
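
If you would rather keep the change reversible, one option is to move the sleep hook aside instead of deleting it. The helper below is a minimal sketch and not part of the original fix: stash_file and the backup directory are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: move a file into a backup directory instead of deleting it,
# so the workaround can be undone later.
# Usage: stash_file <path> <backup-dir>
stash_file() {
    src="$1"
    backup_dir="$2"
    if [ ! -e "$src" ]; then
        # Nothing to move; treat as success so the call can be repeated.
        echo "nothing to do: $src not found" >&2
        return 0
    fi
    mkdir -p "$backup_dir" || return 1
    mv "$src" "$backup_dir/$(basename "$src").bak"
}
```

Run as root, stash_file /lib/systemd/system-sleep/nvidia /root/nvidia-pm-backup would take the place of the rm step (the backup path is just an example). To undo everything, move the file back and re-enable the units with sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service.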

KHTeh · September 26, 2021, 11:24am

I face the same issue. My laptop wakes up, but the primary monitor is so dark that it is effectively unusable. The secondary monitor is OK.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 21.04
Release:	21.04
Codename:	hirsute

$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.63.01  Tue Aug  3 20:44:16 UTC 2021
GCC version:

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd000025B8sv000017AAsd000022DEbc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-460 - distro non-free
driver   : nvidia-driver-470 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

dmesg:

[  130.074778] RIP: 0010:nv_drm_master_set+0x27/0x30 [nvidia_drm]
[  130.074783] Code: 90 b5 df 0f 1f 44 00 00 55 48 8b 47 48 48 8b 78 20 48 8b 05 bb 6c 00 00 48 89 e5 48 8b 40 28 e8 ef ef f1 df 84 c0 74 02 5d c3 <0f> 0b 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56
[  130.074785] RSP: 0018:ffff9ee947133b80 EFLAGS: 00010246
[  130.074788] RAX: 0000000000000000 RBX: ffff8c4c5d289200 RCX: 0000000000000008
[  130.074789] RDX: ffffffffc3a3ced8 RSI: 0000000000000292 RDI: ffffffffc3a3cea0
[  130.074791] RBP: ffff9ee947133b80 R08: 0000000000000008 R09: ffff9ee947133b68
[  130.074792] R10: 0000000000000000 R11: ffff8c4b9360991a R12: ffff8c4bdc352900
[  130.074793] R13: ffff8c4b88239800 R14: 0000000000000000 R15: ffff8c4b88239800
[  130.074795] FS:  00007f4e70044c80(0000) GS:ffff8c5c97680000(0000) knlGS:0000000000000000
[  130.074796] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.074798] CR2: 00007f4e7074c3ea CR3: 000000019b78a004 CR4: 0000000000770ee0
[  130.074800] PKRU: 55555554
[  130.074801] Call Trace:
[  130.074802]  drm_new_set_master+0x7e/0x100 [drm]
[  130.074822]  drm_master_open+0x6e/0xa0 [drm]
[  130.074842]  drm_open+0xf8/0x250 [drm]
[  130.074863]  drm_stub_open+0xba/0x140 [drm]
[  130.074887]  chrdev_open+0xf7/0x220
[  130.074891]  ? cdev_device_add+0x90/0x90
[  130.074894]  do_dentry_open+0x156/0x370
[  130.074899]  vfs_open+0x2d/0x30
[  130.074904]  do_open+0x1c3/0x340
[  130.074907]  path_openat+0x10a/0x1d0
[  130.074910]  ? psi_group_change+0x42/0x220
[  130.074913]  do_filp_open+0x8c/0x130
[  130.074917]  ? __check_object_size+0x1c/0x20
[  130.074920]  do_sys_openat2+0x9b/0x150
[  130.074925]  __x64_sys_openat+0x56/0x90
[  130.074929]  do_syscall_64+0x38/0x90
[  130.074931]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  130.074935] RIP: 0033:0x7f4e704de8db
[  130.074937] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
[  130.074939] RSP: 002b:00007ffd8b6e8f10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  130.074942] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4e704de8db
[  130.074944] RDX: 0000000000000002 RSI: 00007ffd8b6e8fe0 RDI: 00000000ffffff9c
[  130.074945] RBP: 00007ffd8b6e8fe0 R08: 0000000000000000 R09: 00007ffd8b6e8e20
[  130.074946] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  130.074948] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  130.074950] ---[ end trace 891fb0926d35dabc ]---

humblebee · October 1, 2021, 4:58pm

I can think of two possible options that may resolve this issue:

Enabling NVIDIA KMS:

echo options nvidia_drm modeset=1 | sudo tee -a /etc/modprobe.d/nvidia-kms.conf

Generating a new xorg.conf with the NVIDIA X Server utility while both displays are plugged in and working. Maybe fiddle with the HardDPMS option in xorg.conf to see if anything changes. (For more info, search for “HardDPMS”.)

Reboots are required after each change.
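
After rebooting, it may be worth confirming that the KMS option actually took effect. The sketch below reads the module parameter from sysfs; kms_state is a hypothetical helper name, and the path assumes the nvidia_drm module is loaded and exposes its modeset parameter in the usual /sys/module location.

```shell
#!/bin/sh
# Sketch: report whether nvidia_drm was loaded with modeset enabled.
# Assumption: the parameter is readable at
# /sys/module/nvidia_drm/parameters/modeset and holds Y or N.
# Usage: kms_state [parameter-file]
kms_state() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        # Module not loaded, or the parameter is not exposed.
        echo "not-loaded"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        *)   echo "disabled" ;;
    esac
}
```

Reading the same file before making the change is also a quick way to note what the option was set to originally.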

KHTeh · October 2, 2021, 5:14am

Stack trace in dmesg when options nvidia_drm modeset=1 is configured:

[  195.862426] Call Trace:
[  195.862427]  drm_new_set_master+0x7e/0x100 [drm]
[  195.862449]  drm_master_open+0x6e/0xa0 [drm]
[  195.862471]  drm_open+0xf8/0x250 [drm]
[  195.862494]  drm_stub_open+0xba/0x140 [drm]
[  195.862520]  chrdev_open+0xf7/0x220
[  195.862524]  ? cdev_device_add+0x90/0x90
[  195.862527]  do_dentry_open+0x156/0x370
[  195.862531]  vfs_open+0x2d/0x30
[  195.862535]  do_open+0x1c3/0x340
[  195.862538]  path_openat+0x10a/0x1d0
[  195.862541]  ? psi_group_change+0x42/0x220
[  195.862544]  do_filp_open+0x8c/0x130
[  195.862549]  ? __check_object_size+0x1c/0x20
[  195.862552]  do_sys_openat2+0x9b/0x150
[  195.862556]  __x64_sys_openat+0x56/0x90
[  195.862560]  do_syscall_64+0x38/0x90
[  195.862563]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  195.862567] RIP: 0033:0x7ff62edd08db
[  195.862569] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
[  195.862571] RSP: 002b:00007ffd86c25810 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  195.862573] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff62edd08db
[  195.862575] RDX: 0000000000000002 RSI: 00007ffd86c258e0 RDI: 00000000ffffff9c
[  195.862576] RBP: 00007ffd86c258e0 R08: 0000000000000000 R09: 00007ffd86c25720
[  195.862578] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  195.862579] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  195.862582] ---[ end trace af495a49ff84593d ]---

What's the default and/or recommended value for this option? I didn't check before making the change.

humblebee · October 3, 2021, 7:36am

I’m out of ideas! Maybe creating a new topic will get you more help.

nadeemm · October 12, 2021, 7:18pm

@humblebee
Please do spin up a new topic, since this one has been tagged as being solved (you solved it :) )

Please repeat any details you feel may be relevant.
I have some more volunteers to help on base Linux issues - so hopefully we can help you faster this time around, but a new topic helps a ton - thanks!

KHTeh · October 15, 2021, 1:54am

@humblebee What does systemctl start nvidia-<foo>.service mean? Does it put the system into the desired state immediately? All the nvidia services on my Ubuntu 21.04 are in “enabled” and “inactive (dead)” status, except nvidia-persistence.service, which is “active (running)”. Running systemctl start nvidia-suspend.service hangs my system.

aplattner · October 15, 2021, 7:02am

Those systemd units are not intended to be started manually, but rather as part of the systemd-suspend.service’s life cycle, which is used to do the low-level work of the systemctl suspend command.

The nvidia-suspend.service unit signals to the NVIDIA driver that it should suspend application access to the GPU, evict the contents of the GPU’s video memory, and get the GPU ready for system suspend. If the system doesn’t actually suspend (e.g. because you started that service manually rather than relying on systemctl suspend to do it for you) then it just wedges anything that tries to access the GPU until you manually start the corresponding nvidia-resume.service.
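
For illustration, the signalling described above can be sketched as a write to the driver's /proc interface. This is a rough sketch under assumptions, not the script the units actually ship with: nvidia_pm_signal is a made-up name, and the suspend, hibernate and resume keywords are taken from my reading of NVIDIA's power management documentation. It is not something to run by hand on a live system, for the reason given above.

```shell
#!/bin/sh
# Illustration only: the kind of write the suspend/resume units perform.
# Assumption: the driver accepts the keywords suspend, hibernate and resume
# on /proc/driver/nvidia/suspend.
# Usage: nvidia_pm_signal <suspend|hibernate|resume> [interface-file]
nvidia_pm_signal() {
    action="$1"
    iface="${2:-/proc/driver/nvidia/suspend}"
    case "$action" in
        suspend|hibernate|resume) ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    # The driver acts on the keyword written to the interface file.
    printf '%s\n' "$action" > "$iface"
}
```

If the GPU is already wedged because nvidia-suspend.service was started manually, the way out described above is sudo systemctl start nvidia-resume.service.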

nadeemm · October 16, 2021, 8:00am

This topic was automatically closed after 3 days. New replies are no longer allowed.
