Same thing here with Pop!_OS 20.04 (which is based on Ubuntu 20.04) and a GTX 970. It’s clear from the near-identical stack traces that lots of users face the same problem, and the extent of the issue is being disguised by the number of different threads opened about it.
For the people at NVIDIA who can’t repro the issue, you simply didn’t wait long enough in the suspend state during testing. Many times, I was able to convince myself that my latest attempt at a fix was working, only to be disappointed later on.
Also, please note that the recommended power management config is being applied by default in many cases. For example, the systemd units are enabled by default, and NVreg_PreserveVideoMemoryAllocations=1 is set in /usr/lib/modprobe.d/nvidia-graphics-drivers.conf, so reiterating this stuff in other config files is a waste of time.
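For anyone who wants to verify what their distribution already ships before adding more config, the defaults can be inspected directly. This is a sketch based on the Ubuntu/Pop!_OS packaging paths mentioned above; file locations may differ on other distros:

```shell
# Are the driver's systemd sleep units present and enabled?
systemctl status nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service

# Which modprobe.d file (if any) already sets the option?
grep -r NVreg_PreserveVideoMemoryAllocations /usr/lib/modprobe.d/ /etc/modprobe.d/

# Confirm the option actually took effect on the loaded module
grep -i Preserve /proc/driver/nvidia/params
```

If the option already appears under /usr/lib/modprobe.d/ and the loaded-module check agrees, repeating it in /etc/modprobe.d/ changes nothing.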
Same issue here on Arch Linux after having to downgrade from 495 to 470 because my GPU is now considered legacy.
Fortunately the workaround posted by humblebee in that other thread seems to fix the problem for me.
TLDR: disable the nvidia-suspend and nvidia-resume systemd services.
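On a systemd-based distro, that workaround amounts to the following (a sketch; note this disables NVIDIA’s own suspend hooks, so the kernel’s generic suspend path is used instead, and video memory contents may not be preserved):

```shell
# Disable the NVIDIA suspend/resume hook services
sudo systemctl disable nvidia-suspend.service nvidia-resume.service

# Some driver packages also ship a hibernate unit; disable it too if present
sudo systemctl disable nvidia-hibernate.service
```

Re-enabling them later with `systemctl enable` restores the original behaviour.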
I have not been able to reproduce the issue so far on the configuration below. I kept the system in suspend mode overnight, and the display came up successfully after the resume operation.
ASUSTeK COMPUTER INC P9X79 + Intel(R) Core™ i7-3820 CPU @ 3.60GHz + Ubuntu 20.04.1 LTS + 5.11.0-27-generic + Driver 470.57.02 + NVIDIA GeForce GTX 980 + LG Electronics LG ULTRAGEAR + BenQ EL2870U
I will spend a few more cycles trying to reproduce it.
Can the reason for your successful testing be related to nvidia-persistenced? I’ve just discovered that my fresh install of 20.04 with driver 470 contains a systemd unit file for nvidia-persistenced with no installation information (meaning that it cannot be enabled) and a command line that explicitly disables persistence mode via --no-persistence-mode. I wonder how many people are under the impression that they successfully enabled persistence mode (following advice online) when in fact it was disabled at the next reboot.
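To check whether persistence mode is actually active after a reboot (rather than what you configured before the reboot), something like this works; paths assume the Ubuntu driver packaging described above:

```shell
# Is the daemon running, and is it enabled at boot?
systemctl status nvidia-persistenced.service

# Does the shipped unit explicitly pass --no-persistence-mode?
grep ExecStart /lib/systemd/system/nvidia-persistenced.service

# What does the driver itself report? (Enabled/Disabled per GPU)
nvidia-smi --query-gpu=persistence_mode --format=csv
```

The last command is the ground truth: it reports what the driver currently believes, independent of any unit files.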
Unfortunately, even with nvidia-persistenced correctly configured (i.e. running, started on boot, and persistence mode enabled), I still cannot resume from suspend. Each time there is the familiar stack trace mentioning nv_procfs_write_suspend.
Dec 1 07:35:39 imhotep kernel: [478040.532558] Call Trace:
Dec 1 07:35:39 imhotep kernel: [478040.532561] nv_set_system_power_state+0x224/0x3c0 [nvidia]
Dec 1 07:35:39 imhotep kernel: [478040.532700] nv_procfs_write_suspend+0xe7/0x140 [nvidia]
Dec 1 07:35:39 imhotep kernel: [478040.532851] proc_reg_write+0x66/0x90
Dec 1 07:35:39 imhotep kernel: [478040.532854] vfs_write+0xb9/0x250
Dec 1 07:35:39 imhotep kernel: [478040.532857] ksys_write+0x67/0xe0
Dec 1 07:35:39 imhotep kernel: [478040.532859] __x64_sys_write+0x1a/0x20
Dec 1 07:35:39 imhotep kernel: [478040.532861] do_syscall_64+0x61/0xb0
Dec 1 07:35:39 imhotep kernel: [478040.532865] ? exit_to_user_mode_prepare+0x3d/0x1c0
Dec 1 07:35:39 imhotep kernel: [478040.532869] ? syscall_exit_to_user_mode+0x27/0x50
Dec 1 07:35:39 imhotep kernel: [478040.532870] ? __x64_sys_newfstat+0x16/0x20
Dec 1 07:35:39 imhotep kernel: [478040.532872] ? do_syscall_64+0x6e/0xb0
Dec 1 07:35:39 imhotep kernel: [478040.532874] ? exc_page_fault+0x8f/0x170
Dec 1 07:35:39 imhotep kernel: [478040.532876] ? asm_exc_page_fault+0x8/0x30
Dec 1 07:35:39 imhotep kernel: [478040.532878] entry_SYSCALL_64_after_hwframe+0x44/0xae
Is it possible that the issue affects GTX 970 but not GTX 980 for some reason?
Appears to be fixed after apt-get install --purge nvidia-driver-495, which has installed version 495.44. An earlier version of 495 didn’t seem to fix it.
@womagrid
Do you mean to say that driver 495.44 fixed the issue on your setup?
If not, can you please attach an nvidia bug report once again from the repro state.
Yes, that’s what I meant. It seems a little difficult to believe because the previous 495 didn’t fix it and also I didn’t see anything relevant in the changelog, but still it appears to be true.
One thing I noticed is that the systemd services (nvidia-suspend.service, etc.) are now disabled, but they are still included in the distribution. It occurs to me that this might confuse users into believing that they are still required and should be enabled.
Hi All,
Since womagrid is no longer facing the issue with driver 495.44, can others please verify the same and confirm their test results.
If it helps, I can confirm this issue on ASUSTeK COMPUTER INC Sabertooth X79 + Intel(R) Core™ i7-3930K CPU @ 3.20GHz + KDE Neon 5.23.4 (based on the latest Ubuntu LTS) + 5.11.0-41-generic kernel + Driver 470.86 + NVIDIA GeForce GTX 980 + ASUS ProArt PA329 + DisplayPort
Probably related to systemd_logind_vtenter is not called by xserver 21.1.2 (#1271) · Issues · xorg / xserver · GitLab
@womagrid
Can you also please confirm that you do not see the issue even when the systemd services are enabled?
Also not resuming on Ubuntu 21.10
$ sudo lspci -v | less
NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 36% 48C P0 45W / 148W | 682MiB / 4040MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
$ sudo systemctl status nvidia-suspend
○ nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
Active: inactive (dead)
$ sudo systemctl status nvidia-resume
○ nvidia-resume.service - NVIDIA system resume actions
Loaded: loaded (/lib/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Here is the syslog:
syslog-nvidia-black-screen-suspend.txt (72.4 KB)
I didn’t really want to make the change in case it broke my setup again, but it seems that your 495.46 driver update has re-enabled the systemd units anyway. Thankfully, resume is still working.
Thanks womagrid for the positive feedback.
@user105657
Can you please check with driver 495.46 and confirm whether it fixes the issue on your setup.
If the issue still persists, please share an nvidia bug report from the repro state and also confirm how you are performing suspend/resume on your system.
Resume from suspend in 495.46 is broken again here (possibly since the last kernel update). Also, it would be really good to have some clarity on what the “correct” configuration for the nvidia suspend services is, instead of having it change randomly with each driver update.
I’m also affected by other bugs in this driver release, but they are not relevant to this thread.
Jan 13 22:03:54 imhotep kernel: [41586.714938] WARNING: CPU: 2 PID: 144953 at /var/lib/dkms/nvidia/495.46/build/nvidia/nv.c:4055 nv_restore_user_channels+0xce/0xe0 [nvidia]
Jan 13 22:03:54 imhotep kernel: [41586.715168] Modules linked in: cpuid snd_seq_dummy xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs overlay binfmt_misc intel_rapl_msr intel_rapl_common nvidia_uvm(POE) nvidia_drm(POE) x86_pkg_temp_thermal intel_powerclamp nvidia_modeset(POE) nls_iso8859_1 coretemp snd_soc_rt5640 snd_soc_rl6231 snd_soc_core nvidia(POE) kvm_intel mei_hdcp uvcvideo kvm videobuf2_vmalloc snd_hda_codec_hdmi snd_compress videobuf2_memops rapl videobuf2_v4l2 ac97_bus snd_hdspe(OE) videobuf2_common snd_hda_intel snd_pcm_dmaengine intel_cstate snd_usb_audio snd_intel_dspcfg pl2303 drm_kms_helper snd_intel_sdw_acpi snd_usbmidi_lib videodev snd_seq_midi usbserial joydev input_leds mc snd_hda_codec cec snd_seq_midi_event snd_hda_core rc_core snd_rawmidi snd_seq snd_hwdep fb_sys_fops syscopyarea
Jan 13 22:03:54 imhotep kernel: [41586.715219] usblp snd_pcm sysfillrect snd_seq_device at24 sysimgblt snd_timer mei_me snd mei soundcore mac_hid intel_smartconnect acpi_pad efi_pstore sch_fq_codel msr parport_pc ppdev lp parport drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress hid_generic usbhid hid dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear system76_io(OE) wmi system76_acpi(OE) uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel r8169 ahci xhci_pci crypto_simd i2c_i801 cryptd libahci lpc_ich i2c_smbus realtek xhci_pci_renesas video
Jan 13 22:03:54 imhotep kernel: [41586.715260] CPU: 2 PID: 144953 Comm: nvidia-sleep.sh Tainted: P OE 5.15.8-76051508-generic #202112141040~1639505278~20.04~0ede46a-Ubuntu
Jan 13 22:03:54 imhotep kernel: [41586.715263] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97M Anniversary, BIOS P2.20 03/08/2018
Jan 13 22:03:54 imhotep kernel: [41586.715264] RIP: 0010:nv_restore_user_channels+0xce/0xe0 [nvidia]
Jan 13 22:03:54 imhotep kernel: [41586.715488] Code: 32 92 d6 be 01 00 00 00 4c 89 ef e8 9c b4 00 00 48 89 df e8 f4 32 92 d6 ba 02 00 00 00 4c 89 ee 4c 89 e7 e8 44 95 94 00 eb 93 <0f> 0b eb c6 41 be 51 00 00 00 eb 9e 66 0f 1f 44 00 00 0f 1f 44 00
Jan 13 22:03:54 imhotep kernel: [41586.715490] RSP: 0018:ffffb73bc9eafd20 EFLAGS: 00010206
Jan 13 22:03:54 imhotep kernel: [41586.715492] RAX: 0000000000000003 RBX: ffff8a7b9853a000 RCX: ffffb73bc9eafcb8
Jan 13 22:03:54 imhotep kernel: [41586.715493] RDX: 0000000000000087 RSI: 0000000000000246 RDI: ffff8a7b830116a8
Jan 13 22:03:54 imhotep kernel: [41586.715494] RBP: ffffb73bc9eafd48 R08: 0000000000000000 R09: 0000000000000000
Jan 13 22:03:54 imhotep kernel: [41586.715496] R10: 0000000000000000 R11: ffff8a827ec31080 R12: ffff8a808a833000
Jan 13 22:03:54 imhotep kernel: [41586.715497] R13: ffff8a7b9853a000 R14: 0000000000000003 R15: ffff8a7b9853a528
Jan 13 22:03:54 imhotep kernel: [41586.715499] FS: 00007f76f8084740(0000) GS:ffff8a827ec80000(0000) knlGS:0000000000000000
Jan 13 22:03:54 imhotep kernel: [41586.715500] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 22:03:54 imhotep kernel: [41586.715502] CR2: 00007f41b4dfe010 CR3: 000000033eca4004 CR4: 00000000001706e0
Jan 13 22:03:54 imhotep kernel: [41586.715504] Call Trace:
Jan 13 22:03:54 imhotep kernel: [41586.715505] <TASK>
Jan 13 22:03:54 imhotep kernel: [41586.715508] nv_set_system_power_state+0x224/0x3c0 [nvidia]
Jan 13 22:03:54 imhotep kernel: [41586.715717] nv_procfs_write_suspend+0x101/0x180 [nvidia]
Jan 13 22:03:54 imhotep kernel: [41586.715921] proc_reg_write+0x66/0x90
Jan 13 22:03:54 imhotep kernel: [41586.715925] vfs_write+0xb9/0x260
Jan 13 22:03:54 imhotep kernel: [41586.715928] ksys_write+0x67/0xe0
Jan 13 22:03:54 imhotep kernel: [41586.715930] __x64_sys_write+0x1a/0x20
Jan 13 22:03:54 imhotep kernel: [41586.715933] do_syscall_64+0x5c/0xc0
Jan 13 22:03:54 imhotep kernel: [41586.715937] ? syscall_exit_to_user_mode+0x27/0x50
Jan 13 22:03:54 imhotep kernel: [41586.715940] ? __x64_sys_newfstat+0x16/0x20
Jan 13 22:03:54 imhotep kernel: [41586.715943] ? do_syscall_64+0x69/0xc0
Jan 13 22:03:54 imhotep kernel: [41586.715946] ? exit_to_user_mode_prepare+0x3d/0x1c0
Jan 13 22:03:54 imhotep kernel: [41586.715950] ? filp_close+0x60/0x70
Jan 13 22:03:54 imhotep kernel: [41586.715954] ? syscall_exit_to_user_mode+0x27/0x50
Jan 13 22:03:54 imhotep kernel: [41586.715956] ? __x64_sys_close+0x12/0x40
Jan 13 22:03:54 imhotep kernel: [41586.715958] ? do_syscall_64+0x69/0xc0
Jan 13 22:03:54 imhotep kernel: [41586.715961] ? asm_exc_page_fault+0x8/0x30
Jan 13 22:03:54 imhotep kernel: [41586.715965] entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan 13 22:03:54 imhotep kernel: [41586.715967] RIP: 0033:0x7f76f81981e7
Jan 13 22:03:54 imhotep kernel: [41586.715968] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Jan 13 22:03:54 imhotep kernel: [41586.715970] RSP: 002b:00007ffce38cd588 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Jan 13 22:03:54 imhotep kernel: [41586.715971] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f76f81981e7
Jan 13 22:03:54 imhotep kernel: [41586.715972] RDX: 0000000000000007 RSI: 000055c4f39e6450 RDI: 0000000000000001
Jan 13 22:03:54 imhotep kernel: [41586.715973] RBP: 000055c4f39e6450 R08: 000000000000000a R09: 0000000000000006
Jan 13 22:03:54 imhotep kernel: [41586.715975] R10: 000055c4f303e017 R11: 0000000000000246 R12: 0000000000000007
Jan 13 22:03:54 imhotep kernel: [41586.715977] R13: 00007f76f82736a0 R14: 00007f76f82744a0 R15: 00007f76f82738a0
Jan 13 22:03:54 imhotep kernel: [41586.715979] </TASK>
Jan 13 22:03:54 imhotep kernel: [41586.715980] ---[ end trace 86f7ef18c3bb4884 ]---
@womagrid
Please share an nvidia bug report from the repro state and also confirm how you are performing suspend/resume on your system.
I’m sorry but I don’t really know what you are asking. Also I’ve reverted to driver version 470.86 because it’s too painful and expensive (due to rising energy costs) to have resume from suspend always breaking. Life is too short.
I meant: please capture a bug report after hitting the issue during a suspend/resume operation. The report is generated by executing the script “nvidia-bug-report.sh”.
Also, please confirm how you are performing the suspend/resume operation.
Are you using the command line (systemctl suspend), the GUI, or some other method to suspend the system?
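For reference, capturing the requested report looks roughly like this (the script ships with the driver; to my knowledge the archive is written to the current working directory):

```shell
# 1. Reproduce the failure first (suspend, wait a long while, resume).
# 2. Then immediately collect the logs, while the repro state is fresh:
sudo nvidia-bug-report.sh

# 3. This should produce nvidia-bug-report.log.gz in the current
#    directory; attach that file to the forum post.
ls -lh nvidia-bug-report.log.gz
```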