System Fails to Wake from Suspend on Nvidia Driver 560.35.03 with Kernel 6.11.5 (Mainline & Zen) - Works on LTS Kernel

System:
OS: Arch Linux
Tested Kernel(s): mainline (linux 6.11.5.arch1-1) & zen (linux-zen 6.11.5.zen1-1)
Driver: nvidia-dkms 560.35.03-18 (installed via pacman)
GPU: NVIDIA GeForce RTX 3080 Ti
CPU: AMD Ryzen 9 7900X (24) @ 5.73 GHz

Background
After my kernel was updated to the version listed above, my system stopped waking up from systemctl suspend. I get a black screen and nothing responds, so I have to force a power-off every time.

These errors occurred on both the zen and mainline kernels listed above. I have since switched to the long-term support (LTS) kernel (6.6.58-1-lts), and the errors are gone: the system suspends and resumes cleanly. I would still love to go back to the zen kernel.

Checked for Discrepancies in nvidia-suspend and nvidia-resume Services:
Digging deeper, I found a discrepancy in the logs: there are multiple entries from nvidia-suspend.service but none from nvidia-resume.service, which suggests that nvidia-resume is not running properly.

-- Boot 02e5a821c167432c954cde8d80bf086a --
Oct 25 12:02:37 nebula systemd[1]: Starting NVIDIA system suspend actions...
Oct 25 12:02:38 nebula suspend[17078]: nvidia-suspend.service
Oct 25 12:02:38 nebula logger[17078]: <13>Oct 25 12:02:38 suspend: nvidia-suspend.service
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Deactivated successfully.
Oct 25 12:02:39 nebula systemd[1]: Finished NVIDIA system suspend actions.
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Consumed 923ms CPU time, 461.3M memory peak.
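
To compare how often each unit shows up in a boot's journal, a small filter along these lines may help. This is only a sketch: the function name count_nvidia_sleep_units is made up for illustration, and it does nothing more than count the lines on stdin that mention each unit name.

```shell
# Hypothetical helper: count journal lines that mention each NVIDIA sleep unit.
# Reads journal text on stdin, so it also works on a saved log file.
count_nvidia_sleep_units() {
    awk '
        /nvidia-suspend\.service/ { s++ }
        /nvidia-resume\.service/  { r++ }
        END { printf "suspend=%d resume=%d\n", s, r }
    '
}

# Typical use against the previous boot (needs systemd's journalctl):
#   journalctl -b -1 --no-pager | count_nvidia_sleep_units
```

A result with a non-zero suspend count and a resume count of zero would match the discrepancy described above.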

Kernel Logs:

[rogue@nebula ~]$ sudo journalctl -k -b -2 | grep -i nvidia
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input9
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input10
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input11
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input12
Oct 25 12:03:47 nebula kernel: nvidia: loading out-of-tree module taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module license 'NVIDIA' taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Oct 25 12:03:47 nebula kernel: nvidia: module license taints kernel.
Oct 25 12:03:48 nebula kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Oct 25 12:03:48 nebula kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Oct 25 12:03:48 nebula kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  560.35.03  Fri Aug 16 21:21:48 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Oct 25 12:03:48 nebula kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Oct 25 12:03:49 nebula kernel: nvidia-uvm: Loaded the UVM driver, major device number 234.
Oct 25 12:03:49 nebula kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
Oct 25 12:03:49 nebula kernel: nvidia 0000:01:00.0: vgaarb: deactivate vga console
Oct 25 12:03:49 nebula kernel: fbcon: nvidia-drmdrmfb (fb0) is primary device
Oct 25 12:03:50 nebula kernel: nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P           OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P        W  OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P        W  OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
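
Because the same call trace repeats several times, it can help to collapse the driver frames into a count per frame. The filter below is a rough sketch under a few assumptions: the function name summarize_nvidia_frames is hypothetical, and the pattern only matches frames shaped like nv_name+0xOFFSET/0xSIZE, as in the log above.

```shell
# Hypothetical helper: list each nv_* stack frame once, with how often it appears.
# Reads kernel log text on stdin.
summarize_nvidia_frames() {
    grep -oE 'nv_[A-Za-z0-9_]+\+0x[0-9a-f]+/0x[0-9a-f]+' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Typical use against the boot shown above (needs systemd's journalctl):
#   sudo journalctl -k -b -2 --no-pager | summarize_nvidia_frames
```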

Hello, I’ve got a similar error. My system wakes up from suspend, but after working for some time the GPU goes back to sleep. The same issue occurs on 565.57.01. It makes no difference whether I use the proprietary or the GPL kernel modules; I tried both.
nvidia-bug-report.log.gz (1.3 MB)

OS: Fedora 40
Kernel: 6.11.4-201.fc40.x86_64
GUI: Gnome Shell 46
Driver installation method: sh script from nvidia page
GPU: RTX 4070 Ti SUPER

The issue doesn’t exist on the stable 550.120 and 550.127 drivers.


I’m experiencing exactly the same error! Disabling Bluetooth helped! Suspend/Hibernate works again!
Very strange…
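
The post does not say how Bluetooth was disabled. One possible way, offered purely as an assumption, is to soft-block the radio with rfkill before suspending. The wrapper below is a hypothetical sketch: bt_radio and the DRY_RUN variable are names invented here, and only the underlying rfkill block/unblock bluetooth commands are real.

```shell
# Hypothetical helper: block or unblock Bluetooth radios via rfkill.
# With DRY_RUN=1 it only prints the command instead of running it.
bt_radio() {
    action=$1   # expected: "block" or "unblock"
    case "$action" in
        block|unblock) ;;
        *) echo "usage: bt_radio block|unblock" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rfkill $action bluetooth"
    else
        rfkill "$action" bluetooth
    fi
}

# Possible manual test sequence (an assumption, not a confirmed fix):
#   bt_radio block && systemctl suspend
#   bt_radio unblock    # after resume
```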


This issue is really annoying (I’ve had it since 560.xx, and now on 565.57.01 as well).

I tried disabling Bluetooth, but that didn’t work for me (and in any case, it shouldn’t crash when Bluetooth is enabled).

By the way, I’m on the LTS kernel (6.6.58).