System:
OS: Arch Linux
Tested Kernel(s): mainline (linux 6.11.5.arch1-1) & zen (linux-zen 6.11.5.zen1-1)
Driver: nvidia-dkms 560.35.03-18 (from pacman)
GPU: NVIDIA GeForce RTX 3080 Ti
CPU: AMD Ryzen 9 7900X (24) @ 5.73 GHz
Background
After my kernel was updated to the versions listed above, my system no longer wakes up from systemctl suspend. I get a black screen and nothing responds, so I have to force the system off every time.
The problem occurs on both the zen and the mainline kernel listed above. I have now switched to the long-term support (LTS) kernel (6.6.58-1-lts), and the problem is gone: the system suspends and resumes cleanly. However, I would like to go back to the zen kernel.
Checked for discrepancies in the nvidia-suspend and nvidia-resume services:
Digging deeper, I found a discrepancy between the logs of nvidia-suspend.service and nvidia-resume.service: there are multiple entries from nvidia-suspend but none at all from nvidia-resume, which suggests that nvidia-resume is not working properly.
-- Boot 02e5a821c167432c954cde8d80bf086a --
Oct 25 12:02:37 nebula systemd[1]: Starting NVIDIA system suspend actions...
Oct 25 12:02:38 nebula suspend[17078]: nvidia-suspend.service
Oct 25 12:02:38 nebula logger[17078]: <13>Oct 25 12:02:38 suspend: nvidia-suspend.service
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Deactivated successfully.
Oct 25 12:02:39 nebula systemd[1]: Finished NVIDIA system suspend actions.
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Consumed 923ms CPU time, 461.3M memory peak.
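To repeat this comparison for the boot in which the hang occurred, the journal lines of both units can be counted. Below is a minimal POSIX sh sketch, assuming the journal text is piped in on stdin; count_unit_lines is a hypothetical helper name, and it only counts lines that mention each unit name, not whether the unit succeeded.

```shell
#!/bin/sh
# count_unit_lines (hypothetical helper): reads journal text on stdin and
# prints how many lines mention nvidia-suspend.service and
# nvidia-resume.service respectively.
count_unit_lines() {
  _log=$(cat)
  # grep -c prints 0 and exits with status 1 when nothing matches;
  # "|| true" keeps that from aborting scripts that run with "set -e".
  _suspend=$(printf '%s\n' "$_log" | grep -c 'nvidia-suspend\.service' || true)
  _resume=$(printf '%s\n' "$_log" | grep -c 'nvidia-resume\.service' || true)
  printf 'nvidia-suspend.service: %s\n' "$_suspend"
  printf 'nvidia-resume.service: %s\n' "$_resume"
}
```

For example: journalctl -b -1 -u nvidia-suspend.service -u nvidia-resume.service | count_unit_lines. The -b -1 offset (previous boot) is an assumption; point it at the boot that hung. A count of 0 for nvidia-resume.service corresponds to what the log above shows.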
Kernel Logs:
[rogue@nebula ~]$ sudo journalctl -k -b -2 | grep -i nvidia
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input9
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input10
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input11
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input12
Oct 25 12:03:47 nebula kernel: nvidia: loading out-of-tree module taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module license 'NVIDIA' taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Oct 25 12:03:47 nebula kernel: nvidia: module license taints kernel.
Oct 25 12:03:48 nebula kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Oct 25 12:03:48 nebula kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Oct 25 12:03:48 nebula kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 560.35.03 Fri Aug 16 21:39:15 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 560.35.03 Fri Aug 16 21:21:48 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Oct 25 12:03:48 nebula kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Oct 25 12:03:49 nebula kernel: nvidia-uvm: Loaded the UVM driver, major device number 234.
Oct 25 12:03:49 nebula kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
Oct 25 12:03:49 nebula kernel: nvidia 0000:01:00.0: vgaarb: deactivate vga console
Oct 25 12:03:49 nebula kernel: fbcon: nvidia-drmdrmfb (fb0) is primary device
Oct 25 12:03:50 nebula kernel: nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P OE 6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel: nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P W OE 6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel: nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P W OE 6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel: nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
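The grep -i nvidia filter above keeps only lines that contain "nvidia", so the WARNING: line and most of each call trace are missing from this excerpt. The taint flags changing from "P OE" to "P W OE" do indicate that the kernel raised a warning. Assuming the kernel wrapped each warning in its usual "[ cut here ]" and "end trace" marker lines, the complete, unfiltered blocks can be extracted with a POSIX sh sketch like the one below (both function names are hypothetical).

```shell
#!/bin/sh
# extract_kernel_warnings (hypothetical helper): reads kernel log text on
# stdin and prints every line from a "[ cut here ]" marker up to and
# including the next "---[ end trace" line. If an end marker is missing,
# sed keeps printing until the end of the input.
extract_kernel_warnings() {
  sed -n '/\[ cut here ]/,/---\[ end trace/p'
}

# count_kernel_warnings (hypothetical helper): prints the number of
# "[ cut here ]" markers, i.e. how many separate warnings were logged.
count_kernel_warnings() {
  grep -c '\[ cut here ]' || true
}
```

For example: sudo journalctl -k -b -2 | extract_kernel_warnings prints the full warnings for the boot shown above, and piping the same output into count_kernel_warnings shows how many times the warning fired.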