System Fails to Wake from Suspend on Nvidia Driver 560.35.03 with Kernel 6.11.5 (Mainline & Zen) - Works on LTS Kernel

System:
OS: Arch Linux
Tested Kernel(s): mainline (linux 6.11.5.arch1-1) & zen (linux-zen 6.11.5.zen1-1)
Driver: nvidia-dkms (560.35.03-18) from pacman
GPU: NVIDIA GeForce RTX 3080 Ti
CPU: AMD Ryzen 9 7900X (24) @ 5.73 GHz

Background
My kernel was updated to the versions listed above, and since then my system has not woken up from systemctl suspend. I get a black screen and nothing responds, so I have to force power off the system every time.

These errors occurred on both the zen and mainline kernels listed above. I have now switched to the long-term support (LTS) kernel (6.6.58-1-lts) and the errors are gone: the system suspends and resumes gracefully. I would still love to go back to the zen kernel.

Checked for Discrepancies in nvidia-suspend and nvidia-resume Services:
Digging deeper, I found a discrepancy between the logs of nvidia-suspend.service and nvidia-resume.service: there are multiple entries from nvidia-suspend but none from nvidia-resume, which suggests nvidia-resume is not running properly.

-- Boot 02e5a821c167432c954cde8d80bf086a --
Oct 25 12:02:37 nebula systemd[1]: Starting NVIDIA system suspend actions...
Oct 25 12:02:38 nebula suspend[17078]: nvidia-suspend.service
Oct 25 12:02:38 nebula logger[17078]: <13>Oct 25 12:02:38 suspend: nvidia-suspend.service
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Deactivated successfully.
Oct 25 12:02:39 nebula systemd[1]: Finished NVIDIA system suspend actions.
Oct 25 12:02:39 nebula systemd[1]: nvidia-suspend.service: Consumed 923ms CPU time, 461.3M memory peak.
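
For anyone who wants to repeat the comparison above, here is a minimal sketch that counts the journal entries of both units for a given boot. The function names are made up for illustration, and the entry count is only a rough signal (an empty nvidia-resume log could also mean the journal was never flushed before the forced power-off).

```shell
#!/bin/bash
# Count journal entries for one systemd unit in a given boot.
# $1 = unit name, $2 = boot offset as understood by `journalctl -b`
#      (default: -1, i.e. the previous boot).
count_unit_entries() {
  local unit="$1" boot="${2:--1}"
  # -q hides informational messages, -o cat prints only the message text,
  # grep -c . counts the non-empty lines.
  journalctl -q --no-pager -o cat -b "$boot" -u "$unit" 2>/dev/null | grep -c .
}

# Print the entry count for each NVIDIA sleep-related unit (hypothetical helper).
compare_nvidia_sleep_units() {
  local boot="${1:--1}" unit
  for unit in nvidia-suspend.service nvidia-resume.service; do
    printf '%s: %s entries\n' "$unit" "$(count_unit_entries "$unit" "$boot")"
  done
}
```

Calling `compare_nvidia_sleep_units -2` would look at the boot before the previous one, matching the `-b -2` used for the kernel logs in this post.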

Kernel Logs:

[rogue@nebula ~]$ sudo journalctl -k -b -2 | grep -i nvidia
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input9
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input10
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input11
Oct 25 12:03:47 nebula kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input12
Oct 25 12:03:47 nebula kernel: nvidia: loading out-of-tree module taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module license 'NVIDIA' taints kernel.
Oct 25 12:03:47 nebula kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Oct 25 12:03:47 nebula kernel: nvidia: module license taints kernel.
Oct 25 12:03:48 nebula kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Oct 25 12:03:48 nebula kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Oct 25 12:03:48 nebula kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  560.35.03  Fri Aug 16 21:21:48 UTC 2024
Oct 25 12:03:48 nebula kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Oct 25 12:03:48 nebula kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Oct 25 12:03:49 nebula kernel: nvidia-uvm: Loaded the UVM driver, major device number 234.
Oct 25 12:03:49 nebula kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
Oct 25 12:03:49 nebula kernel: nvidia 0000:01:00.0: vgaarb: deactivate vga console
Oct 25 12:03:49 nebula kernel: fbcon: nvidia-drmdrmfb (fb0) is primary device
Oct 25 12:03:50 nebula kernel: nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P           OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P        W  OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel: Modules linked in: vfat fat uhid rfcomm algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat bridge stp llc nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c cmac ccm snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_device mousedev mc snd_ctl_led overlay bnep nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 nvidia(POE) btusb btrtl mac80211 btintel snd_hda_codec_realtek btbcm libarc4 kvm_amd btmtk snd_hda_codec_generic cfg80211 bluetooth joydev snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi polyval_generic r8169 ghash_clmulni_intel snd_hda_codec sha512_ssse3 realtek sha256_ssse3 mdio_devres snd_hda_core sha1_ssse3 snd_hwdep libphy hid_generic rfkill
Oct 25 12:17:24 nebula kernel: CPU: 1 UID: 0 PID: 23170 Comm: nvidia-sleep.sh Tainted: P        W  OE      6.11.5-zen1-1-zen #1 1400000003000000474e55005d86575806fe1c0a
Oct 25 12:17:24 nebula kernel:  nv_revoke_gpu_mappings_locked+0x47/0x70 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_set_system_power_state+0x269/0x580 [nvidia 1400000003000000474e5500faf16c6315f3e93a]
Oct 25 12:17:24 nebula kernel:  nv_procfs_write_suspend+0x1a5/0x280 [nvidia 1400000003000000474e5500faf16c6315f3e93a]

I’m experiencing exactly the same error! Disabling Bluetooth helped! Suspend/Hibernate works again!
Very strange…

This issue is really annoying (I’ve had it since 560.xx, and I still have it on 565.57.01).

I tried disabling Bluetooth, but that didn’t work for me (and even if it had, the system shouldn’t crash when Bluetooth is enabled).

Btw. I’m on the LTS kernel (6.6.58)

You do not have modeset=1 and PreserveVideoMemoryAllocations=1 set. Are you using Xorg only?

modeset:N
fbdev:N
PreserveVideoMemoryAllocations: 0
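
If you want to try turning those two parameters on, a minimal sketch could look like the one below. The function name and the file name nvidia-sleep.conf are assumptions for illustration; the two option lines use the nvidia_drm modeset parameter and the nvidia NVreg_PreserveVideoMemoryAllocations parameter. Whether this helps with the suspend problem in this thread is not confirmed.

```shell
#!/bin/bash
# Write the two module options discussed above to a modprobe.d-style file.
# $1 = target path (hypothetical default file name; needs root to write there).
write_nvidia_sleep_conf() {
  local target="${1:-/etc/modprobe.d/nvidia-sleep.conf}"
  cat > "$target" <<'EOF'
# Enable kernel mode setting in nvidia-drm.
options nvidia_drm modeset=1
# Ask the driver to preserve video memory allocations across suspend/resume.
options nvidia NVreg_PreserveVideoMemoryAllocations=1
EOF
}
```

The options only take effect when the modules are loaded again, so a reboot is the simplest way to apply them (and the initramfs may need regenerating if it includes the nvidia modules). Afterwards, /sys/module/nvidia_drm/parameters/modeset and /proc/driver/nvidia/params should show the new values.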

I’m also starting to think that something is up with GDM when both Wayland and Xorg sessions are installed. I managed to get everything working on my workstation by going back to basics: not installing the Xorg session and cleaning up my dracut configs. I’m also installing the NVIDIA drivers manually.

I made this short script for myself to speed up debugging:

#!/bin/bash

echo "### Kernel command line"
cat /proc/cmdline
echo

echo "### Nvidia driver version"
cat /proc/driver/nvidia/version
echo

echo "### loaded nvidia modules"
lsmod | grep nvidia
echo

echo "### Modules parameters"
echo Module: nvidia
cat /proc/driver/nvidia/params
echo

declare -a modules=("nvidia_drm" "nvidia_modeset" "nvidia_uvm")
for module in "${modules[@]}"; do
  echo "$module:"
  if [ -d "/sys/module/$module/parameters" ]; then
    ls "/sys/module/$module/parameters/" | while read -r param; do
      echo -n "    $param: "
      cat "/sys/module/$module/parameters/$param"
    done
  fi
  echo
done

echo "### Nvidia services status"
declare -a services=("nvidia-hibernate" "nvidia-resume" "nvidia-suspend" "nvidia-persistenced" "nvidia-powerd")
echo Services:
for service in "${services[@]}"; do
  echo "    $service: $(systemctl is-enabled ${service})"
done
echo

echo "### Installed GNOME session and Xorg packages"
if command -v rpm &> /dev/null; then
  rpm -qa | grep -E 'gnome-session|xorg'
fi

Yes, I use Xorg only. That’s still possible for now. Since 8 December my system has been running kernel 6.11.11 with the NVIDIA beta driver 565.77, and there has been no issue so far!

Reset things are caused by GNOME itself. About the Wayland session: I have access to multiple computers, but only on my own desktop was the performance horrible, so I’m sticking with Xorg for now. The next Fedora release will probably force the Wayland session, so we will see how it goes.

Thank you NVIDIA for fixing this issue!

Thanks for posting this!

I noticed that uninstalling bluez fixes this issue for me, so I created two systemd service units as a workaround.

One of them unloads the wireless/bluetooth device’s kernel module before suspend and the other one loads the wireless device’s kernel modules on wake. I’ll post my systemd service files here.

NOTE: make sure to replace mt7921e with the kernel module your device is using.
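
If you are not sure which module name to use in place of mt7921e, the sketch below is one way to look it up from sysfs. The function name is made up, and it only covers Wi-Fi network interfaces (it assumes a wireless interface exposes a wireless or phy80211 entry and a device/driver/module link under /sys/class/net), so treat the result as a hint rather than a definitive answer.

```shell
#!/bin/bash
# List the kernel module behind each wireless network interface.
# $1 = sysfs net directory (default /sys/class/net; overridable for testing).
list_wireless_modules() {
  local net_dir="${1:-/sys/class/net}" iface_path module_link
  for iface_path in "$net_dir"/*; do
    # Skip interfaces that do not look like Wi-Fi devices.
    [ -e "$iface_path/wireless" ] || [ -e "$iface_path/phy80211" ] || continue
    # The driver's "module" entry is a symlink to /sys/module/<name>.
    module_link="$iface_path/device/driver/module"
    [ -e "$module_link" ] || continue
    printf '%s: %s\n' "$(basename "$iface_path")" \
      "$(basename "$(readlink -f "$module_link")")"
  done
}
```

Running `list_wireless_modules` with no arguments prints one `interface: module` line per wireless interface it finds; `lspci -k` shows similar information under "Kernel driver in use".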

create file: /etc/systemd/system/remove-mt7921e-before-suspend.service

[Unit]
Description=Remove mt7921e module before suspend
Before=sleep.target suspend.target

[Service]
Type=oneshot
ExecStart=/usr/bin/modprobe -r mt7921e

[Install]
WantedBy=sleep.target suspend.target

create file: /etc/systemd/system/add-mt7921e-after-wake.service

[Unit]
Description=Add mt7921e module after waking up
After=suspend.target sleep.target

[Service]
Type=oneshot
ExecStart=/usr/bin/modprobe mt7921e

[Install]
WantedBy=suspend.target sleep.target

Make sure to run sudo systemctl daemon-reload and then enable the services by running:
sudo systemctl enable remove-mt7921e-before-suspend.service
sudo systemctl enable add-mt7921e-after-wake.service

I hope this fixes the issues for you as well. I have reached out to my motherboard manufacturer (ASRock X670E Pro RS) reporting this BUG related to the onboard wireless device (MediaTek 7921e) hoping they’ll push out updates to fix this.

PS: I still have some issues with my RTX 3080ti on Hyprland (wayland) but they are unrelated to this issue now.
