Issue with suspend/resume either failing or having graphical issues on wayland and Xorg on Fedora 35-Kernel 5.16.15-201 and NVidia 510.47.03

Hi all, I have been having significant issues with suspending/resuming on Linux. I have this problem on both Arch and Fedora 35, with the latest kernel & NVidia drivers from RPM Fusion.

Here are the details. On Windows 10 my computer (Ryzen 5600x, ASUS ROG Strix B550-A MB, and NVidia GTX 1660 ti) works fine. I have a dual monitor setup: an LG 2k 144hz panel connected through the DP and an LG 4k 60hz panel connected through the HDMI. It suspends / resumes just fine.

Linux is where all the issues are. For starters, the 4k @ 60hz panel on the HDMI only sometimes wakes from sleep. The only way to get it to reliably both come up initially and wake from sleep is to turn it down to 30hz and then bump it back up to 60hz. But that isn’t the significant issue. The big issue is the graphical glitches when waking. I have problems with both Wayland and Xorg. I’ll start with my Wayland woes.

Wayland:
When using the Nouveau drivers it wakes/suspends okay. Once I installed the nvidia proprietary drivers, suspend simply fails. The monitors will turn off and go dark, but the CPU fan keeps spinning and then the monitors pop back on. I can provide the entire journalctl -b output but the relevant portion of it seems to be this:

Mar 20 10:47:45 fedora kernel: Freezing user space processes ... 
Mar 20 10:47:45 fedora kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
Mar 20 10:47:45 fedora kernel: task:gnome-shell     state:D stack:    0 pid: 1917 ppid:  1632 flags:0x00000004
Mar 20 10:47:45 fedora kernel: Call Trace:
Mar 20 10:47:45 fedora kernel:  <TASK>
Mar 20 10:47:45 fedora kernel:  __schedule+0x2d7/0xfa0
Mar 20 10:47:45 fedora kernel:  ? __cgroup_account_cputime+0x4c/0x70
Mar 20 10:47:45 fedora kernel:  schedule+0x4e/0xc0
Mar 20 10:47:45 fedora kernel:  rwsem_down_read_slowpath+0x310/0x350
Mar 20 10:47:45 fedora kernel:  nvkms_ioctl_from_kapi+0x27/0x90 [nvidia_modeset]
Mar 20 10:47:45 fedora kernel:  _nv000092kms+0x42/0x50 [nvidia_modeset]
Mar 20 10:47:45 fedora kernel:  ? nv_drm_framebuffer_destroy+0x3b/0x50 [nvidia_drm]
Mar 20 10:47:45 fedora kernel:  ? drm_mode_rmfb+0x188/0x1c0 [drm]
Mar 20 10:47:45 fedora kernel:  ? schedule+0x58/0xc0
Mar 20 10:47:45 fedora kernel:  ? futex_wait_queue+0x82/0xd0
Mar 20 10:47:45 fedora kernel:  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
Mar 20 10:47:45 fedora kernel:  ? drm_ioctl_kernel+0x8c/0x120 [drm]
Mar 20 10:47:45 fedora kernel:  ? drm_ioctl+0x220/0x3e0 [drm]
Mar 20 10:47:45 fedora kernel:  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
Mar 20 10:47:45 fedora kernel:  ? security_file_ioctl+0x32/0x50
Mar 20 10:47:45 fedora kernel:  ? __x64_sys_ioctl+0x82/0xb0
Mar 20 10:47:45 fedora kernel:  ? do_syscall_64+0x3b/0x90
Mar 20 10:47:45 fedora kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Mar 20 10:47:45 fedora kernel:  </TASK>
Mar 20 10:47:45 fedora kernel: 
Mar 20 10:47:45 fedora kernel: OOM killer enabled.
Mar 20 10:47:45 fedora kernel: Restarting tasks ... done.
Mar 20 10:47:45 fedora kernel: PM: suspend exit
Mar 20 10:47:45 fedora rtkit-daemon[916]: The canary thread is apparently starving. Taking action.
Mar 20 10:47:45 fedora systemd-sleep[3602]: Failed to put system to sleep. System resumed again: Device or resource busy
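For anyone comparing logs, here is a rough way to pull just the freeze failure and the offending task out of a long journal. It is only a sketch: the function name freeze_failures is made up for illustration, and it assumes nothing beyond the kernel message format shown in the log above.

```shell
#!/usr/bin/env bash
# Print the "Freezing of tasks failed" line plus the "task:" lines that
# follow it, reading kernel log text from stdin. Call-trace lines are skipped.
# freeze_failures is a hypothetical helper name, not part of any package.
freeze_failures() {
    awk '
        /Freezing of tasks failed/    { print; in_report = 1; next }
        in_report && /task:.*state:/  { print; next }
        /Restarting tasks/            { in_report = 0 }
    '
}

# Usage (kernel messages from the current boot):
#   journalctl -k -b | freeze_failures
```

On the log above this should leave only the "Freezing of tasks failed" line and the gnome-shell task line, which is usually enough to see which process is blocking suspend.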

Switching to an Xorg login, suspend works! But on resume, there are serious graphical glitches where the icons have no images. I can attach a picture if need be. I have to log out mostly blind and log back in (it doesn’t need a full restart) and the graphics come back normally.

I found that there are supposed to be two settings for the nvidia kernel module:
NVreg_PreserveVideoMemoryAllocations=1
NVreg_TemporaryFilePath=/var/tmp

The instructions say to set them in /etc/modprobe.d/nvidia.conf, though that file doesn’t seem to exist by default. I ran modprobe -c | grep NV and both were already set!
It appears that a file already installed at /lib/modprobe.d/nvidia-power-management.conf sets them.
I then ran sudo systemctl enable nvidia-{suspend,resume,hibernate}. That only made one new symlink, for resume, but suspend continued to fail.
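The check described above can be made a little more explicit than a plain grep. The following is a minimal sketch, assuming the options show up in modprobe -c output as an "options nvidia ..." line (which is how they appeared here); check_nvreg_options is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether the two suspend-related options appear in modprobe's merged
# configuration. Expects text in the format printed by `modprobe -c` on stdin.
# check_nvreg_options is a made-up name used only for this sketch.
check_nvreg_options() {
    local config opt status=0
    config=$(cat)
    for opt in NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=; do
        if printf '%s\n' "$config" | grep -Eq "^options nvidia .*${opt}"; then
            echo "set:     ${opt}"
        else
            echo "missing: ${opt}"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   modprobe -c | check_nvreg_options
# The state of the three services can be checked in one go with:
#   systemctl is-enabled nvidia-suspend.service nvidia-resume.service nvidia-hibernate.service
```

This only shows what modprobe would apply at load time. If the module was loaded before the config changed, the running values may differ until the next boot.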

Looking at the running services, it was odd that the nvidia-powerd service seemed to register as failed. Here is the output of systemctl status nvidia-powerd and then journalctl -b | grep nvidia-powerd

 × nvidia-powerd.service - nvidia-powerd service
      Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; vendor preset: enabled)
      Active: failed (Result: exit-code) since Sun 2022-03-20 05:06:35 PDT; 7h ago
     Process: 900 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
    Main PID: 900 (code=exited, status=1/FAILURE)
         CPU: 2ms
 
 Mar 20 05:06:35 fedora systemd[1]: Starting nvidia-powerd service...
 Mar 20 05:06:35 fedora /usr/bin/nvidia-powerd[900]: nvidia-powerd version:1.0(build 1)
 Mar 20 05:06:35 fedora /usr/bin/nvidia-powerd[900]: No matching GPU found
 Mar 20 05:06:35 fedora /usr/bin/nvidia-powerd[900]: Failed to initialize RM Client
 Mar 20 05:06:35 fedora systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
 Mar 20 05:06:35 fedora systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
 Mar 20 05:06:35 fedora systemd[1]: Failed to start nvidia-powerd service.

I came across a few Reddit threads from others who have hit this issue. I’m happy to link those if you’d like. They stated that the issue arose in the power package. I tried disabling nvidia-powerd.service as well as setting NVreg_PreserveVideoMemoryAllocations=0, but neither fixed the problem. I finally tried uninstalling the package xorg-x11-drv-nvidia-power.

Now Xorg works! It resumes from sleep without glitches! However, Wayland, though it will now sleep, is the one that has all the terrible glitches when it wakes up. I guess you can’t win. I’m not sure what the bug is. I’m happy to post more. I did generate an nvidia bug report, though this is with the package removed. Let me know if you’d like me to generate a report after re-installing the above package, if that would help. It’s a very annoying issue, so I’m happy to help out so someone else doesn’t have to deal with it too.

I have the same issue on Arch, though Fedora provided better log files and I didn’t want to have to go through the Arch install process again.

Thanks for your help/advice, would be great to use wayland if possible!
nvidia-bug-report.log.gz (104.4 KB)

On Arch/GNOME, 510.60.02-16.
Resuming from suspend to disk fails with both x11 and wayland; only a black screen is visible.
Resuming from suspend to ram gives me colorful artifacts all over the screen on wayland, x11 works.
Resuming from suspend to both (hybrid sleep) → black screen with power supply on, works if power supply was off (x11, not tested with wayland)

Based on this other post, the graphical issue on resume has been identified and will be fixed in a later release: Corrupted graphics upon resume (Gnome 41, X.org, 495.44 driver) - #12 by padamstx

Please test with latest beta driver release and share test results.
https://us.download.nvidia.com/XFree86/Linux-x86_64/515.43.04/NVIDIA-Linux-x86_64-515.43.04.run

It might be worth looking at the thread linked below, which concentrates on the cause of this reported error…

> Mar 20 10:47:45 fedora kernel: Freezing user space processes ... 
> Mar 20 10:47:45 fedora kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
> Mar 20 10:47:45 fedora kernel: task:gnome-shell     state:D stack:    0 pid: 1917 ppid:  1632 flags:0x00000004
> Mar 20 10:47:45 fedora kernel: Call Trace:

Explicitly stopping gnome-shell before suspend and restarting on restore does seem to help.

See Trouble suspending with 510.39.01, Linux 5.16.0: Freezing of tasks failed after 20.009 seconds - #10 by anvandare571
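As a rough illustration of the workaround, here is a small script that pauses gnome-shell before suspend and lets it continue afterwards. This is a sketch under assumptions, not the exact method from the linked thread: "stopping" is read here as sending SIGSTOP and SIGCONT, and the function names and the suspend/resume argument names are made up.

```shell
#!/usr/bin/env bash
# Sketch: pause gnome-shell before suspend and let it continue after resume.
# ASSUMPTION: "stopping" is read as SIGSTOP/SIGCONT; the linked thread may do
# it differently. signal_for_phase, signal_gnome_shell and the phase names
# are hypothetical.

# Map a phase name to the signal that should be sent.
signal_for_phase() {
    case "$1" in
        suspend) echo STOP ;;
        resume)  echo CONT ;;
        *)       return 1 ;;
    esac
}

# Send that signal to every process whose name is exactly "gnome-shell".
signal_gnome_shell() {
    local sig
    sig=$(signal_for_phase "$1") || { echo "usage: $0 suspend|resume" >&2; return 2; }
    pkill "-${sig}" -x gnome-shell
}

# Only act when a phase argument was given on the command line.
if [ "$#" -gt 0 ]; then
    signal_gnome_shell "$1"
fi
```

To have any effect it would probably need to run before nvidia-suspend.service on the way down and after nvidia-resume.service on the way up, for example from a pair of small systemd units. That wiring is not shown and has not been tested here.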

Neil