Trouble suspending with 510.39.01, Linux 5.16.0: Freezing of tasks failed after 20.009 seconds

  • Card: GeForce GTX 970
  • Kernel: 5.16.0
  • System manager: systemd 250.2
  • Greeter: GDM 41
  • Desktop Environment: Gnome 41 on Wayland

My computer sometimes fails to suspend, and I see this in dmesg:

Freezing user space processes ...
Freezing of tasks failed after 20.009 seconds (1 tasks refusing to freeze, wq_busy=0):
task:gnome-shell     state:D stack:    0 pid: 2080 ppid:  2019 flags:0x00004004
Call Trace:
 <TASK>
 __schedule+0x265/0x700
 schedule+0x49/0xd0
 rwsem_down_read_slowpath+0x315/0x360
 ? __kmalloc+0x1a4/0x2d0
 nvkms_ioctl_from_kapi+0x22/0x90 [nvidia_modeset]
 _nv002056kms+0x126c/0x2710 [nvidia_modeset]
 ? nv_drm_internal_framebuffer_create+0x24d/0x8b0 [nvidia_drm]
 ? nv_drm_exit+0x310/0x370 [nvidia_drm]
 ? drm_internal_framebuffer_create+0x3a8/0x4e0
 ? drm_mode_addfb2+0x2c/0xb0
 ? drm_mode_addfb_ioctl+0x10/0x10
 ? drm_ioctl_kernel+0xb1/0x140
 ? rm_ioctl+0x63/0xb0 [nvidia]
 ? drm_ioctl+0x225/0x410
 ? drm_mode_addfb_ioctl+0x10/0x10
 ? __x64_sys_futex+0x6e/0x1d0
 ? __x64_sys_ioctl+0x8d/0xb0
 ? do_syscall_64+0x38/0xc0
 ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>

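The top of the stack shows gnome-shell waiting on a read/write semaphore (rwsem_down_read_slowpath) taken in nvkms_ioctl_from_kapi in nvidia_modeset. The __kmalloc frame in between is marked with “?”, so it may not be part of the real call chain. I have zswap enabled. I’m not a kernel hacker, so that’s all I can tell from this.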
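In these kernel call traces, frames prefixed with “?” are addresses the unwinder found on the stack but could not confirm as part of the current call chain. A minimal sketch that keeps only the unmarked frames makes the actual blocking path easier to read; reliable_frames is a hypothetical helper name:

```bash
#!/bin/bash
# Keep only the reliable frames of a kernel "Call Trace:" read on stdin.
# Frames printed with a leading "?" were found on the stack but the unwinder
# could not confirm them, so they are skipped. Timestamps such as
# "[  122.808993]" and "+0x27/0x90" offsets are stripped from the output.
reliable_frames() {
    sed -E 's/^\[[^]]*][[:space:]]*//' |
    awk '
        /<\/TASK>/ { in_task = 0 }
        in_task && $1 != "?" {
            name = $1
            sub(/\+.*/, "", name)                    # drop "+offset/size"
            print (NF > 1 ? name " " $2 : name)      # keep "[module]" if present
        }
        /<TASK>/ { in_task = 1 }
    '
}
```

For example, after sourcing the file: journalctl -k -b | reliable_frames. For the trace above this leaves __schedule, schedule, rwsem_down_read_slowpath, nvkms_ioctl_from_kapi and _nv002056kms.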

I set CONFIG_DRM_I915=m in order to enable CONFIG_DRM_KMS_HELPER=m.

Kernel commandline:

BOOT_IMAGE=/vmlinuz-5.16.0-gentoo
loop.max_part=32
rd.lvm=0 rd.md=0 rd.dm=0
nvidia-drm.modeset=1
video=HDMI-1:1920x1080d
nomodeset
quiet
loglevel=0
slab_nomerge
init_on_alloc=1
init_on_free=1
page_alloc.shuffle=1
pti=on
vsyscall=none
mitigations=off

/etc/modprobe.d/nvidia.conf:

# NVIDIA drivers options
# See /usr/share/doc/nvidia-drivers-*/README.txt* for more information.

# nvidia-drivers and nouveau cannot be used at same time.
# Comment out the following line if you wish to allow nouveau.
blacklist nouveau

# Kernel Mode Setting (notably needed for EGLStream/Wayland)
# Enabling may possibly cause issues with SLI and Reverse PRIME.
options nvidia-drm modeset=1

# Suspend options. Allocations=0 recommended over =1 unless you enable nvidia's
# systemd sleep services (nvidia-hibernate, nvidia-resume, nvidia-suspend).
options nvidia \
	NVreg_PreserveVideoMemoryAllocations=1 \
	NVreg_TemporaryFilePath=/var/tmp

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
	NVreg_DeviceFileGID=27 \
	NVreg_DeviceFileMode=432 \
	NVreg_DeviceFileUID=0 \
	NVreg_ModifyDeviceFiles=1

# Power save options
options nvidia \
	NVreg_DynamicPowerManagment=0x01

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia

Loaded modules:

Module                  Size  Used by
squashfs               53248  1
vfat                   20480  1
fat                    81920  1 vfat
rfkill                 24576  2
nft_ct                 20480  2
nf_conntrack           86016  1 nft_ct
nf_defrag_ipv6         20480  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
nft_objref             16384  1
nft_limit              16384  3
nft_counter            16384  1
nf_tables             237568  96 nft_ct,nft_objref,nft_counter,nft_limit
nfnetlink              20480  1 nf_tables
binfmt_misc            16384  1
snd_hda_codec_generic    81920  1
x86_pkg_temp_thermal    16384  0
ledtrig_audio          16384  1 snd_hda_codec_generic
snd_hda_codec_hdmi     65536  1
snd_hda_intel          32768  4
snd_intel_dspcfg       16384  1 snd_hda_intel
snd_hda_codec         114688  3 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel
snd_hwdep              16384  1 snd_hda_codec
snd_hda_core           65536  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
snd_pcm               118784  4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
snd_timer              36864  1 snd_pcm
snd                    90112  15 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_pcm
soundcore              16384  1 snd
video                  45056  0
tirdad                 16384  0
coretemp               16384  0
fuse                  135168  3
r8169                  98304  0
realtek                28672  1
mdio_devres            16384  1 r8169
libphy                102400  3 r8169,mdio_devres,realtek
xhci_pci               16384  0
xhci_hcd              163840  1 xhci_pci
nvidia_drm             61440  49
drm_kms_helper        233472  1 nvidia_drm
backlight              16384  2 video,drm_kms_helper
cfbfillrect            16384  1 drm_kms_helper
syscopyarea            16384  1 drm_kms_helper
cfbimgblt              16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
cfbcopyarea            16384  1 drm_kms_helper
nvidia_uvm           1085440  0
nvidia_modeset       1114112  7 nvidia_drm
nvidia              38453248  612 nvidia_uvm,nvidia_modeset
btrfs                1269760  1
libcrc32c              16384  2 btrfs,nf_tables
xor                    24576  1 btrfs
raid6_pq              118784  1 btrfs

cat /proc/driver/nvidia/params:

ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 432
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
EnableDbgBreakpoint: 0
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/var/tmp"
ExcludedGpus: ""
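To confirm that the options in /etc/modprobe.d/nvidia.conf actually reached the loaded module, the config can be compared against /proc/driver/nvidia/params. For instance, the config above asks for NVreg_DynamicPowerManagment=0x01 while the dump reports DynamicPowerManagement: 3; the parameter name in the config is spelled differently from the one in the dump, which may be why the value did not take. A minimal sketch that reads the params format shown above (nv_param and check_nv_params are hypothetical helper names):

```bash
#!/bin/bash
# Print one value from an NVIDIA params file (default: the live procfs file).
# Surrounding double quotes, as in TemporaryFilePath: "/var/tmp", are removed.
nv_param() {
    local name=$1 file=${2:-/proc/driver/nvidia/params}
    awk -v key="$name:" '
        $1 == key {
            sub(/^[^:]*:[ \t]*/, "")   # drop "Name: "
            gsub(/^"|"$/, "")          # drop surrounding quotes
            print
            exit
        }' "$file"
}

# Compare NAME=VALUE pairs with what the loaded module reports.
# Prints one line per mismatch and returns 1 if any value differs.
check_nv_params() {
    local file=$1 pair name want have status=0
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        want=${pair#*=}
        have=$(nv_param "$name" "$file")
        if [[ "$have" != "$want" ]]; then
            echo "mismatch: $name expected '$want', module reports '${have:-<unset>}'"
            status=1
        fi
    done
    return "$status"
}
```

For example: check_nv_params /proc/driver/nvidia/params PreserveVideoMemoryAllocations=1 TemporaryFilePath=/var/tmp DynamicPowerManagement=1. The module prints values in decimal, so 0x01 in the config corresponds to 1 here.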

I will attach the bug report shortly. This could be something wrong with my configuration, though.
nvidia-bug-report.log.gz (310.3 KB)

Unrelated to the suspend problem: you have “nomodeset” in your kernel cmdline. This used to disable nvidia-drm.modeset=1, so I’m astonished this works at all. Better remove it so you don’t run into trouble later.
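Both points can be checked on a running system: whether “nomodeset” is still on the kernel command line, and whether nvidia-drm actually reports modeset as enabled. A minimal sketch; check_cmdline and modeset_enabled are hypothetical helper names, and the sysfs file may be readable by root only:

```bash
#!/bin/bash
# Return 1 (and say why) if the kernel command line contains "nomodeset",
# which can conflict with nvidia-drm.modeset=1.
check_cmdline() {
    local file=${1:-/proc/cmdline} tok status=0
    local -a toks
    read -r -a toks < "$file"
    for tok in "${toks[@]}"; do
        if [[ "$tok" == nomodeset ]]; then
            echo "found 'nomodeset' in $file"
            status=1
        fi
    done
    return "$status"
}

# Succeed if nvidia-drm reports modeset enabled ("Y").
# The sysfs file may only be readable by root, so run this with sudo.
modeset_enabled() {
    local file=${1:-/sys/module/nvidia_drm/parameters/modeset}
    [[ "$(cat "$file" 2>/dev/null)" == Y ]]
}
```

After sourcing the file, sudo is needed for the second check, e.g. sudo bash -c 'source ./check-modeset.sh && modeset_enabled && echo ok' (the file name is only an example).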
I suspect the vt switch in nvidia-sleep.sh is triggering the bug, so

  • does switching to vt and back (several times) trigger this also?
  • does disabling nvidia-suspend and nvidia-resume in systemd prevent this?
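The first question can be tested directly. A minimal sketch, assuming (not confirmed here) that nvidia-sleep.sh records the current VT with fgconsole and switches to VT 63 with chvt before suspending; vt_switch_test is a hypothetical name:

```bash
#!/bin/bash
# Switch to a spare VT and back several times to see whether VT switching
# alone reproduces the hang. Needs root and the kbd tools (fgconsole, chvt).
# Run it over SSH in case the display stops responding.
vt_switch_test() {
    local rounds=${1:-5} spare_vt=${2:-63} orig_vt i
    orig_vt=$(fgconsole) || return 1
    for ((i = 1; i <= rounds; i++)); do
        echo "round $i: VT $orig_vt -> $spare_vt -> $orig_vt"
        chvt "$spare_vt" || return 1
        sleep 2
        chvt "$orig_vt" || return 1
        sleep 2
    done
}
```

For the second question, the units can be turned off with systemctl disable nvidia-suspend.service nvidia-resume.service, then suspend tried a few times, and the units re-enabled afterwards.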

I removed nomodeset and video= from the cmdline. It’s been over a week with no issues and no display corruption. Who knew :)

I’m seeing a similar suspend failure on Fedora 35, kernel 5.16.3, nvidia 510.39.01, using GNOME on Wayland, except that I don’t have nomodeset and video= in my cmdline. How were you able to get around the suspend issue?

Disabling nvidia-suspend, nvidia-resume, and removing NVreg_PreserveVideoMemoryAllocations=1 seems to fix suspend but leads to artifacts on wake.

[  102.797810] PM: suspend entry (s2idle)
[  102.807629] Filesystems sync: 0.009 seconds
[  102.807804] Freezing user space processes ... 
[  122.808924] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[  122.808951] task:gnome-shell     state:D stack:    0 pid: 2932 ppid:  2623 flags:0x00000004
[  122.808959] Call Trace:
[  122.808961]  <TASK>
[  122.808967]  __schedule+0x2d6/0x10b0
[  122.808981]  schedule+0x4e/0xc0
[  122.808986]  rwsem_down_read_slowpath+0x310/0x350
[  122.808993]  nvkms_ioctl_from_kapi+0x27/0x90 [nvidia_modeset]
[  122.809036]  _nv000092kms+0x42/0x50 [nvidia_modeset]
[  122.809090]  ? nv_drm_framebuffer_destroy+0x3b/0x50 [nvidia_drm]
[  122.809099]  ? drm_mode_rmfb+0x188/0x1c0 [drm]
[  122.809149]  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
[  122.809196]  ? drm_ioctl_kernel+0x8c/0x120 [drm]
[  122.809237]  ? drm_ioctl+0x220/0x3e0 [drm]
[  122.809277]  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
[  122.809324]  ? do_unlinkat+0x13f/0x2b0
[  122.809332]  ? security_file_ioctl+0x32/0x50
[  122.809337]  ? __x64_sys_ioctl+0x82/0xb0
[  122.809341]  ? do_syscall_64+0x3b/0x90
[  122.809346]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.809352]  </TASK>

I spoke too soon; I still see these in dmesg, though suspend seems to work when I ask for it explicitly:

[87732.534435] Freezing user space processes ... 
[87752.537224] Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
[87752.537234] task:gnome-shell     state:D stack:    0 pid:2607064 ppid:2607002 flags:0x00000004
[87752.537238] Call Trace:
[87752.537239]  <TASK>
[87752.537241]  __schedule+0x265/0x700
[87752.537248]  ? find_busiest_group+0xeb/0xa60
[87752.537252]  schedule+0x49/0xd0
[87752.537254]  rwsem_down_read_slowpath+0x315/0x360
[87752.537258]  ? __kmalloc+0x1a4/0x2d0
[87752.537261]  nvkms_ioctl_from_kapi+0x22/0x90 [nvidia_modeset]
[87752.537275]  _nv002056kms+0x126c/0x2710 [nvidia_modeset]
[87752.537291]  ? nv_drm_internal_framebuffer_create+0x24d/0x8b0 [nvidia_drm]
[87752.537295]  ? nv_drm_exit+0x310/0x370 [nvidia_drm]
[87752.537298]  ? drm_internal_framebuffer_create+0x3a8/0x4e0
[87752.537301]  ? drm_mode_addfb2+0x2c/0xb0
[87752.537303]  ? drm_mode_addfb_ioctl+0x10/0x10
[87752.537305]  ? drm_ioctl_kernel+0xb1/0x140
[87752.537307]  ? rm_ioctl+0x63/0xb0 [nvidia]
[87752.537484]  ? drm_ioctl+0x225/0x410
[87752.537486]  ? drm_mode_addfb_ioctl+0x10/0x10
[87752.537488]  ? __x64_sys_futex+0x6e/0x1d0
[87752.537491]  ? __x64_sys_ioctl+0x8d/0xb0
[87752.537494]  ? do_syscall_64+0x38/0xc0
[87752.537496]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[87752.537499]  </TASK>
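To see how often this happens and which task is responsible, the kernel log can be scanned for failed freezes. A minimal sketch (freeze_culprits is a hypothetical name) that assumes the kernel logs “Restarting tasks” after a failed freeze:

```bash
#!/bin/bash
# Count which tasks refused to freeze, from kernel log text on stdin.
# After each "Freezing of tasks failed" line, every "task:NAME" line is
# recorded until the next "Restarting tasks" line; if that line is missing,
# later task dumps are counted too.
freeze_culprits() {
    awk '
        /Freezing of tasks failed/ { in_failure = 1; next }
        /Restarting tasks/         { in_failure = 0; next }
        in_failure && match($0, /task:[^ ]+/) {
            print substr($0, RSTART + 5, RLENGTH - 5)   # text after "task:"
        }
    ' | sort | uniq -c | sort -rn
}
```

For example: journalctl -k --no-pager | freeze_culprits, which prints one count per task name.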

I would disable preserving video memory allocations but then my screen is unusable on wake.

Interestingly, for me it doesn’t actually suspend. It goes to s2idle (no video signal) and wakes back up after 20 seconds.

What if you run systemctl isolate multi-user.target and then systemctl suspend? (Assuming you’re on systemd.)

Same thing - it just tries to suspend for 20 seconds and then it wakes back up to the gnome login screen.

NVreg_EnableS0ixPowerManagement=1 works for me with Wayland session.
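As far as I understand NVIDIA’s power management documentation, NVreg_EnableS0ixPowerManagement only comes into play when the system suspends via s2idle, the mode shown as “PM: suspend entry (s2idle)” in the Fedora log earlier in the thread. A minimal sketch to see which mode a machine uses and what the driver reports (current_mem_sleep and s0ix_report are hypothetical names):

```bash
#!/bin/bash
# Print the active suspend mode: the bracketed entry in a mem_sleep file
# such as "[s2idle] deep".
current_mem_sleep() {
    local file=${1:-/sys/power/mem_sleep}
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Report the suspend mode next to the driver's EnableS0ixPowerManagement value.
s0ix_report() {
    local mem_sleep_file=${1:-/sys/power/mem_sleep}
    local params_file=${2:-/proc/driver/nvidia/params}
    local mode s0ix
    mode=$(current_mem_sleep "$mem_sleep_file")
    s0ix=$(awk -F': ' '$1 == "EnableS0ixPowerManagement" { print $2 }' "$params_file" 2>/dev/null)
    echo "mem_sleep=${mode:-unknown} EnableS0ixPowerManagement=${s0ix:-unknown}"
    if [[ "$mode" == s2idle && "$s0ix" != 1 ]]; then
        echo "hint: s2idle is active but EnableS0ixPowerManagement is not 1"
    fi
}
```

If s2idle is active, the option would go into a modprobe.d file as options nvidia NVreg_EnableS0ixPowerManagement=1, followed by a reboot (and, if the nvidia module is loaded from the initramfs, rebuilding it first).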

Had the same issue on Fedora 35 after upgrading to nvidia 510.47.03 and kernel 5.16.5.

This fixed it for me, working suspend/resume without graphics corruption:

  1. Uninstall the package “xorg-x11-drv-nvidia-power”.
  2. Reboot.
  3. Select GNOME as session during logon, not “GNOME on Wayland”.

I have this problem with 510.60.02 and Linux 5.17.3

I found a solution.

gnome-shell is trying to talk to the NVIDIA driver after the driver has already started suspending, so the driver can’t respond. Linux tries to freeze the task but fails, because gnome-shell is blocked waiting for a response from the driver and can’t be frozen.

The solution is to manually suspend gnome-shell using the STOP signal before the NVIDIA driver goes to suspend. Then use the CONT signal on resume.

/usr/local/bin/suspend-gnome-shell.sh:

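#!/bin/bash
# Stop gnome-shell before the NVIDIA driver suspends, and let it continue on resume.
# SIGSTOP cannot be caught or ignored, so gnome-shell issues no new ioctls while stopped.

case "$1" in
    suspend)
        killall -STOP gnome-shell
        ;;
    resume)
        killall -CONT gnome-shell
        ;;
esac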

/etc/systemd/system/gnome-shell-suspend.service:

[Unit]
Description=Suspend gnome-shell
Before=systemd-suspend.service
Before=systemd-hibernate.service
Before=nvidia-suspend.service
Before=nvidia-hibernate.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/suspend-gnome-shell.sh suspend

[Install]
WantedBy=systemd-suspend.service
WantedBy=systemd-hibernate.service

/etc/systemd/system/gnome-shell-resume.service:

[Unit]
Description=Resume gnome-shell
After=systemd-suspend.service
After=systemd-hibernate.service
After=nvidia-resume.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/suspend-gnome-shell.sh resume

[Install]
WantedBy=systemd-suspend.service
WantedBy=systemd-hibernate.service

Then just enable the two new systemd units:

systemctl daemon-reload
systemctl enable gnome-shell-suspend
systemctl enable gnome-shell-resume

This should stop gnome-shell in time, so it isn’t trying to access the graphics hardware while the driver suspends. It worked for me.
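To confirm by hand that the STOP signal took effect, note that a stopped process shows a state beginning with “T” in ps. A minimal sketch, meant to be run from an SSH session because stopping gnome-shell freezes the desktop; all_stopped is a hypothetical name:

```bash
#!/bin/bash
# Succeed only if every "PID STAT" pair on stdin is in the stopped state (T).
# Intended input: ps -o pid=,stat= -C gnome-shell
all_stopped() {
    local pid stat seen=0
    while read -r pid stat; do
        [[ -z "$pid" ]] && continue
        seen=1
        if [[ "$stat" != T* ]]; then
            echo "pid $pid is in state $stat, not stopped"
            return 1
        fi
    done
    if (( seen == 0 )); then
        echo "no process listed"
        return 1
    fi
}
```

For example: sudo /usr/local/bin/suspend-gnome-shell.sh suspend, then ps -o pid=,stat= -C gnome-shell | all_stopped, then sudo /usr/local/bin/suspend-gnome-shell.sh resume.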


I tested your solution and it worked perfectly. I think we still need to test it with other suspend-related options enabled, like “NVreg_PreserveVideoMemoryAllocations” and “NVreg_EnableS0ixPowerManagement”, to check for conflicts. Maybe distros could ship your solution, or GNOME could change its code to avoid the problem you found.

This also seems to solve it on my end for Fedora 36 and a 3080! For anyone trying this: make sure to run sudo chmod +x on the script.
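Since a missing execute bit is easy to overlook, a small pre-flight check can confirm the pieces are in place before enabling the units. A minimal sketch; check_sleep_hook is a hypothetical name, and the paths are the ones used in this thread:

```bash
#!/bin/bash
# Pre-flight check for the gnome-shell sleep hook: the helper script must
# exist and be executable, and both unit files must exist. The optional ROOT
# argument (default: empty, i.e. the real filesystem) allows pointing the
# check at a scratch directory.
check_sleep_hook() {
    local root=${1:-} status=0 unit
    local script="$root/usr/local/bin/suspend-gnome-shell.sh"
    if [[ ! -f "$script" ]]; then
        echo "missing: $script"
        status=1
    elif [[ ! -x "$script" ]]; then
        echo "not executable (try: sudo chmod +x $script)"
        status=1
    fi
    for unit in gnome-shell-suspend.service gnome-shell-resume.service; do
        if [[ ! -f "$root/etc/systemd/system/$unit" ]]; then
            echo "missing: $root/etc/systemd/system/$unit"
            status=1
        fi
    done
    return "$status"
}
```

Running check_sleep_hook with no argument checks the real paths; once it passes, the daemon-reload and enable commands can be run as described.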

This solution very unfortunately breaks resume from hibernation on systems without S0ix support. I have a new Alder Lake H670 motherboard that does not support S0ix and has no BIOS option to enable it, as confirmed with Intel’s S0ix support test script.

This workaround does make resume from suspend work, which is nice, but broken hibernation is a show-stopper for me. The system would try to resume from hibernation, fail with this error, and then start a fresh session:

Jul 03 19:16:06 nahuatl kernel: PM: hibernation: Failed to load image, recovering.
Jul 03 19:16:06 nahuatl kernel: nvidia 0000:01:00.0: PM: failed to quiesce async: error -5
Jul 03 19:16:06 nahuatl kernel: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -5
Jul 03 19:16:06 nahuatl kernel: PM: pci_pm_freeze(): nv_pmops_freeze+0x0/0x20 [nvidia] returns -5
Jul 03 19:16:06 nahuatl kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the ‘Configuring Power Management Support’

I tested with drivers 515, 510 and 470 on Ubuntu 22.04 with Wayland.
The “PreserveVideoMemoryAllocations module parameter is set” error is weird: it only happens when resuming from hibernation, not when resuming from suspend. The NVIDIA power management services were loaded and enabled with all three driver versions I tested.
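The error message refers to the driver’s procfs suspend interface. As I understand NVIDIA’s README, that interface is driven by nvidia-suspend.service, nvidia-hibernate.service and nvidia-resume.service (through nvidia-sleep.sh), so confirming that all three are enabled is a sensible first step. A minimal sketch; check_nvidia_sleep_units is a hypothetical name:

```bash
#!/bin/bash
# Print the enablement state of NVIDIA's sleep units; return 1 unless all
# three are "enabled". Empty output from systemctl is shown as not-found.
check_nvidia_sleep_units() {
    local unit state status=0
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        state=$(systemctl is-enabled "$unit" 2>/dev/null)
        echo "$unit: ${state:-not-found}"
        [[ "$state" == enabled ]] || status=1
    done
    return "$status"
}
```

This only checks enablement; it says nothing about whether the units actually ran during a given hibernation attempt, which journalctl -b -1 -u nvidia-hibernate.service can show for the previous boot.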

I had to revert to Xorg, where both suspend-to-RAM and hibernation work just fine. It’s disappointing, considering that development has focused on Wayland for years. NVIDIA team, please step up and sort this out without requiring S0ix, which is not commonly supported.

Interesting, and it solves my “suspend fails after 20 seconds” problem with a slightly elderly but still useful NVIDIA GeForce GTX 750 Ti card and the 515.57 driver under Wayland on Fedora 36.
I’m not qualified to judge whether this is a fix or a workaround, as I don’t know enough about systemd, but I’m certainly pleased with the result and it should be known more widely. I hope my commenting helps draw attention to the original post.
Also, I’d be happy to help with further testing…

Thank you Devyn,

Neil

For me this fix works with “options nvidia NVreg_PreserveVideoMemoryAllocations=1”

I’m very happy to do further testing if people have suggestions. (I’d even be happy to reactivate hibernation and try it, at least on an experimental basis; for my use case, suspend plus screens off provides the power saving I want.)

Neil

BTW, what font is this? The letter forms are crisp and clear, but the brackets and braces ( [ { all look a bit too much like square brackets to my tired old eyes!

NVIDIA GTX 750 Ti, driver 515.57, Fedora 36, kernel 5.18.11-lqx1.0.fc36.x86_64

For me this fix works with “options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp” and with nvidia-{suspend,resume}.service enabled.

OS: Ubuntu 22.04 LTS x86_64
Host: 20URS01L00 ThinkPad T15g Gen 1
Kernel: 5.17.0-1013-oem
GPU: NVIDIA GeForce RTX 2070 SUPER Mobile / Max-Q (dedicated graphics only)

As I didn’t see an existing ticket, I filed gnome-shell#5772, referencing this thread, to make sure the GNOME developers are aware of this.

I haven’t tried the workaround but I’m experiencing the same issue on:

  • Fedora 36 Workstation
  • GNOME 42
  • Kernel 5.18.17
  • GTX 1070 :: 515.65.01