Trouble suspending with 510.39.01, Linux 5.16.0: Freezing of tasks failed after 20.009 seconds

  • Card: GeForce GTX 970
  • Kernel: 5.16.0
  • System manager: systemd 250.2
  • Greeter: GDM 41
  • Desktop Environment: Gnome 41 on Wayland

My computer sometimes fails to suspend, and I see this in dmesg:

Freezing user space processes ...
Freezing of tasks failed after 20.009 seconds (1 tasks refusing to freeze, wq_busy=0):
task:gnome-shell     state:D stack:    0 pid: 2080 ppid:  2019 flags:0x00004004
Call Trace:
 <TASK>
 __schedule+0x265/0x700
 schedule+0x49/0xd0
 rwsem_down_read_slowpath+0x315/0x360
 ? __kmalloc+0x1a4/0x2d0
 nvkms_ioctl_from_kapi+0x22/0x90 [nvidia_modeset]
 _nv002056kms+0x126c/0x2710 [nvidia_modeset]
 ? nv_drm_internal_framebuffer_create+0x24d/0x8b0 [nvidia_drm]
 ? nv_drm_exit+0x310/0x370 [nvidia_drm]
 ? drm_internal_framebuffer_create+0x3a8/0x4e0
 ? drm_mode_addfb2+0x2c/0xb0
 ? drm_mode_addfb_ioctl+0x10/0x10
 ? drm_ioctl_kernel+0xb1/0x140
 ? rm_ioctl+0x63/0xb0 [nvidia]
 ? drm_ioctl+0x225/0x410
 ? drm_mode_addfb_ioctl+0x10/0x10
 ? __x64_sys_futex+0x6e/0x1d0
 ? __x64_sys_ioctl+0x8d/0xb0
 ? do_syscall_64+0x38/0xc0
 ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>

The top of the stack is in the SLAB allocator, which is trying to obtain a semaphore. I have zswap enabled. I’m not a kernel hacker so that’s all I can tell from this.

I set CONFIG_DRM_I915=m in order to enable CONFIG_DRM_KMS_HELPER=m.

Kernel commandline:

BOOT_IMAGE=/vmlinuz-5.16.0-gentoo
loop.max_part=32
rd.lvm=0 rd.md=0 rd.dm=0
nvidia-drm.modeset=1
video=HDMI-1:1920x1080d
nomodeset
quiet
loglevel=0
slab_nomerge
init_on_alloc=1
init_on_free=1
page_alloc.shuffle=1
pti=on
vsyscall=none
mitigations=off

/etc/modprobe.d/nvidia.conf:

# NVIDIA drivers options
# See /usr/share/doc/nvidia-drivers-*/README.txt* for more information.

# nvidia-drivers and nouveau cannot be used at same time.
# Comment out the following line if you wish to allow nouveau.
blacklist nouveau

# Kernel Mode Setting (notably needed for EGLStream/Wayland)
# Enabling may possibly cause issues with SLI and Reverse PRIME.
options nvidia-drm modeset=1

# Suspend options. Allocations=0 recommended over =1 unless enable nvidia's
# systemd sleep services (nvidia-hibernate, nvidia-resume, nvidia-suspend).
options nvidia \
	NVreg_PreserveVideoMemoryAllocations=1 \
	NVreg_TemporaryFilePath=/var/tmp

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
	NVreg_DeviceFileGID=27 \
	NVreg_DeviceFileMode=432 \
	NVreg_DeviceFileUID=0 \
	NVreg_ModifyDeviceFiles=1

# Power save options
options nvidia \
	NVreg_DynamicPowerManagment=0x01

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia

Loaded modules:

Module                  Size  Used by
squashfs               53248  1
vfat                   20480  1
fat                    81920  1 vfat
rfkill                 24576  2
nft_ct                 20480  2
nf_conntrack           86016  1 nft_ct
nf_defrag_ipv6         20480  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
nft_objref             16384  1
nft_limit              16384  3
nft_counter            16384  1
nf_tables             237568  96 nft_ct,nft_objref,nft_counter,nft_limit
nfnetlink              20480  1 nf_tables
binfmt_misc            16384  1
snd_hda_codec_generic    81920  1
x86_pkg_temp_thermal    16384  0
ledtrig_audio          16384  1 snd_hda_codec_generic
snd_hda_codec_hdmi     65536  1
snd_hda_intel          32768  4
snd_intel_dspcfg       16384  1 snd_hda_intel
snd_hda_codec         114688  3 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel
snd_hwdep              16384  1 snd_hda_codec
snd_hda_core           65536  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
snd_pcm               118784  4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
snd_timer              36864  1 snd_pcm
snd                    90112  15 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_pcm
soundcore              16384  1 snd
video                  45056  0
tirdad                 16384  0
coretemp               16384  0
fuse                  135168  3
r8169                  98304  0
realtek                28672  1
mdio_devres            16384  1 r8169
libphy                102400  3 r8169,mdio_devres,realtek
xhci_pci               16384  0
xhci_hcd              163840  1 xhci_pci
nvidia_drm             61440  49
drm_kms_helper        233472  1 nvidia_drm
backlight              16384  2 video,drm_kms_helper
cfbfillrect            16384  1 drm_kms_helper
syscopyarea            16384  1 drm_kms_helper
cfbimgblt              16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
cfbcopyarea            16384  1 drm_kms_helper
nvidia_uvm           1085440  0
nvidia_modeset       1114112  7 nvidia_drm
nvidia              38453248  612 nvidia_uvm,nvidia_modeset
btrfs                1269760  1
libcrc32c              16384  2 btrfs,nf_tables
xor                    24576  1 btrfs
raid6_pq              118784  1 btrfs

cat /proc/driver/nvidia/params:

ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 432
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 18
EnableDbgBreakpoint: 0
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/var/tmp"
ExcludedGpus: ""

I will attach the bug report shortly. This could be something wrong with my configuration, though.
nvidia-bug-report.log.gz (310.3 KB)

Unrelated to the suspend problem: you have “nomodeset” on your kernel cmdline. That used to disable nvidia-drm.modeset=1, so I’m astonished this is working at all. Better to remove it so you don’t run into trouble later.
I suspect the vt switch in nvidia-sleep.sh is triggering the bug, so

  • Does switching to a VT and back (several times) also trigger this?
  • Does disabling nvidia-suspend and nvidia-resume in systemd prevent this?
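
In case it helps, here is a rough sketch of how the two tests could be run. vt_cycle_plan is a made-up helper name, and the VT numbers, round count, and 2-second pauses are assumptions to adjust. It only prints the chvt commands, so nothing happens until you pipe the output to sh as root. chvt and fgconsole are assumed to be installed (they ship with the kbd package).

```shell
#!/bin/sh
# Sketch only: print the commands for switching to a text VT and back
# several times. Nothing is executed until the output is piped to sh.
# vt_cycle_plan is a hypothetical helper name, not part of any package.
vt_cycle_plan() {
    origin=$1   # VT of the graphical session, e.g. the output of fgconsole
    target=$2   # a text VT to switch to, e.g. 3
    rounds=$3   # how many round trips to make
    i=0
    while [ "$i" -lt "$rounds" ]; do
        printf 'chvt %s\nsleep 2\nchvt %s\nsleep 2\n' "$target" "$origin"
        i=$((i + 1))
    done
}

# Example, as root (5 round trips between the current VT and VT 3):
#   vt_cycle_plan "$(fgconsole)" 3 5 | sh
#
# For the second test, again as root, then reboot and try to suspend:
#   systemctl disable nvidia-suspend.service nvidia-resume.service
```

If gnome-shell ends up stuck in state D after the VT round trips, that would point at the VT switch rather than at the suspend itself.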

I removed nomodeset and video= from the cmdline. It’s been over a week with no issues and no display corruption. Who knew :)
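
For anyone who wants to check their own command line for the same leftovers, a minimal sketch follows. has_param is a hypothetical helper name. It only does a whole-word match on the text you pass in, so it will not tell you whether an option actually took effect.

```shell
#!/bin/sh
# Sketch only: test whether a kernel command line contains a parameter,
# either bare (nomodeset) or with a value (video=...).
# has_param is a hypothetical helper name.
has_param() {
    cmdline=$1   # the full command line text
    name=$2      # the parameter name to look for, without "=value"
    # Pad with spaces so only whole words match.
    case " $cmdline " in
        *" $name "* | *" $name="*) return 0 ;;
    esac
    return 1
}

# Example:
#   for p in nomodeset video; do
#       has_param "$(cat /proc/cmdline)" "$p" && echo "$p is still set"
#   done
```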

I’m seeing a similar suspend failure on Fedora 35, kernel 5.16.3, nvidia 510.39.01, using GNOME on Wayland, except that I don’t have nomodeset and video= in my cmdline. How were you able to get around the suspend issue?

Disabling nvidia-suspend, nvidia-resume, and removing NVreg_PreserveVideoMemoryAllocations=1 seems to fix suspend but leads to artifacts on wake.

[  102.797810] PM: suspend entry (s2idle)
[  102.807629] Filesystems sync: 0.009 seconds
[  102.807804] Freezing user space processes ... 
[  122.808924] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[  122.808951] task:gnome-shell     state:D stack:    0 pid: 2932 ppid:  2623 flags:0x00000004
[  122.808959] Call Trace:
[  122.808961]  <TASK>
[  122.808967]  __schedule+0x2d6/0x10b0
[  122.808981]  schedule+0x4e/0xc0
[  122.808986]  rwsem_down_read_slowpath+0x310/0x350
[  122.808993]  nvkms_ioctl_from_kapi+0x27/0x90 [nvidia_modeset]
[  122.809036]  _nv000092kms+0x42/0x50 [nvidia_modeset]
[  122.809090]  ? nv_drm_framebuffer_destroy+0x3b/0x50 [nvidia_drm]
[  122.809099]  ? drm_mode_rmfb+0x188/0x1c0 [drm]
[  122.809149]  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
[  122.809196]  ? drm_ioctl_kernel+0x8c/0x120 [drm]
[  122.809237]  ? drm_ioctl+0x220/0x3e0 [drm]
[  122.809277]  ? drm_mode_rmfb+0x1c0/0x1c0 [drm]
[  122.809324]  ? do_unlinkat+0x13f/0x2b0
[  122.809332]  ? security_file_ioctl+0x32/0x50
[  122.809337]  ? __x64_sys_ioctl+0x82/0xb0
[  122.809341]  ? do_syscall_64+0x3b/0x90
[  122.809346]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.809352]  </TASK>
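
To compare reports like the one above without reading whole traces, something along these lines could pull out the name and pid of each task that refused to freeze. It is only a sketch: blocked_tasks is a made-up name, it assumes the "task:NAME ... pid: N" layout seen in these logs, and it will mangle task names that contain spaces.

```shell
#!/bin/sh
# Sketch only: read kernel log text on stdin and print "name pid" for each
# task listed after a "Freezing of tasks failed" line.
# blocked_tasks is a hypothetical helper name.
blocked_tasks() {
    awk '
        /Freezing of tasks failed/ { failed = 1; next }
        # Stop collecting at a "Restarting tasks" line, if the log has one.
        /Restarting tasks/ { failed = 0 }
        failed && match($0, /task:[^ ]+/) {
            name = substr($0, RSTART + 5, RLENGTH - 5)
            pid = $0
            # The leading space keeps "ppid:" from matching here.
            sub(/.* pid: */, "", pid)
            sub(/[^0-9].*/, "", pid)
            print name, pid
        }
    '
}

# Example (dmesg may need root on some systems):
#   dmesg | blocked_tasks
```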

I spoke too soon; I still see these in dmesg, though suspend seems to work when I ask for it explicitly:

[87732.534435] Freezing user space processes ... 
[87752.537224] Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
[87752.537234] task:gnome-shell     state:D stack:    0 pid:2607064 ppid:2607002 flags:0x00000004
[87752.537238] Call Trace:
[87752.537239]  <TASK>
[87752.537241]  __schedule+0x265/0x700
[87752.537248]  ? find_busiest_group+0xeb/0xa60
[87752.537252]  schedule+0x49/0xd0
[87752.537254]  rwsem_down_read_slowpath+0x315/0x360
[87752.537258]  ? __kmalloc+0x1a4/0x2d0
[87752.537261]  nvkms_ioctl_from_kapi+0x22/0x90 [nvidia_modeset]
[87752.537275]  _nv002056kms+0x126c/0x2710 [nvidia_modeset]
[87752.537291]  ? nv_drm_internal_framebuffer_create+0x24d/0x8b0 [nvidia_drm]
[87752.537295]  ? nv_drm_exit+0x310/0x370 [nvidia_drm]
[87752.537298]  ? drm_internal_framebuffer_create+0x3a8/0x4e0
[87752.537301]  ? drm_mode_addfb2+0x2c/0xb0
[87752.537303]  ? drm_mode_addfb_ioctl+0x10/0x10
[87752.537305]  ? drm_ioctl_kernel+0xb1/0x140
[87752.537307]  ? rm_ioctl+0x63/0xb0 [nvidia]
[87752.537484]  ? drm_ioctl+0x225/0x410
[87752.537486]  ? drm_mode_addfb_ioctl+0x10/0x10
[87752.537488]  ? __x64_sys_futex+0x6e/0x1d0
[87752.537491]  ? __x64_sys_ioctl+0x8d/0xb0
[87752.537494]  ? do_syscall_64+0x38/0xc0
[87752.537496]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[87752.537499]  </TASK>

I would disable preserving video memory allocations but then my screen is unusable on wake.

Interestingly for me, it doesn’t actually suspend. It goes to s2idle (no video signal) and wakes back up after 20 seconds.

What if you run systemctl isolate multi-user.target and then systemctl suspend (assuming you’re on systemd)?

Same thing - it just tries to suspend for 20 seconds and then it wakes back up to the gnome login screen.

NVreg_EnableS0ixPowerManagement=1 works for me with Wayland session.
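
If you try this, the option would go on an "options nvidia" line in a modprobe.d file like the one posted at the top of the thread. A small sketch for confirming the loaded driver picked it up is below. nv_param is a hypothetical helper name, and it assumes the "Name: value" layout of /proc/driver/nvidia/params shown earlier, where names appear without the NVreg_ prefix.

```shell
#!/bin/sh
# Sketch only: read /proc/driver/nvidia/params style text on stdin and
# print the value of one parameter.
# nv_param is a hypothetical helper name.
nv_param() {
    key=$1   # name as listed in the params file, without the NVreg_ prefix
    awk -F': ' -v key="$key" '$1 == key { print $2 }'
}

# Example, after reloading the module or rebooting:
#   nv_param EnableS0ixPowerManagement < /proc/driver/nvidia/params
```

The exact-name comparison matters here, because S0ixPowerManagementVideoMemoryThreshold contains a similar string and a plain grep would match both lines.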

Had the same issue on Fedora 35 after upgrade to nvidia 510.47.03 and kernel 5.16.5.

This fixed it for me, working suspend/resume without graphics corruption:

  1. Uninstall the package “xorg-x11-drv-nvidia-power”.
  2. Reboot.
  3. Select GNOME as session during logon, not “GNOME on Wayland”.

I have this problem with 510.60.02 and Linux 5.17.3.