Sleep intermittently fails on laptop with Ubuntu 23.10

I have a Lenovo Legion 7 Pro (16IRX8H) laptop running Kubuntu 23.10 (Ubuntu w/KDE). Intermittently, my laptop will fail to sleep.

Most of the time sleep works without issue, and I never have trouble resuming. The failure can happen on my first attempt to sleep or after several consecutive successful sleeps. I haven’t found any usage pattern that consistently triggers it.

I install the drivers via the nvidia-driver-550 package and the full version I currently have is 550.54.14-0ubuntu0~gpu23.10.1. The problem also occurs with the nvidia-driver-545 package.

I’ve had this problem with the 6.6.x, 6.7.x, and 6.8.x series kernels.

The problem started about 2-3 months ago, so perhaps an update that landed in both the 545 and 550 series triggered this issue for me?

I’ve run KDE+Wayland for the entire time I’ve had this laptop, and I am using Optimus, so it’s probably not a Wayland issue. But if someone suggests that it might be, I suppose I could switch back to X, redoing my desktop settings in the process.

nvidia-bug-report.log.gz (1.9 MB)

This is what I see in the logs:

2024-03-14T01:57:14.460498-07:00 hostname kernel: [19092.469851] INFO: task nvidia-sleep.sh:269177 blocked for more than 120 seconds.
2024-03-14T01:57:14.460509-07:00 hostname kernel: [19092.469873]       Tainted: P S   U     O       6.7.9-cb #12
2024-03-14T01:57:14.460510-07:00 hostname kernel: [19092.469882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2024-03-14T01:57:14.460511-07:00 hostname kernel: [19092.469893] task:nvidia-sleep.sh state:D stack:0     pid:269177 tgid:269177 ppid:1      flags:0x00004002
2024-03-14T01:57:14.460512-07:00 hostname kernel: [19092.469898] Call Trace:
2024-03-14T01:57:14.460512-07:00 hostname kernel: [19092.469900]  <TASK>
2024-03-14T01:57:14.460513-07:00 hostname kernel: [19092.469903]  __schedule+0x3f5/0x1620
2024-03-14T01:57:14.460514-07:00 hostname kernel: [19092.469912]  ? kfree+0x78/0x120
2024-03-14T01:57:14.460514-07:00 hostname kernel: [19092.469920]  ? _nv013879rm+0xb6/0x110 [nvidia]
2024-03-14T01:57:14.460515-07:00 hostname kernel: [19092.470540]  schedule+0x33/0x110
2024-03-14T01:57:14.460515-07:00 hostname kernel: [19092.470545]  schedule_preempt_disabled+0x15/0x30
2024-03-14T01:57:14.460516-07:00 hostname kernel: [19092.470548]  __mutex_lock.constprop.0+0x416/0x730
2024-03-14T01:57:14.460516-07:00 hostname kernel: [19092.470553]  __mutex_lock_slowpath+0x13/0x20
2024-03-14T01:57:14.460517-07:00 hostname kernel: [19092.470557]  mutex_lock+0x3c/0x50
2024-03-14T01:57:14.460518-07:00 hostname kernel: [19092.470560]  backlight_device_unregister.part.0+0x78/0xb0
2024-03-14T01:57:14.460518-07:00 hostname kernel: [19092.470566]  backlight_device_unregister+0x13/0x30
2024-03-14T01:57:14.460519-07:00 hostname kernel: [19092.470570]  nvkms_unregister_backlight+0x1b/0x30 [nvidia_modeset]
2024-03-14T01:57:14.460519-07:00 hostname kernel: [19092.470614]  _nv002814kms+0x21/0x40 [nvidia_modeset]
2024-03-14T01:57:14.460520-07:00 hostname kernel: [19092.470662]  _nv002549kms+0x91/0x210 [nvidia_modeset]
2024-03-14T01:57:14.460520-07:00 hostname kernel: [19092.470700]  nvKmsSuspend+0x68/0xa0 [nvidia_modeset]
2024-03-14T01:57:14.460521-07:00 hostname kernel: [19092.470736]  ? down_write+0x12/0x80
2024-03-14T01:57:14.460521-07:00 hostname kernel: [19092.470738]  nvkms_suspend+0x23/0x50 [nvidia_modeset]
2024-03-14T01:57:14.460522-07:00 hostname kernel: [19092.470772]  nvidia_modeset_suspend+0x1a/0x30 [nvidia]
2024-03-14T01:57:14.460522-07:00 hostname kernel: [19092.471052]  nv_set_system_power_state+0x158/0x480 [nvidia]
2024-03-14T01:57:14.460523-07:00 hostname kernel: [19092.471408]  nv_procfs_write_suspend+0x106/0x1c0 [nvidia]
2024-03-14T01:57:14.460523-07:00 hostname kernel: [19092.471681]  proc_reg_write+0x69/0xb0
2024-03-14T01:57:14.460523-07:00 hostname kernel: [19092.471687]  vfs_write+0xff/0x420
2024-03-14T01:57:14.460524-07:00 hostname kernel: [19092.471690]  ksys_write+0x73/0x100
2024-03-14T01:57:14.460524-07:00 hostname kernel: [19092.471693]  __x64_sys_write+0x19/0x30
2024-03-14T01:57:14.460525-07:00 hostname kernel: [19092.471695]  do_syscall_64+0x5d/0xf0
2024-03-14T01:57:14.460525-07:00 hostname kernel: [19092.471698]  ? exit_to_user_mode_prepare+0x35/0x160
2024-03-14T01:57:14.460526-07:00 hostname kernel: [19092.471703]  ? syscall_exit_to_user_mode+0x26/0x50
2024-03-14T01:57:14.460526-07:00 hostname kernel: [19092.471705]  ? __x64_sys_newfstatat+0x1c/0x30
2024-03-14T01:57:14.460527-07:00 hostname kernel: [19092.471708]  ? do_syscall_64+0x6c/0xf0
2024-03-14T01:57:14.460527-07:00 hostname kernel: [19092.471711]  ? do_syscall_64+0x6c/0xf0
2024-03-14T01:57:14.460528-07:00 hostname kernel: [19092.471713]  ? irqentry_exit_to_user_mode+0xe/0x20
2024-03-14T01:57:14.460528-07:00 hostname kernel: [19092.471715]  ? irqentry_exit+0x43/0x50
2024-03-14T01:57:14.460529-07:00 hostname kernel: [19092.471717]  ? exc_page_fault+0x7e/0x170
2024-03-14T01:57:14.460529-07:00 hostname kernel: [19092.471721]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
2024-03-14T01:57:14.460530-07:00 hostname kernel: [19092.471726] RIP: 0033:0x7eabcd51b294
2024-03-14T01:57:14.460530-07:00 hostname kernel: [19092.471729] RSP: 002b:00007ffc4e27e888 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
2024-03-14T01:57:14.460530-07:00 hostname kernel: [19092.471732] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007eabcd51b294
2024-03-14T01:57:14.460531-07:00 hostname kernel: [19092.471733] RDX: 0000000000000008 RSI: 000059afb02439f0 RDI: 0000000000000001
2024-03-14T01:57:14.460531-07:00 hostname kernel: [19092.471734] RBP: 000059afb02439f0 R08: ffffffffffffffc0 R09: 0000000000000410
2024-03-14T01:57:14.460532-07:00 hostname kernel: [19092.471735] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008
2024-03-14T01:57:14.460532-07:00 hostname kernel: [19092.471736] R13: 00007eabcd5ff7a0 R14: 00007eabcd5fd120 R15: 0000000000000000
2024-03-14T01:57:14.460533-07:00 hostname kernel: [19092.471740]  </TASK>

The backlight does stay on when sleeping fails, so maybe it’s an intermittent issue where the backlight can’t be turned off? But I think that’s supposed to be handled by the Intel i915 driver?
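For anyone wanting to check how often this hang shows up in their own logs, here is a minimal Python sketch that counts hung-task warnings per task name. The function name `count_hung_tasks` is hypothetical, and the regular expression is only modeled on the message format shown in the log above, so it may need adjusting for other kernels or log formats.

```python
import re
from collections import Counter

# Modeled on the warning seen above, e.g.
# "INFO: task nvidia-sleep.sh:269177 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Count hung-task warnings per task name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Two lines copied from the log above: one warning, one stack frame.
sample = [
    "2024-03-14T01:57:14.460498-07:00 hostname kernel: [19092.469851] "
    "INFO: task nvidia-sleep.sh:269177 blocked for more than 120 seconds.",
    "2024-03-14T01:57:14.460513-07:00 hostname kernel: [19092.469903]  "
    "__schedule+0x3f5/0x1620",
]

print(count_hung_tasks(sample))
```

To scan a real log, pass an open file object instead of `sample`. Where the kernel log lives depends on the system's logging setup, so the path is left out here.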

I gave this a few days to be sure… But it looks like the issue has been resolved. This used to happen at least once a day, and often several times a day.

I had added “i915.enable_guc=3” to the kernel command line a little ways back. Removing this has resolved the issue.
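If you are not sure whether you have the same option set, the running kernel's command line can be read from /proc/cmdline. Below is a minimal Python sketch that looks for the parameter. The helper names `find_kernel_param` and `read_cmdline` are hypothetical, and the parsing is a simple whitespace split that assumes no quoted values containing spaces.

```python
def find_kernel_param(cmdline, name):
    """Return the value of a kernel command-line parameter, or None if absent.

    If the parameter appears more than once, the last occurrence is returned.
    A bare flag without "=value" yields an empty string.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line; return "" if it is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


guc = find_kernel_param(read_cmdline(), "i915.enable_guc")
if guc is None:
    print("i915.enable_guc is not set on the kernel command line")
else:
    print(f"i915.enable_guc={guc} is set on the kernel command line")
```

On an Ubuntu install that boots through GRUB, extra kernel parameters are typically set in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and applied with update-grub, so that is the likely place to remove the option. Your setup may differ.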

When the problem occurred, I did notice the backlight wouldn’t turn off, and the backlight is probably controlled by the Intel iGPU. So this could be specifically an i915 bug while this feature is enabled, and the stack trace from the Nvidia driver may just be the result of it timing out while waiting for the backlight to turn off, which the i915 driver never does.

Hopefully this will help others who have run into the same issue.