[575.64.03] Intermittent laptop suspend issues

I’m having intermittent suspend issues with my Lenovo Legion laptop. Sometimes it suspends and resumes fine, but frequently the suspend fails: the screen goes blank, the power light goes off, and then I’m brought back to the lock screen without the laptop having suspended. For the last week I’ve resorted to shutting the machine down, so that it can’t wake up in my backpack and leave me opening it 30 minutes later to find it overheating. The issue occurs in both Wayland and X11 sessions.

Fedora 42 Workstation Edition, kernel 6.15.6, NVIDIA driver version 575.64.03 from RPM Fusion.

ABRT collects the following or similar stack trace each time my laptop fails to suspend:

WARNING: CPU: 7 PID: 32226 at nvidia/nv.c:4660 nv_suspend_devices+0x2b6/0x300 [nvidia]
Modules linked in: binfmt_misc uinput rfcomm snd_seq_dummy snd_hrtimer nvidia_drm(O) nvidia_modeset(O) nvidia_uvm(O) nvidia(O) nfnetlink_queue nf_conntrack_netlink ip6t_REJECT nf_reject_ipv6 nft_chain_nat xt_nat nf_nat ipt_REJECT nf_reject_ipv4 xt_NFQUEUE xt_mark xt_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat sunrpc nf_tables qrtr snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_sof_board_helpers snd_sof_probes snd_soc_intel_hda_dsp_common bnep snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink vfat snd_sof_intel_hda fat snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi crc8 soundwire_bus snd_soc_sdca snd_soc_avs snd_soc_hda_codec snd_hda_ext_core
 iwlmvm snd_soc_core snd_compress ac97_bus intel_uncore_frequency intel_uncore_frequency_common snd_pcm_dmaengine intel_tcc_cooling snd_hda_intel mac80211 x86_pkg_temp_thermal snd_intel_dspcfg intel_powerclamp snd_intel_sdw_acpi snd_hda_codec coretemp snd_hda_core kvm_intel libarc4 uvcvideo btusb snd_hwdep processor_thermal_device_pci processor_thermal_device uvc btrtl snd_seq processor_thermal_wt_hint kvm videobuf2_vmalloc btintel snd_seq_device processor_thermal_rfim videobuf2_memops iwlwifi btbcm snd_pcm intel_rapl_msr processor_thermal_rapl videobuf2_v4l2 iTCO_wdt btmtk intel_rapl_common spi_nor videobuf2_common irqbypass intel_pmc_bxt processor_thermal_wt_req rapl mei_hdcp mei_pxp snd_timer ee1004 acer_wmi bluetooth cfg80211 iTCO_vendor_support mtd processor_thermal_power_floor videodev intel_cstate intel_pmc_core r8169 mei_me snd platform_profile i2c_i801 spi_intel_pci processor_thermal_mbox int3403_thermal int3400_thermal pmt_telemetry intel_uncore intel_hid mc pcspkr wmi_bmof soundcore mei realtek
 rfkill thunderbolt i2c_smbus spi_intel idma64 igen6_edac int340x_thermal_zone acpi_thermal_rel pmt_class sparse_keymap acpi_pad joydev loop nfnetlink zram lz4hc_compress lz4_compress xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm i915 nvme nvme_core nvme_keyring nvme_auth polyval_clmulni i2c_algo_bit polyval_generic drm_buddy ghash_clmulni_intel hid_multitouch ttm sha512_ssse3 ucsi_acpi drm_display_helper sha256_ssse3 typec_ucsi i2c_hid_acpi video sha1_ssse3 vmd cec intel_vsec typec i2c_hid wmi pinctrl_tigerlake serio_raw i2c_dev fuse
CPU: 7 UID: 0 PID: 32226 Comm: nvidia-sleep.sh Tainted: G        W  O        6.15.4-200.fc42.x86_64 #1 PREEMPT(lazy) 
Tainted: [W]=WARN, [O]=OOT_MODULE
Hardware name: Acer Aspire A715-51G/Metis_ADP, BIOS V1.52 09/12/2023
RIP: 0010:nv_suspend_devices+0x2b6/0x300 [nvidia]
Code: 48 8b 9b d0 06 00 00 48 85 db 0f 84 dd fe ff ff 48 8b bb 30 03 00 00 ba 01 00 00 00 44 89 f6 e8 c0 fb ff ff 89 c5 85 c0 74 d6 <0f> 0b 48 c7 c7 b0 19 67 c2 41 bc 01 00 00 00 e8 86 07 11 eb e9 fa
RSP: 0018:ffffd2a24eae7d80 EFLAGS: 00010202
RAX: 0000000000000051 RBX: ffff8ebc06fc0000 RCX: ffff8ebc06fc06b0
RDX: ffff8ebc06fc06b0 RSI: 0000000000000282 RDI: ffff8ebc06fc06a8
RBP: 0000000000000051 R08: fffffffe545d4a90 R09: 0000000000000014
R10: 0000000000000007 R11: 0000000000000000 R12: ffff8ebc06fc06a8
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f75fc4fc740(0000) GS:ffff8ebffd7f1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561f65625d90 CR3: 000000012d813005 CR4: 0000000000f72ef0
PKRU: 55555554
Call Trace:
 <TASK>
 nv_set_system_power_state+0x8a/0x1a0 [nvidia]
 nv_procfs_write_suspend+0x108/0x1d0 [nvidia]
 ? security_file_permission+0x50/0xf0
 proc_reg_write+0x57/0xb0
 vfs_write+0xef/0x470
 ? syscall_exit_to_user_mode+0x10/0x210
 ? do_syscall_64+0x87/0x160
 ? count_memcg_events.constprop.0+0x1a/0x30
 ksys_write+0x73/0xe0
 do_syscall_64+0x7b/0x160
 ? exc_page_fault+0x7e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f75fc56ca06
Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
RSP: 002b:00007ffc3cda8340 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f75fc56ca06
RDX: 0000000000000008 RSI: 000056434d14ba60 RDI: 0000000000000001
RBP: 00007ffc3cda8360 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008
R13: 000056434d14ba60 R14: 00007f75fc6e85c0 R15: 0000000000000000
 </TASK>
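
To check how often this warning lines up with a failed suspend, the kernel log could be scanned for the WARNING header line. The following is only a minimal sketch, assuming the log is available as plain text; the helper names `find_warnings` and `count_suspend_warnings` are hypothetical and not part of any NVIDIA or ABRT tooling.

```python
import re

# Matches a kernel WARNING header such as:
# "WARNING: CPU: 7 PID: 32226 at nvidia/nv.c:4660 nv_suspend_devices+0x2b6/0x300 [nvidia]"
# The pattern is not anchored, so a journal timestamp prefix is tolerated.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(log_text):
    """Return one dict per kernel WARNING header found in log_text."""
    hits = []
    for match in WARNING_RE.finditer(log_text):
        hits.append({
            "file": match.group("file"),
            "line": int(match.group("line")),
            "func": match.group("func"),
            "module": match.group("module"),  # None for built-in code
        })
    return hits


def count_suspend_warnings(log_text, func="nv_suspend_devices"):
    """Count WARNING headers raised from the given function."""
    return sum(1 for hit in find_warnings(log_text) if hit["func"] == func)
```

The text could come from something like `journalctl -k -b -1` (kernel messages from the previous boot), saved to a file and read into `log_text`.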

Also, occasionally after resuming from suspend, or even after a fresh reboot, nvidia-smi reports the following error (possibly relevant?) and I cannot use the GPU at all until I reboot:

Unable to determine the device handle for GPU0: 0000:01:00.0: Unknown Error
No devices were found

In my laptop’s UEFI BIOS settings I’ve enabled dynamic graphics mode (i.e. both NVIDIA and Intel GPUs). In discrete graphics mode (NVIDIA only), I’ve never been able to suspend since installing the NVIDIA driver. I have xorg-x11-drv-nvidia-power installed and have all the suspend and resume systemd services enabled: nvidia-powerd, nvidia-resume, and nvidia-suspend.
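
For anyone wanting to confirm the same setup, the state of those units and the loaded driver parameters could be collected roughly as follows. This is a sketch under assumptions: the unit names are the ones listed above with a `.service` suffix added, the driver is assumed to expose its parameters as `Key: value` lines in `/proc/driver/nvidia/params`, and the helpers `parse_params` and `unit_states` are hypothetical names.

```python
import subprocess

# Assumption: unit names taken from the post, with ".service" appended.
UNITS = [
    "nvidia-suspend.service",
    "nvidia-resume.service",
    "nvidia-powerd.service",
]


def parse_params(text):
    """Parse 'Key: value' lines into a dict; lines without a colon are skipped."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def unit_states(units=UNITS):
    """Ask systemctl whether each unit is enabled; returns {unit: state}."""
    states = {}
    for unit in units:
        try:
            result = subprocess.run(
                ["systemctl", "is-enabled", unit],
                capture_output=True,
                text=True,
            )
            states[unit] = result.stdout.strip() or result.stderr.strip()
        except FileNotFoundError:
            # systemctl is not available on this machine
            states[unit] = "systemctl not found"
    return states
```

Calling `unit_states()` and `parse_params(open("/proc/driver/nvidia/params").read())` on the affected machine would give a compact summary to attach alongside the bug report log.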

I’m happy to provide more system information and logs as required.

Cheers.

nvidia-bug-report.log.gz (1.9 MB)