Hi,
I’m not using my NVIDIA cards for display output, so X11 is not using them. Standby worked fine on Debian bookworm with the ancient driver it ships. However, after updating to kernel 6.6.1 and driver 535.129.03, the machine no longer suspends: it starts to suspend, then crashes and becomes unusable until I hold the power button.
The crash below is logged in the journal. The NVIDIA bug report is attached, but I’m not sure how much useful information it contains.
nvidia-bug-report.log.gz (1.1 MB)
Edit: I can reproduce the kernel crash just by running echo "suspend" > /proc/driver/nvidia/suspend
Edit 2: This only happens if X11 is running (on the iGPU). If I never start X11, the machine suspends fine. But once X11 has been started, the NVIDIA driver crashes when the suspend command is sent to it. The same happens if the console is switched to a text VT. Can anyone help? It seems the NVIDIA driver can’t handle the case where X11 is running but is not using the NVIDIA GPUs.
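For anyone reading the oops below: the two `#PF` lines are the kernel's decoding of the x86 page-fault error code, a small bit field described in the Intel SDM (bit 0 = page present, bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch). The helper here is just my own sketch (the name `decode_pf_error` is made up, and it only looks at those bits). It confirms that `error_code(0x0002)` means a kernel-mode write to a page that isn't mapped. Together with CR2 = 00000000000036c8, a write to such a small address most likely means `_nv000740kms` in nvidia_modeset wrote through a NULL pointer plus a field offset during `nvKmsSuspend`, though I can't confirm that without the driver source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the low bits of an x86 page-fault error code,
# as printed by the kernel in "#PF: error_code(0x....)".
decode_pf_error() {
  local code=$(( $1 ))          # accepts hex like 0x0002
  local mode access page
  # Bit 2: fault happened in user mode (1) or supervisor/kernel mode (0)
  if (( code & 0x4 )); then mode="user"; else mode="supervisor"; fi
  # Bit 4: instruction fetch; otherwise bit 1 distinguishes write from read
  if (( code & 0x10 )); then
    access="instruction fetch"
  elif (( code & 0x2 )); then
    access="write"
  else
    access="read"
  fi
  # Bit 0: page was present (protection violation) or not present
  if (( code & 0x1 )); then page="protection violation"; else page="not-present page"; fi
  echo "${mode} ${access} access, ${page}"
}

# error_code from the oops in this post
decode_pf_error 0x0002
```

This prints `supervisor write access, not-present page`, which matches the kernel's own `#PF` lines.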
Nov 26 19:19:36 host445 kernel: BUG: unable to handle page fault for address: 00000000000036c8
Nov 26 19:19:36 host445 kernel: #PF: supervisor write access in kernel mode
Nov 26 19:19:36 host445 kernel: #PF: error_code(0x0002) - not-present page
Nov 26 19:19:36 host445 kernel: PGD 0 P4D 0
Nov 26 19:19:36 host445 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 26 19:19:36 host445 kernel: CPU: 28 PID: 7778 Comm: nvidia-sleep.sh Tainted: P OE 6.6.1-custom #1
Nov 26 19:19:36 host445 kernel: Hardware name: ASUS System Product Name/ProArt X670E-CREATOR WIFI, BIOS 1710 10/04/2023
Nov 26 19:19:36 host445 kernel: RIP: 0010:_nv000740kms+0xc8/0x160 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: Code: 89 de 4c 89 ff 41 89 c5 e8 05 cc ff ff ba 01 00 00 00 48 89 de 4c 89 ff 41 09 c5 e8 f2 cb ff ff 44 08 e8 74 05 e8 28 ef ff ff 83 c8 36 00>
Nov 26 19:19:36 host445 kernel: RSP: 0018:ffffafd04fa2f9e8 EFLAGS: 00010246
Nov 26 19:19:36 host445 kernel: RAX: ffffffffc59ae3c8 RBX: 0000000000000000 RCX: 0000000000000000
Nov 26 19:19:36 host445 kernel: RDX: ffffafd04048f008 RSI: 0000000000000000 RDI: 0000000000000001
Nov 26 19:19:36 host445 kernel: RBP: ffffafd04fa2fa38 R08: ffffffffc59ae3c0 R09: 0000000000000001
Nov 26 19:19:36 host445 kernel: R10: 0000000000000000 R11: ffff88d46ab10008 R12: 0000000000000000
Nov 26 19:19:36 host445 kernel: R13: 0000000000000000 R14: ffff88d540c9a5a8 R15: ffffafd040d95008
Nov 26 19:19:36 host445 kernel: FS: 00007f176c60a740(0000) GS:ffff88eafe700000(0000) knlGS:0000000000000000
Nov 26 19:19:36 host445 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 19:19:36 host445 kernel: CR2: 00000000000036c8 CR3: 000000029b954000 CR4: 0000000000750ee0
Nov 26 19:19:36 host445 kernel: PKRU: 55555554
Nov 26 19:19:36 host445 kernel: Call Trace:
Nov 26 19:19:36 host445 kernel:
Nov 26 19:19:36 host445 kernel: ? __die+0x1f/0x70
Nov 26 19:19:36 host445 kernel: ? page_fault_oops+0x17d/0x4c0
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? exc_page_fault+0x73/0x170
Nov 26 19:19:36 host445 kernel: ? asm_exc_page_fault+0x22/0x30
Nov 26 19:19:36 host445 kernel: ? _nv000740kms+0xc8/0x160 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? _nv000740kms+0x83/0x160 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? _nv002437kms+0xf0/0x180 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? _nv002760kms+0x3b80/0x4c40 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? _nv002775kms+0x18b/0x1f0 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? nvKmsSuspend+0x3a/0x90 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? down+0x1a/0x60
Nov 26 19:19:36 host445 kernel: ? nvkms_suspend+0x1f/0x40 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: ? nv_set_system_power_state+0x174/0x440 [nvidia]
Nov 26 19:19:36 host445 kernel: ? nv_procfs_write_suspend+0xe4/0x150 [nvidia]
Nov 26 19:19:36 host445 kernel: ? proc_reg_write+0x56/0xa0
Nov 26 19:19:36 host445 kernel: ? preempt_count_add+0x47/0xa0
Nov 26 19:19:36 host445 kernel: ? vfs_write+0xeb/0x440
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? fpregs_assert_state_consistent+0x22/0x50
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? exit_to_user_mode_prepare+0x40/0x1d0
Nov 26 19:19:36 host445 kernel: ? ksys_write+0x6b/0xf0
Nov 26 19:19:36 host445 kernel: ? do_syscall_64+0x58/0xc0
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? do_user_addr_fault+0x318/0x650
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? fpregs_assert_state_consistent+0x22/0x50
Nov 26 19:19:36 host445 kernel: ? srso_alias_return_thunk+0x5/0x7f
Nov 26 19:19:36 host445 kernel: ? exit_to_user_mode_prepare+0x40/0x1d0
Nov 26 19:19:36 host445 kernel: ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 26 19:19:36 host445 kernel:
Nov 26 19:19:36 host445 kernel: Modules linked in: nvidia_uvm(POE) nct6775 nct6775_core hwmon_vid rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_conntrack nft_chain_nat>
Nov 26 19:19:36 host445 kernel: battery ecc sha512_generic snd_timer ledtrig_audio cfg80211 crc16 joydev apple_mfi_fastcharge sparse_keymap aesni_intel ucsi_acpi snd platform_prof>
Nov 26 19:19:36 host445 kernel: CR2: 00000000000036c8
Nov 26 19:19:36 host445 kernel: ---[ end trace 0000000000000000 ]---
Nov 26 19:19:36 host445 kernel: RIP: 0010:_nv000740kms+0xc8/0x160 [nvidia_modeset]
Nov 26 19:19:36 host445 kernel: Code: 89 de 4c 89 ff 41 89 c5 e8 05 cc ff ff ba 01 00 00 00 48 89 de 4c 89 ff 41 09 c5 e8 f2 cb ff ff 44 08 e8 74 05 e8 28 ef ff ff 83 c8 36 00>
Nov 26 19:19:36 host445 kernel: RSP: 0018:ffffafd04fa2f9e8 EFLAGS: 00010246
Nov 26 19:19:36 host445 kernel: RAX: ffffffffc59ae3c8 RBX: 0000000000000000 RCX: 0000000000000000
Nov 26 19:19:36 host445 kernel: RDX: ffffafd04048f008 RSI: 0000000000000000 RDI: 0000000000000001
Nov 26 19:19:36 host445 kernel: RBP: ffffafd04fa2fa38 R08: ffffffffc59ae3c0 R09: 0000000000000001
Nov 26 19:19:36 host445 kernel: R10: 0000000000000000 R11: ffff88d46ab10008 R12: 0000000000000000
Nov 26 19:19:36 host445 kernel: R13: 0000000000000000 R14: ffff88d540c9a5a8 R15: ffffafd040d95008
Nov 26 19:19:36 host445 kernel: FS: 00007f176c60a740(0000) GS:ffff88eafe700000(0000) knlGS:0000000000000000
Nov 26 19:19:36 host445 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 26 19:19:36 host445 kernel: CR2: 00000000000036c8 CR3: 000000029b954000 CR4: 0000000000750ee0
Nov 26 19:19:36 host445 kernel: PKRU: 55555554
Nov 26 19:19:36 host445 kernel: note: nvidia-sleep.sh[7778] exited with irqs disabled
Nov 26 19:19:36 host445 systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=9/KILL
Nov 26 19:19:36 host445 systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Nov 26 19:19:36 host445 systemd[1]: Failed to start nvidia-suspend.service - NVIDIA system suspend actions.
Nov 26 19:19:36 host445 systemd[1]: Starting systemd-suspend.service - System Suspend…
Nov 26 19:19:36 host445 systemd-sleep[7815]: Entering sleep state 'suspend'…
Nov 26 19:19:36 host445 kernel: PM: suspend entry (deep)
-- Boot d1e8bdf15a31481aa4f936395519ea3c --