GPU is crashing after resume from sleep

Sometimes when I resume my PC from sleep, the screen goes to sleep on TTY7. When I switch to TTY1, the console becomes visible. When I switch back to TTY7, the screen goes to sleep again.

In the logs I can see the following related lines:

jan 26 07:27:32 openSUSE kded5[30629]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE ksmserver[30671]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE korgac[30705]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE kdeinit5[30615]: kdeinit5: Fatal IO error: client killed
jan 26 07:27:32 openSUSE kdeinit5[30615]: kdeinit5: sending SIGHUP to children.
jan 26 07:27:32 openSUSE xembedsniproxy[30679]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE kactivitymanagerd[30639]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE kdeinit5[30615]: kdeinit5: sending SIGTERM to children.
jan 26 07:27:32 openSUSE kdeinit5[30615]: kdeinit5: Exit.
jan 26 07:27:32 openSUSE klauncher[30616]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE yakuake[30746]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE polkit-kde-authentication-agent-1[30681]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE kglobalaccel5[30653]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE kaccess[30683]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE gmenudbusmenuproxy[30702]: The X11 connection broke (error 1). Did the X11 server die?
jan 26 07:27:32 openSUSE pulseaudio[30704]: X connection to :0 broken (explicit kill or server shutdown).
[...]

In dmesg I can see a lot of graphics errors around sleep/resume:

NVRM: Xid (PCI:0000:2d:00): 31, pid=30946, Ch 0000001b, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_058e0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x50, 0x3b0)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x3b00050 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x70, 0x3b0)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x3b00070 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00001b0c, Data 1000f010
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=0, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=1405403387, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=1405403387, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=1405403387, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=1405403387, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 0: 3D HEIGHT CT Violation. Coordinates: (0x10, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x500420=0x80000020 0x500434=0x600010 0x500438=0x2a00 0x50043c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception on GPC 1: 3D HEIGHT CT Violation. Coordinates: (0x0, 0x60)
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ESR 0x508420=0x80000020 0x508434=0x600000 0x508438=0x2a00 0x50843c=0x0
NVRM: Xid (PCI:0000:2d:00): 13, pid=30946, Graphics Exception: ChID 001b, Class 0000b197, Offset 00000f80, Data 00000000
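
A long Xid dump like the one above is easier to compare between attempts when it is reduced to a count per Xid code. The following is a minimal sketch, not part of the original report: the function name `xid_summary` is made up for illustration, and it only assumes that each line has the `NVRM: Xid (<device>): <code>,` shape seen in the log above. As far as I know, NVIDIA's Xid documentation describes Xid 13 as a graphics engine exception and Xid 31 as a GPU memory page fault.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print how many
# times each NVRM Xid code occurs, one "Xid <code>: <count>" line per code.
xid_summary() {
    # Keep only the numeric code that follows "NVRM: Xid (<device>): ".
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print "Xid " $2 ": " $1 }'
}
```

It can be fed from either `dmesg | xid_summary` or `journalctl -k -b | xid_summary`.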

OS and kernel: openSUSE Tumbleweed with x86_64 kernel 5.10.4.

nvidia-bug-report.log.gz (773.8 KB)

Can you please try enabling suspend/resume video memory preservation? It’s described in the README here: Chapter 21. Configuring Power Management Support

I know there are a lot of steps involved to get that wired up, and I’m working on improvements for future releases to make the installer set up the appropriate systemd units automatically, but for now please follow those steps and see if the problem goes away.

Thank you for the tip!

I configured systemd using the “Save allocations in an unnamed temporary file” method. I tried suspend/resume twice and it worked well. The third time, this morning, the screen didn’t come back from sleep, and this time not on TTY1 either. The difference between the third attempt and the first two was that on the third attempt I turned my monitor on after starting my PC, whereas the first two times I had left the monitor turned on.

As far as I can see, suspending via systemd works:

jan 28 00:30:22 openSUSE systemd[1]: Reached target Sleep.
jan 28 00:30:22 openSUSE systemd[1]: Starting NVIDIA system suspend actions...
jan 28 00:30:22 openSUSE suspend[8845]: nvidia-suspend.service
jan 28 00:30:22 openSUSE logger[8845]: <13>Jan 28 00:30:22 suspend: nvidia-suspend.service
jan 28 00:30:23 openSUSE acpid[711]: client 1309[0:0] has disconnected
jan 28 00:30:23 openSUSE root[8853]: Turning off secondary displays
jan 28 00:30:23 openSUSE root[8857]: Turning off secondary displays
jan 28 00:30:24 openSUSE systemd[1]: nvidia-suspend.service: Succeeded.
jan 28 00:30:24 openSUSE systemd[1]: Finished NVIDIA system suspend actions.
jan 28 00:30:24 openSUSE systemd[1]: Starting Suspend...
jan 28 00:30:24 openSUSE systemd-sleep[8865]: INFO: Skip running /usr/lib/systemd/system-sleep/grub2.sleep for suspend
jan 28 00:30:24 openSUSE systemd-sleep[8863]: Suspending system...

and

jan 28 00:30:10 openSUSE systemd[1]: systemd-suspend.service: Succeeded.
jan 28 00:30:10 openSUSE systemd[1]: Finished Suspend.
jan 28 00:30:10 openSUSE systemd[1]: Stopped target Sleep.
jan 28 00:30:10 openSUSE systemd[1]: Reached target Suspend.
jan 28 00:30:10 openSUSE systemd-logind[777]: Operation 'sleep' finished.
jan 28 00:30:10 openSUSE systemd[1]: Starting NVIDIA system resume actions...
jan 28 00:30:10 openSUSE systemd[1]: Stopped target Suspend.
jan 28 00:30:10 openSUSE suspend[8402]: nvidia-resume.service
jan 28 00:30:10 openSUSE logger[8402]: <13>Jan 28 00:30:10 suspend: nvidia-resume.service
jan 28 00:30:10 openSUSE systemd[1]: nvidia-resume.service: Succeeded.
jan 28 00:30:10 openSUSE systemd[1]: Finished NVIDIA system resume actions.

It is interesting that even though I configured the nvidia module parameter:

cat /etc/modprobe.d/nvidia-power-management.conf
> options nvidia NVreg_PreserveVideoMemoryAllocations=1

I don’t see it applied on a freshly booted system:

ls -l /sys/module/nvidia/parameters/
> -r--r--r-- 1 root root 4096 jan   28 08.33 nv_cap_enable_devfs
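
An empty-looking `/sys/module/nvidia/parameters/` does not necessarily mean the option was ignored. As far as I know, the driver reports its `NVreg_*` registry options through `/proc/driver/nvidia/params` (without the `NVreg_` prefix) rather than through sysfs. The sketch below reads one value from that listing. The function name `nvreg_value` is made up for illustration, and the `Name: value` line format is an assumption about that file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one driver registry option.
#   $1  option name without the NVreg_ prefix
#   $2  params listing to read (assumed default: /proc/driver/nvidia/params)
# Exits non-zero if the option is not listed.
nvreg_value() {
    name="$1"
    file="${2:-/proc/driver/nvidia/params}"
    awk -F': *' -v key="$name" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    ' "$file"
}
```

For example, `nvreg_value PreserveVideoMemoryAllocations` should print `1` once the option is in effect. If it prints `0` even though the modprobe.d file exists, one possible explanation (an assumption, not confirmed in this thread) is that the module is loaded from an initrd that does not contain the new configuration file.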

During the third resume the kernel reported the following error:

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 10 PID: 8903 at /var/lib/dkms/nvidia/460.32.03/build/nvidia/nv.c:3817 nv_restore_user_channels+0xc9/0xe0 [nvidia]
kernel: Modules linked in: nvidia_uvm(POE) rfcomm af_packet nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw hid_logitech_hidpp ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle cmac algif_hash algif_skcipher iptable_raw nvidia_drm(POE) iwlmvm af_alg nvidia_modeset(POE) iptable_security mac80211 hid_logitech_dj bnep libarc4 btusb btrtl btbcm btintel bluetooth snd_usb_audio ip_set iwlwifi nfnetlink snd_usbmidi_lib r8169 snd_rawmidi realtek uas nvidia(POE) ebtable_filter mdio_devres ecdh_generic snd_seq_device cfg80211 joydev mc usb_storage ecc libphy ebtables ip6table_filter ip6_tables hid_generic rfkill iptable_filter ip_tables x_tables bpfilter usbhid snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
kernel:  ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec edac_mce_amd snd_hda_core nct6775 kvm_amd dmi_sysfs hwmon_vid snd_hwdep ccp drm_kms_helper i2c_dev soundwire_bus kvm snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm cec snd_timer rc_core fb_sys_fops irqbypass wmi_bmof pcspkr efi_pstore snd k10temp syscopyarea sp5100_tco sysfillrect i2c_piix4 sysimgblt soundcore tiny_power_button acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat drm fuse configfs xhci_pci xhci_pci_renesas xhci_hcd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel usbcore glue_helper crypto_simd nvme cryptd nvme_core wmi button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
kernel: CPU: 10 PID: 8903 Comm: nvidia-sleep.sh Tainted: P           OE     5.10.4-1-default #1 openSUSE Tumbleweed
kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.B0 09/07/2020
kernel: RIP: 0010:nv_restore_user_channels+0xc9/0xe0 [nvidia]
kernel: Code: c8 b4 f0 be 01 00 00 00 4c 89 e7 e8 a1 9a 00 00 4c 89 ff e8 49 c7 b4 f0 ba 02 00 00 00 4c 89 e6 48 89 ef e8 29 b1 8b 00 eb 94 <0f> 0b eb c6 41 bd 51 00 00 00 eb 9f 66 66 2e 0f 1f 84 00 00 00 00
kernel: RSP: 0018:ffffb1df451f7e28 EFLAGS: 00010206
kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffffb1df451f7dc8
kernel: RDX: 0000000000000087 RSI: 0000000000000246 RDI: 0000000000000246
kernel: RBP: ffff9f77057a3000 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9f75c31dc000
kernel: R13: 0000000000000003 R14: ffff9f75c31dc4f8 R15: ffff9f75c31dc000
kernel: FS:  00007fc71ee1bb80(0000) GS:ffff9f78dea80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000562f76077038 CR3: 000000020dcd6000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  nv_set_system_power_state+0x222/0x3c0 [nvidia]
kernel:  nv_procfs_write_suspend+0xec/0x140 [nvidia]
kernel:  proc_reg_write+0x51/0x90
kernel:  vfs_write+0xc3/0x270
kernel:  ksys_write+0x5f/0xe0
kernel:  do_syscall_64+0x33/0x80
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7fc71ef3f357
kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff10c1a508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fc71ef3f357
kernel: RDX: 0000000000000007 RSI: 0000562babff27a0 RDI: 0000000000000001
kernel: RBP: 0000562babff27a0 R08: 000000000000000a R09: 0000000000000000
kernel: R10: 0000562bab05661a R11: 0000000000000246 R12: 0000000000000007
kernel: R13: 00007fc71f012520 R14: 0000000000000007 R15: 00007fc71f012720
kernel: ---[ end trace be3b7cbc1ee61206 ]---
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 10 PID: 8903 at /var/lib/dkms/nvidia/460.32.03/build/nvidia/nv.c:4012 nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
kernel: Modules linked in: nvidia_uvm(POE) rfcomm af_packet nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw hid_logitech_hidpp ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle cmac algif_hash algif_skcipher iptable_raw nvidia_drm(POE) iwlmvm af_alg nvidia_modeset(POE) iptable_security mac80211 hid_logitech_dj bnep libarc4 btusb btrtl btbcm btintel bluetooth snd_usb_audio ip_set iwlwifi nfnetlink snd_usbmidi_lib r8169 snd_rawmidi realtek uas nvidia(POE) ebtable_filter mdio_devres ecdh_generic snd_seq_device cfg80211 joydev mc usb_storage ecc libphy ebtables ip6table_filter ip6_tables hid_generic rfkill iptable_filter ip_tables x_tables bpfilter usbhid snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
kernel:  ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec edac_mce_amd snd_hda_core nct6775 kvm_amd dmi_sysfs hwmon_vid snd_hwdep ccp drm_kms_helper i2c_dev soundwire_bus kvm snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm cec snd_timer rc_core fb_sys_fops irqbypass wmi_bmof pcspkr efi_pstore snd k10temp syscopyarea sp5100_tco sysfillrect i2c_piix4 sysimgblt soundcore tiny_power_button acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat drm fuse configfs xhci_pci xhci_pci_renesas xhci_hcd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel usbcore glue_helper crypto_simd nvme cryptd nvme_core wmi button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
kernel: CPU: 10 PID: 8903 Comm: nvidia-sleep.sh Tainted: P        W  OE     5.10.4-1-default #1 openSUSE Tumbleweed
kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.B0 09/07/2020
kernel: RIP: 0010:nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
kernel: Code: ed 0f 84 4c ff ff ff 41 83 fc 02 74 ea 48 8b 85 60 02 00 00 be 02 00 00 00 48 8b 78 78 e8 48 d4 ff ff 85 c0 74 d1 0f 0b eb cd <0f> 0b e9 63 ff ff ff 48 c7 c7 70 3b e4 c2 e8 1d 99 b4 f0 e8 a8 0f
kernel: RSP: 0018:ffffb1df451f7e58 EFLAGS: 00010206
kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffffd932c9a1b600
kernel: RDX: 0000326660e06290 RSI: ffffffffc0fa4e84 RDI: 0000000000000001
kernel: RBP: ffff9f75c31dc000 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
kernel: R13: 0000562babff27a0 R14: ffffb1df451f7f10 R15: 0000000000000007
kernel: FS:  00007fc71ee1bb80(0000) GS:ffff9f78dea80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000562f76077038 CR3: 000000020dcd6000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  nv_procfs_write_suspend+0xec/0x140 [nvidia]
kernel:  proc_reg_write+0x51/0x90
kernel:  vfs_write+0xc3/0x270
kernel:  ksys_write+0x5f/0xe0
kernel:  do_syscall_64+0x33/0x80
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7fc71ef3f357
kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff10c1a508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fc71ef3f357
kernel: RDX: 0000000000000007 RSI: 0000562babff27a0 RDI: 0000000000000001
kernel: RBP: 0000562babff27a0 R08: 000000000000000a R09: 0000000000000000
kernel: R10: 0000562bab05661a R11: 0000000000000246 R12: 0000000000000007
kernel: R13: 00007fc71f012520 R14: 0000000000000007 R15: 00007fc71f012720
kernel: ---[ end trace be3b7cbc1ee61207 ]---
kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:435

Although it’s not configured in my fstab, I have a tmpfs mounted on /tmp:

mount | grep '/tmp'
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=8142924k,nr_inodes=409600,inode64)
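
The filesystem type matters here because, as I understand the driver README, the unnamed temporary file is created under `/tmp` by default and needs a filesystem with enough free space to hold the saved video memory. The `NVreg_TemporaryFilePath` module option can point it elsewhere (for example `/var/tmp`). The sketch below extracts the filesystem type of a mount point from `mount` output so the check can be scripted. The function name `fstype_of_mountpoint` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `mount` output on stdin and print the filesystem
# type mounted exactly at the mount point given as $1.
# If several mounts are stacked on the same path, the last one listed wins.
# Exits non-zero if nothing is mounted at that path.
fstype_of_mountpoint() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 == "type" { t = $5 }
        END { if (t != "") print t; else exit 1 }
    '
}
```

Usage: `mount | fstype_of_mountpoint /tmp` prints `tmpfs` on the system described above.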

I’ve upgraded the driver to 460.39.

Today there was a kernel warning again:

jan 30 09:13:09 kernel: ------------[ cut here ]------------
jan 30 09:13:09 kernel: WARNING: CPU: 4 PID: 22666 at /var/lib/dkms/nvidia/460.39/build/nvidia/nv.c:3826 nv_restore_user_channels+0xc9/0xe0 [nvidia]
jan 30 09:13:09 kernel: Modules linked in: nvidia_uvm(POE) udp_diag tcp_diag inet_diag nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) rfcomm af_packet nf_conntrack_netbios_ns nf_conntrack_broadcast nft_ct nft_chain_nat nf_tables hid_logitech_hidpp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink hid_logitech_dj iwlmvm cmac algif_hash x_tables algif_skcipher bpfilter af_alg bnep i2c_dev mac80211 snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence iwlwifi snd_hda_codec btusb btrtl btbcm btintel snd_hda_core edac_mce_amd bluetooth dmi_sysfs soundwire_bus nct6775 kvm_amd cfg80211 ccp hwmon_vid snd_soc_core snd_usb_audio snd_usbmidi_lib r8169 snd_hwdep snd_rawmidi snd_seq_device snd_compress realtek mc ecdh_generic kvm sp5100_tco snd_pcm_dmaengine mdio_devres efi_pstore pcspkr wmi_bmof joydev k10temp ecc drm_kms_helper libphy rfkill irqbypass i2c_piix4
jan 30 09:13:09 kernel:  snd_pcm cec snd_timer snd rc_core fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt uas usb_storage tiny_power_button acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat drm fuse configfs hid_generic usbhid xhci_pci xhci_pci_renesas xhci_hcd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel usbcore glue_helper crypto_simd cryptd nvme nvme_core wmi button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs [last unloaded: ip_tables]
jan 30 09:13:09 kernel: CPU: 4 PID: 22666 Comm: nvidia-sleep.sh Tainted: P           OE     5.10.4-1-default #1 openSUSE Tumbleweed
jan 30 09:13:09 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.B0 09/07/2020
jan 30 09:13:09 kernel: RIP: 0010:nv_restore_user_channels+0xc9/0xe0 [nvidia]
jan 30 09:13:09 kernel: Code: f8 82 d8 be 01 00 00 00 4c 89 e7 e8 d1 9c 00 00 4c 89 ff e8 49 f7 82 d8 ba 02 00 00 00 4c 89 e6 48 89 ef e8 e9 05 8c 00 eb 94 <0f> 0b eb c6 41 bd 51 00 00 00 eb 9f 66 66 2e 0f 1f 84 00 00 00 00
jan 30 09:13:09 kernel: RSP: 0018:ffffaf15859c7e28 EFLAGS: 00010206
jan 30 09:13:09 kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: ffffaf15859c7dc8
jan 30 09:13:09 kernel: RDX: 0000000000000087 RSI: 0000000000000246 RDI: 0000000000000246
jan 30 09:13:09 kernel: RBP: ffff918ddc0bb000 R08: 0000000000000000 R09: 0000000000000000
jan 30 09:13:09 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff918dd0797800
jan 30 09:13:09 kernel: R13: 0000000000000003 R14: ffff918dd0797cf8 R15: ffff918dd0797800
jan 30 09:13:09 kernel: FS:  00007efd129a3b80(0000) GS:ffff91909eb00000(0000) knlGS:0000000000000000
jan 30 09:13:09 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 30 09:13:09 kernel: CR2: 00000d6beefb4900 CR3: 000000012c0c2000 CR4: 0000000000350ee0
jan 30 09:13:09 kernel: Call Trace:
jan 30 09:13:09 kernel:  nv_set_system_power_state+0x222/0x3c0 [nvidia]
jan 30 09:13:09 kernel:  nv_procfs_write_suspend+0xec/0x140 [nvidia]
jan 30 09:13:09 kernel:  proc_reg_write+0x51/0x90
jan 30 09:13:09 kernel:  vfs_write+0xc3/0x270
jan 30 09:13:09 kernel:  ksys_write+0x5f/0xe0
jan 30 09:13:09 kernel:  do_syscall_64+0x33/0x80
jan 30 09:13:09 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
jan 30 09:13:09 kernel: RIP: 0033:0x7efd12ac7357
jan 30 09:13:09 kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
jan 30 09:13:09 kernel: RSP: 002b:00007ffc4430c578 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
jan 30 09:13:09 kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007efd12ac7357
jan 30 09:13:09 kernel: RDX: 0000000000000007 RSI: 00005585c7b497a0 RDI: 0000000000000001
jan 30 09:13:09 kernel: RBP: 00005585c7b497a0 R08: 000000000000000a R09: 0000000000000000
jan 30 09:13:09 kernel: R10: 00005585c5bad61a R11: 0000000000000246 R12: 0000000000000007
jan 30 09:13:09 kernel: R13: 00007efd12b9a520 R14: 0000000000000007 R15: 00007efd12b9a720
jan 30 09:13:09 kernel: ---[ end trace 7eb7e2b2f8bfb6cc ]---
jan 30 09:13:09 kernel: ------------[ cut here ]------------
jan 30 09:13:09 kernel: WARNING: CPU: 4 PID: 22666 at /var/lib/dkms/nvidia/460.39/build/nvidia/nv.c:4021 nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
jan 30 09:13:09 kernel: Modules linked in: nvidia_uvm(POE) udp_diag tcp_diag inet_diag nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) rfcomm af_packet nf_conntrack_netbios_ns nf_conntrack_broadcast nft_ct nft_chain_nat nf_tables hid_logitech_hidpp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink hid_logitech_dj iwlmvm cmac algif_hash x_tables algif_skcipher bpfilter af_alg bnep i2c_dev mac80211 snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence iwlwifi snd_hda_codec btusb btrtl btbcm btintel snd_hda_core edac_mce_amd bluetooth dmi_sysfs soundwire_bus nct6775 kvm_amd cfg80211 ccp hwmon_vid snd_soc_core snd_usb_audio snd_usbmidi_lib r8169 snd_hwdep snd_rawmidi snd_seq_device snd_compress realtek mc ecdh_generic kvm sp5100_tco snd_pcm_dmaengine mdio_devres efi_pstore pcspkr wmi_bmof joydev k10temp ecc drm_kms_helper libphy rfkill irqbypass i2c_piix4
jan 30 09:13:09 kernel:  snd_pcm cec snd_timer snd rc_core fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt uas usb_storage tiny_power_button acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat drm fuse configfs hid_generic usbhid xhci_pci xhci_pci_renesas xhci_hcd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel usbcore glue_helper crypto_simd cryptd nvme nvme_core wmi button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs [last unloaded: ip_tables]
jan 30 09:13:09 kernel: CPU: 4 PID: 22666 Comm: nvidia-sleep.sh Tainted: P        W  OE     5.10.4-1-default #1 openSUSE Tumbleweed
jan 30 09:13:09 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.B0 09/07/2020
jan 30 09:13:09 kernel: RIP: 0010:nv_set_system_power_state+0x2c0/0x3c0 [nvidia]
jan 30 09:13:09 kernel: Code: ed 0f 84 4c ff ff ff 41 83 fc 02 74 ea 48 8b 85 60 02 00 00 be 02 00 00 00 48 8b 78 78 e8 48 d4 ff ff 85 c0 74 d1 0f 0b eb cd <0f> 0b e9 63 ff ff ff 48 c7 c7 70 dc 36 c3 e8 1d c9 82 d8 e8 d8 11
jan 30 09:13:09 kernel: RSP: 0018:ffffaf15859c7e58 EFLAGS: 00010206
jan 30 09:13:09 kernel: RAX: 0000000000000003 RBX: 0000000000000002 RCX: 0000000080020001
jan 30 09:13:09 kernel: RDX: 0000000080020002 RSI: ffffffffc14c1e84 RDI: ffff918dc017f100
jan 30 09:13:09 kernel: RBP: ffff918dd0797800 R08: 0000000000000001 R09: 0000000000000000
jan 30 09:13:09 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
jan 30 09:13:09 kernel: R13: 00005585c7b497a0 R14: ffffaf15859c7f10 R15: 0000000000000007
jan 30 09:13:09 kernel: FS:  00007efd129a3b80(0000) GS:ffff91909eb00000(0000) knlGS:0000000000000000
jan 30 09:13:09 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jan 30 09:13:09 kernel: CR2: 00000d6beefb4900 CR3: 000000012c0c2000 CR4: 0000000000350ee0
jan 30 09:13:09 kernel: Call Trace:
jan 30 09:13:09 kernel:  nv_procfs_write_suspend+0xec/0x140 [nvidia]
jan 30 09:13:09 kernel:  proc_reg_write+0x51/0x90
jan 30 09:13:09 kernel:  vfs_write+0xc3/0x270
jan 30 09:13:09 kernel:  ksys_write+0x5f/0xe0
jan 30 09:13:09 kernel:  do_syscall_64+0x33/0x80
jan 30 09:13:09 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
jan 30 09:13:09 kernel: RIP: 0033:0x7efd12ac7357
jan 30 09:13:09 kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
jan 30 09:13:09 kernel: RSP: 002b:00007ffc4430c578 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
jan 30 09:13:09 kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007efd12ac7357
jan 30 09:13:09 kernel: RDX: 0000000000000007 RSI: 00005585c7b497a0 RDI: 0000000000000001
jan 30 09:13:09 kernel: RBP: 00005585c7b497a0 R08: 000000000000000a R09: 0000000000000000
jan 30 09:13:09 kernel: R10: 00005585c5bad61a R11: 0000000000000246 R12: 0000000000000007
jan 30 09:13:09 kernel: R13: 00007efd12b9a520 R14: 0000000000000007 R15: 00007efd12b9a720
jan 30 09:13:09 kernel: ---[ end trace 7eb7e2b2f8bfb6cd ]---
jan 30 09:13:12 kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
jan 30 09:13:14 kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:435

Additionally, the following lines are in the journal:

jan 30 09:16:00 systemd[1]: systemd-suspend.service: State 'stop-sigterm' timed out. Killing.
jan 30 09:16:00 systemd[1]: systemd-suspend.service: Killing process 22666 (nvidia-sleep.sh) with signal SIGKILL.
[...]
jan 30 09:17:31 systemd[1]: systemd-suspend.service: State 'final-sigterm' timed out. Killing.
jan 30 09:17:31 systemd[1]: systemd-suspend.service: Killing process 22666 (nvidia-sleep.sh) with signal SIGKILL.
jan 30 09:17:31 systemd[1]: systemd-suspend.service: Failed with result 'timeout'.
jan 30 09:17:31 systemd[1]: systemd-suspend.service: Unit process 22666 (nvidia-sleep.sh) remains running after unit stopped.
jan 30 09:17:31 systemd[1]: Failed to start Suspend.
jan 30 09:17:31 systemd[1]: Dependency failed for Suspend.
jan 30 09:17:31 systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
jan 30 09:17:31 systemd-logind[765]: Operation 'sleep' finished.
jan 30 09:17:31 systemd[1]: Stopped target Sleep.
jan 30 09:17:31 systemd[1]: Starting NVIDIA system resume actions...

It seems that the screen no longer waking up on TTY1 is just a symptom. The cause is that the USB hubs are not powered, so my keyboard doesn’t respond. Even if I plug in a different keyboard, it is left uninitialised.

Please tell me if I can help with further debugging. Until then I’ll turn off the systemd suspend setup, because it doesn’t seem to fix the problem.