570 Random Freeze: GPU has fallen off the bus

I’m facing a random crash which spams “Error setting GPU limit” to logs.

Sound is still running in the background but nothing else works.

Running Fedora 41 with kernel 6.13.8, NVIDIA 570.133.07, on an Optimus laptop with a 3070 Ti Mobile and an Intel processor, with one screen connected to the GPU via DP and another via HDMI (I believe this goes through the iGPU).

I'm currently running kernel 6.11.4, which seems to have minimized the issue, but it’s still happening. I never encountered this error with 565.

These are what I believe are the relevant log entries:

10:04:08 flatpak-system-: system: Pulled runtime/org.freedesktop.Platform.GL32.nvidia-570-133-07/x86_64/1.4 from /var/lib/flatpak/repo/tmp/flatpak-cache-U49F42/repo-F9LdCh
10:04:08 nvidia-powerd: Error setting GPU limit: 142963.
10:04:03 systemd: var-tmp-flatpak\x2dcache\x2dDED132-org.freedesktop.Platform.GL32.nvidia\x2d570\x2d133\x2d07\x2d0HPG42.mount: Deactivated successfully.
10:04:03 gnome-software: /var/tmp/flatpak-cache-DED132/org.freedesktop.Platform.GL32.nvidia-570-133-07-0HPG42/repo-F9LdCh: Pulled runtime/org.freedesktop.Platform.GL32.nvidia-570-133-07/x86_64/1.4 from flathub
10:04:03 nvidia-powerd: Error setting GPU limit: 142570.
10:03:59 kernel: [drm:__nv_drm_semsurf_wait_fence_work_cb [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register auto-value-update on pre-wait value for sync FD semaphore surface
10:03:59 kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
10:03:59 gnome-shell: Failed to set hardware cursor (Failed to allocate gbm_bo: Invalid argument), using OpenGL from now on
10:03:58 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 kernel: NVRM: Error in service of callback 
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:56 nvidia-powerd: Error setting GPU limit: 150000.
10:03:56 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 150000.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 150000.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 150000.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 149658.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Failed to get topology status f
10:03:55 nvidia-powerd: Error setting GPU limit: 149197.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 148585.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 147791.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 146977.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 146144.
10:03:55 nvidia-powerd: error setting power limit
10:03:55 nvidia-powerd: Error setting GPU limit: 145618.
10:03:55 nvidia-powerd: error setting power limit
10:03:54 kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
10:03:54 kernel:  </TASK>
10:03:54 kernel: R13: 00007f04015fca7c R14: 0000000067eaaeda R15: 00007f04015fc8b0
10:03:54 kernel: R10: 00007f04015fdab0 R11: 0000000000000246 R12: 000000000000001e
10:03:54 kernel: RBP: 00007f04015fc8a0 R08: 00007f04015fca60 R09: 00007f04015fca7c
10:03:54 kernel: RDX: 00007f04015fca60 RSI: 00000000c020462a RDI: 000000000000001e
10:03:54 kernel: RAX: ffffffffffffffda RBX: 00007f04015fca60 RCX: 00007f04157e730d
10:03:54 kernel: RSP: 002b:00007f04015fc850 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
10:03:54 kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
10:03:54 kernel: RIP: 0033:0x7f04157e730d
10:03:54 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
10:03:54 kernel: Call Trace:
10:03:54 kernel: Hardware name: System76 Oryx Pro/Oryx Pro, BIOS 2024-07-08_926f73d 06/28/2024
10:03:54 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
10:03:54 kernel: CPU: 4 UID: 1000 PID: 4533 Comm: [vkps] Update Tainted: P           OE      6.11.4-301.fc41.x86_64 #1
10:03:54 kernel: NVRM:     -7    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba848fa1 0x000631a4ba848fa6      5us  
10:03:54 kernel: NVRM:     -7    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba848fa1 0x000631a4ba848fa6      5us  
10:03:54 kernel: NVRM:     -6    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba84c129 0x000631a4ba84c12e      5us  
10:03:54 kernel: NVRM:     -5    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba84f121 0x000631a4ba84f126      5us  
10:03:54 kernel: NVRM:     -4    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba852222 0x000631a4ba852226      4us  
10:03:54 kernel: NVRM:     -3    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba85538f 0x000631a4ba855394      5us  
10:03:54 kernel: NVRM:     -2    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba858470 0x000631a4ba858474      4us  
10:03:54 kernel: NVRM:     -1    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba85b59e 0x000631a4ba85b5a2      4us  
10:03:54 kernel: NVRM:      0    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000631a4ba85e553 0x000631a4ba85e558      5us  
10:03:54 kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
10:03:54 kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
10:03:54 kernel: NVRM:     -7    76   GSP_RM_CONTROL        0x000000002080e634 0x0000000000000188 0x000631a4ba5d8394 0x000631a4ba5d9d47   6579us  
10:03:54 kernel: NVRM:     -6    76   GSP_RM_CONTROL        0x000000002080a0d1 0x00000000000007e8 0x000631a4ba621e18 0x000631a4ba6220d4    700us  
10:03:54 kernel: NVRM:     -5    76   GSP_RM_CONTROL        0x000000002080a0c5 0x0000000000000510 0x000631a4ba627039 0x000631a4ba627502   1225us  
10:03:54 kernel: NVRM:     -4    76   GSP_RM_CONTROL        0x000000002080e634 0x0000000000000188 0x000631a4ba6cef3c 0x000631a4ba6d0abe   7042us  
10:03:54 kernel: NVRM:     -3    76   GSP_RM_CONTROL        0x000000002080a0d1 0x00000000000007e8 0x000631a4ba75373f 0x000631a4ba753afa    955us  
10:03:54 kernel: NVRM:     -2    76   GSP_RM_CONTROL        0x000000002080852f 0x0000000000000308 0x000631a4ba774951 0x000631a4ba774d2c    987us  
10:03:54 kernel: NVRM:     -1    76   GSP_RM_CONTROL        0x000000002080e634 0x0000000000000188 0x000631a4ba77c312 0x000631a4ba77e81a   9480us  
10:03:54 kernel: NVRM:      0    76   GSP_RM_CONTROL        0x000000002080a0d1 0x00000000000007e8 0x000631a4ba884fe0 0x0000000000000000          y
10:03:54 kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
10:03:54 kernel: NVRM: GPU0 RPC history (CPU -> GSP):
10:03:54 kernel: NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000.
10:03:54 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
10:03:54 kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
10:03:54 kernel: NVRM: GPU at PCI:0000:01:00: GPU-fa813934-9d9f-3126-0ccb-4d01c14cd133
10:03:53 cat: [336:698:0331/100353.217891:ERROR:srtp_transport.cc(156)] Failed to unprotect RTCP packet: size=48, type=205

Still experiencing this with kernel 6.13.9

Looks like it all begins with “GPU has fallen off the bus”

00:31:00 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
00:31:00 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
00:31:00 kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
00:31:00 kernel: NVRM: GPU at PCI:0000:01:00: GPU-fa813934-9d9f-3126-0ccb-4d01c14cd133
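
For anyone comparing logs across machines, here is a minimal sketch that pulls the Xid events out of kernel log lines like the ones above. The regular expression is only an assumption based on the two line shapes seen in this thread (with and without the `pid=`/`name=` fields), and `find_xid_events` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
#   NVRM: Xid (PCI:0000:01:00): 79, pid=4272, name=gnome-shell, GPU has fallen off the bus.
# Assumption: this pattern is derived only from the log lines quoted in this thread.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+), (.*)")


def find_xid_events(lines):
    """Return (pci_address, xid_code, message) for every Xid line found."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            pci, code, message = match.groups()
            events.append((pci, int(code), message.strip()))
    return events


sample = [
    "00:31:00 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.",
    "00:31:00 kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
    "10:03:54 kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed "
    "from 0x0 (None) to 0x1 (GPU Reset Required)",
]

for pci, code, message in find_xid_events(sample):
    print(pci, code, message)
```

Feeding it the output of `journalctl -k` or `dmesg` (one line per list element) gives a quick view of which Xid codes appear and in what order.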

I’ve seen a couple of other users reporting similar failures; could they be related?

I reverted to 560.35.03 and the issue is completely gone, so there is definitely something weird going on with 570.

Quite frustrating though, as whenever the system was usable the performance (and smoothness) improvements were very noticeable.

I’ve since upgraded to Fedora 42 (GNOME 48) and kernel 6.14, and retried the 570 drivers.

The issue seems to happen whenever I have an external monitor plugged in. I’ve tried different configurations (HDMI to iGPU, DP to iGPU, DP to dGPU); the only thing that seems to help is lowering the refresh rate of one of the external monitors. 240 Hz and 120 Hz both make the system freeze; 60 Hz does too, but noticeably less often.

Without any external monitor I’ve used the system for extended periods of time without any issue.

I’m attaching nvidia-bug-report

nvidia-bug-report.log (1.7 MB)

I am also seeing the same issue with an external monitor. The system freezes and I can’t do much, even when switching TTYs and back. I need to restart/power cycle my laptop.

Running Linux 6.14.5-2-cachyos
CPU: AMD Ryzen 7 5800H (16) @ 4.46 GHz
GPU 1: NVIDIA GeForce RTX 3060 Mobile / Max-Q [Discrete]
NVIDIA-SMI 570.144 Driver Version: 570.144 CUDA Version: 12.8

Disabling GSP seems to have done the trick for me, but I’m now facing a different issue when powering on the system with the external monitors plugged in.

I’m now forced to unplug everything before powering on, wait to log in, plug everything back in, log out, and then log back in for everything to work properly, but at least the system is usable and no further freezes seem to be happening.

I’ve seen elsewhere that setting the laptop screen as the primary monitor (and not changing it) has worked for some. That is worth a try as well, but right now I need stability, so I won’t be testing re-enabling GSP and changing primary screens until I have some downtime to tinker.
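
The post above doesn't say how GSP was disabled. The route documented in the NVIDIA driver README is the `NVreg_EnableGpuFirmware=0` module option, so that is assumed here. As a rough sketch, the snippet below reads the value the loaded driver reports, assuming the driver lists its parameters in `/proc/driver/nvidia/params` as `Name: value` lines; `parse_params` and `gsp_firmware_setting` are hypothetical helper names.

```python
from pathlib import Path

# Assumption: the proprietary driver lists its module parameters here,
# one "Name: value" pair per line.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_params(text):
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def gsp_firmware_setting(text):
    """Return the raw EnableGpuFirmware value, or None if it is not listed."""
    return parse_params(text).get("EnableGpuFirmware")


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        print("EnableGpuFirmware =", gsp_firmware_setting(PARAMS_PATH.read_text()))
    else:
        print(f"{PARAMS_PATH} not found; is the nvidia module loaded?")
```

A value of `0` after a reboot would suggest the option was picked up. Anything else means the setting did not take effect or the driver is using its default.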

Thank you, @tantalus1641! I’ll give this a shot and report back.

I’m also seeing this using the package from the graphics-drivers Ubuntu PPA. It is much worse with an external screen attached, but the failure still happens eventually using just a laptop display.

My GPU is an RTX 4070 mobile and I have a QHD laptop display and 4K external screen. With the external screen in use, the system never survives more than a few seconds after the auto-login. With only the laptop display, it sometimes lasts a few minutes. User interaction seems to be a problem for it; if I leave it alone after auto-login, it doesn’t crash.

This in dmesg seems to be relevant:

[   56.690243] NVRM: GPU at PCI:0000:01:00: GPU-57164be6-eed0-8252-bb72-2e95a2e0dbd4
[   56.690248] NVRM: Xid (PCI:0000:01:00): 79, pid=4272, name=gnome-shell, GPU has fallen off the bus.
[   56.690253] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[   56.690561] NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)
[   61.791521] NVRM: Error in service of callback 
[   61.801593] ------------[ cut here ]------------
[   61.801595] WARNING: CPU: 10 PID: 7177 at nvidia/nv.c:4946 nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.801802] Modules linked in: nf_conntrack_netlink xt_nat veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf xfrm_user xfrm_algo xt_addrtype snd_seq_dummy snd_hrtimer ipmi_devintf ipmi_msghandler ccm ipt_REJECT nf_reject_ipv4 xt_conntrack xt_MASQUERADE nft_chain_nat xt_CHECKSUM xt_comment xt_tcpudp nft_compat iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay qrtr rfcomm cmac algif_hash algif_skcipher af_alg bnep msr binfmt_misc nls_iso8859_1 nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_hda_codec_realtek snd_soc_acpi soundwire_bus snd_hda_codec_generic snd_hda_scodec_component snd_soc_sdca snd_hda_codec_hdmi
[   61.801826]  iwlmvm snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi mac80211 snd_hda_codec snd_hda_core snd_hwdep libarc4 intel_uncore_frequency snd_hda_scodec_tas2781_i2c uvcvideo intel_uncore_frequency_common snd_soc_tas2781_fmwlib snd_seq_midi snd_seq_midi_event intel_tcc_cooling videobuf2_vmalloc uvc snd_soc_tas2781_comlib videobuf2_memops videobuf2_v4l2 snd_rawmidi snd_soc_core x86_pkg_temp_thermal btusb videobuf2_common iwlwifi intel_powerclamp snd_compress cmdlinepart btrtl snd_seq ac97_bus videodev processor_thermal_device_pci btintel snd_pcm_dmaengine spi_nor processor_thermal_device processor_thermal_wt_hint btbcm sch_fq_codel coretemp btmtk spd5118 mtd mei_hdcp mei_pxp intel_rapl_msr snd_pcm mc snd_seq_device processor_thermal_rfim rapl drm_ttm_helper processor_thermal_rapl intel_pmc_core intel_rapl_common cfg80211 bluetooth snd_timer intel_cstate wmi_bmof i2c_i801 ttm snd spi_intel_pci processor_thermal_wt_req pmt_telemetry i2c_smbus spi_intel ideapad_laptop
[   61.801853]  processor_thermal_power_floor i2c_mux pmt_class soundcore sparse_keymap processor_thermal_mbox int3403_thermal crc8 platform_profile int340x_thermal_zone kvm_intel int3400_thermal acpi_tad intel_vsec acpi_thermal_rel acpi_pad mei_me joydev input_leds mei mac_hid kvm iptable_filter ip6table_filter ip6_tables br_netfilter nfsd bridge stp llc arp_tables auth_rpcgss parport_pc nfs_acl ppdev lockd grace lp nvme_fabrics parport nvme_keyring efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt usbhid hid_multitouch hid_generic polyval_clmulni nvme polyval_generic ghash_clmulni_intel sha256_ssse3 r8169 video sha1_ssse3 ucsi_acpi nvme_core i2c_hid_acpi intel_lpss_pci typec_ucsi i2c_hid serio_raw intel_lpss realtek nvme_auth idma64 typec hid wmi pinctrl_alderlake aesni_intel crypto_simd cryptd
[   61.801881] CPU: 10 UID: 1000 PID: 7177 Comm: chrome Tainted: P           OE      6.14.0-15-generic #15-Ubuntu
[   61.801883] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   61.801883] Hardware name: LENOVO 82WR/LNVNB161216, BIOS KWCN42WW 09/15/2023
[   61.801884] RIP: 0010:nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.802003] Code: 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c7 10 de a6 c2 e8 b0 b3 5a cf 5b 41 5c 41 5d 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc <0f> 0b eb c2 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[   61.802004] RSP: 0018:ffffaa048b7c7c38 EFLAGS: 00010202
[   61.802005] RAX: 0000000000000026 RBX: ffff9149896b8000 RCX: 0000000000000000
[   61.802006] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffaa048b7c7b68
[   61.802007] RBP: ffffaa048b7c7c50 R08: 0000000000000000 R09: 0000000000000000
[   61.802007] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9149896b86a8
[   61.802007] R13: ffff914a82e60000 R14: ffff91499224a720 R15: ffffffffc2a6df60
[   61.802008] FS:  0000000000000000(0000) GS:ffff9150df300000(0000) knlGS:0000000000000000
[   61.802009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.802010] CR2: 000060e21ed40b38 CR3: 00000003361f0001 CR4: 0000000000f72ef0
[   61.802010] PKRU: 55555558
[   61.802011] Call Trace:
[   61.802012]  <TASK>
[   61.802014]  ? show_trace_log_lvl+0x1be/0x310
[   61.802016]  ? show_trace_log_lvl+0x1be/0x310
[   61.802017]  ? nvidia_close+0x1a2/0x290 [nvidia]
[   61.802134]  ? show_regs.part.0+0x22/0x30
[   61.802135]  ? show_regs.cold+0x8/0x10
[   61.802136]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.802241]  ? __warn.cold+0xac/0x10c
[   61.802243]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.802349]  ? report_bug+0x114/0x160
[   61.802351]  ? handle_bug+0x6e/0xb0
[   61.802353]  ? exc_invalid_op+0x18/0x80
[   61.802354]  ? asm_exc_invalid_op+0x1b/0x20
[   61.802357]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.802462]  ? nvidia_dev_put+0x75/0xc0 [nvidia]
[   61.802567]  nvidia_close+0x1a2/0x290 [nvidia]
[   61.802688]  __fput+0xea/0x2d0
[   61.802690]  ____fput+0x15/0x20
[   61.802692]  task_work_run+0x5d/0xa0
[   61.802694]  do_exit+0x26e/0x4c0
[   61.802695]  do_group_exit+0x34/0x90
[   61.802697]  __x64_sys_exit_group+0x18/0x20
[   61.802698]  x64_sys_call+0x141e/0x2310
[   61.802700]  do_syscall_64+0x7e/0x170
[   61.802701]  ? arch_exit_to_user_mode_prepare.isra.0+0xc8/0xd0
[   61.802703]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.802704]  ? do_syscall_64+0x8a/0x170
[   61.802705]  ? __fput+0x1a2/0x2d0
[   61.802707]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[   61.802708]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.802709]  ? do_syscall_64+0x8a/0x170
[   61.802710]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[   61.802711]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.802712]  ? do_syscall_64+0x8a/0x170
[   61.802713]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[   61.802714]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   61.802716] RIP: 0033:0x72ca050f668d
[   61.802717] Code: Unable to access opcode bytes at 0x72ca050f6663.
[   61.802717] RSP: 002b:00007ffc5ceffc18 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[   61.802718] RAX: ffffffffffffffda RBX: 00007ffc5ceffc28 RCX: 000072ca050f668d
[   61.802719] RDX: 00000000000000e7 RSI: ffffffffffffe778 RDI: 0000000000000001
[   61.802719] RBP: 00007ffc5ceffcb0 R08: 0000000000000000 R09: 0000000000000000
[   61.802720] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000001bc1
[   61.802720] R13: 0000114400c50a40 R14: 00007ffc5ceffc40 R15: 0000114400c509c0
[   61.802721]  </TASK>
[   61.802722] ---[ end trace 0000000000000000 ]---
[   61.803131] ------------[ cut here ]------------
[   61.803132] WARNING: CPU: 10 PID: 7177 at nvidia/nv.c:4946 nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.803271] Modules linked in: nf_conntrack_netlink xt_nat veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf xfrm_user xfrm_algo xt_addrtype snd_seq_dummy snd_hrtimer ipmi_devintf ipmi_msghandler ccm ipt_REJECT nf_reject_ipv4 xt_conntrack xt_MASQUERADE nft_chain_nat xt_CHECKSUM xt_comment xt_tcpudp nft_compat iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay qrtr rfcomm cmac algif_hash algif_skcipher af_alg bnep msr binfmt_misc nls_iso8859_1 nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_hda_codec_realtek snd_soc_acpi soundwire_bus snd_hda_codec_generic snd_hda_scodec_component snd_soc_sdca snd_hda_codec_hdmi
[   61.803294]  iwlmvm snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi mac80211 snd_hda_codec snd_hda_core snd_hwdep libarc4 intel_uncore_frequency snd_hda_scodec_tas2781_i2c uvcvideo intel_uncore_frequency_common snd_soc_tas2781_fmwlib snd_seq_midi snd_seq_midi_event intel_tcc_cooling videobuf2_vmalloc uvc snd_soc_tas2781_comlib videobuf2_memops videobuf2_v4l2 snd_rawmidi snd_soc_core x86_pkg_temp_thermal btusb videobuf2_common iwlwifi intel_powerclamp snd_compress cmdlinepart btrtl snd_seq ac97_bus videodev processor_thermal_device_pci btintel snd_pcm_dmaengine spi_nor processor_thermal_device processor_thermal_wt_hint btbcm sch_fq_codel coretemp btmtk spd5118 mtd mei_hdcp mei_pxp intel_rapl_msr snd_pcm mc snd_seq_device processor_thermal_rfim rapl drm_ttm_helper processor_thermal_rapl intel_pmc_core intel_rapl_common cfg80211 bluetooth snd_timer intel_cstate wmi_bmof i2c_i801 ttm snd spi_intel_pci processor_thermal_wt_req pmt_telemetry i2c_smbus spi_intel ideapad_laptop
[   61.803318]  processor_thermal_power_floor i2c_mux pmt_class soundcore sparse_keymap processor_thermal_mbox int3403_thermal crc8 platform_profile int340x_thermal_zone kvm_intel int3400_thermal acpi_tad intel_vsec acpi_thermal_rel acpi_pad mei_me joydev input_leds mei mac_hid kvm iptable_filter ip6table_filter ip6_tables br_netfilter nfsd bridge stp llc arp_tables auth_rpcgss parport_pc nfs_acl ppdev lockd grace lp nvme_fabrics parport nvme_keyring efi_pstore sunrpc nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt usbhid hid_multitouch hid_generic polyval_clmulni nvme polyval_generic ghash_clmulni_intel sha256_ssse3 r8169 video sha1_ssse3 ucsi_acpi nvme_core i2c_hid_acpi intel_lpss_pci typec_ucsi i2c_hid serio_raw intel_lpss realtek nvme_auth idma64 typec hid wmi pinctrl_alderlake aesni_intel crypto_simd cryptd
[   61.803343] CPU: 10 UID: 1000 PID: 7177 Comm: chrome Tainted: P        W  OE      6.14.0-15-generic #15-Ubuntu
[   61.803344] Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   61.803345] Hardware name: LENOVO 82WR/LNVNB161216, BIOS KWCN42WW 09/15/2023
[   61.803345] RIP: 0010:nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.803460] Code: 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c7 10 de a6 c2 e8 b0 b3 5a cf 5b 41 5c 41 5d 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc <0f> 0b eb c2 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[   61.803461] RSP: 0018:ffffaa048b7c7c38 EFLAGS: 00010202
[   61.803462] RAX: 0000000000000026 RBX: ffff9149896b8000 RCX: 0000000000000000
[   61.803462] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffaa048b7c7b68
[   61.803463] RBP: ffffaa048b7c7c50 R08: 0000000000000000 R09: 0000000000000000
[   61.803463] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9149896b86a8
[   61.803464] R13: ffff914b1ab80000 R14: ffff91499224a720 R15: ffffffffc2a6df60
[   61.803465] FS:  0000000000000000(0000) GS:ffff9150df300000(0000) knlGS:0000000000000000
[   61.803465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   61.803466] CR2: 000060e21ed40b38 CR3: 00000003361f0001 CR4: 0000000000f72ef0
[   61.803466] PKRU: 55555558
[   61.803467] Call Trace:
[   61.803467]  <TASK>
[   61.803468]  ? show_trace_log_lvl+0x1be/0x310
[   61.803470]  ? show_trace_log_lvl+0x1be/0x310
[   61.803471]  ? nvidia_close+0x1a2/0x290 [nvidia]
[   61.803584]  ? show_regs.part.0+0x22/0x30
[   61.803585]  ? show_regs.cold+0x8/0x10
[   61.803587]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.803715]  ? __warn.cold+0xac/0x10c
[   61.803717]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.803848]  ? report_bug+0x114/0x160
[   61.803850]  ? handle_bug+0x6e/0xb0
[   61.803851]  ? exc_invalid_op+0x18/0x80
[   61.803853]  ? asm_exc_invalid_op+0x1b/0x20
[   61.803855]  ? nvidia_dev_put+0xb3/0xc0 [nvidia]
[   61.803970]  ? nvidia_dev_put+0x75/0xc0 [nvidia]
[   61.804083]  nvidia_close+0x1a2/0x290 [nvidia]
[   61.804192]  __fput+0xea/0x2d0
[   61.804193]  ____fput+0x15/0x20
[   61.804194]  task_work_run+0x5d/0xa0
[   61.804196]  do_exit+0x26e/0x4c0
[   61.804199]  do_group_exit+0x34/0x90
[   61.804200]  __x64_sys_exit_group+0x18/0x20
[   61.804202]  x64_sys_call+0x141e/0x2310
[   61.804203]  do_syscall_64+0x7e/0x170
[   61.804204]  ? arch_exit_to_user_mode_prepare.isra.0+0xc8/0xd0
[   61.804206]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.804207]  ? do_syscall_64+0x8a/0x170
[   61.804208]  ? __fput+0x1a2/0x2d0
[   61.804209]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[   61.804210]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.804212]  ? do_syscall_64+0x8a/0x170
[   61.804213]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[   61.804214]  ? syscall_exit_to_user_mode+0x38/0x1d0
[   61.804215]  ? do_syscall_64+0x8a/0x170
[   61.804216]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[   61.804217]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   61.804218] RIP: 0033:0x72ca050f668d
[   61.804219] Code: Unable to access opcode bytes at 0x72ca050f6663.
[   61.804219] RSP: 002b:00007ffc5ceffc18 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[   61.804220] RAX: ffffffffffffffda RBX: 00007ffc5ceffc28 RCX: 000072ca050f668d
[   61.804221] RDX: 00000000000000e7 RSI: ffffffffffffe778 RDI: 0000000000000001
[   61.804221] RBP: 00007ffc5ceffcb0 R08: 0000000000000000 R09: 0000000000000000
[   61.804222] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000001bc1
[   61.804222] R13: 0000114400c50a40 R14: 00007ffc5ceffc40 R15: 0000114400c509c0
[   61.804223]  </TASK>
[   61.804224] ---[ end trace 0000000000000000 ]---

There are lots of these; I think perhaps one per CPU core.
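
One way to check the "once per CPU core" guess is to tally the `WARNING:` headers by CPU and PID. The sketch below does that. The pattern is an assumption based on the two headers quoted above, and `count_warnings` is a hypothetical helper name. Note that both warnings in the excerpt report CPU 10 and PID 7177, so the excerpt alone does not confirm a per-core pattern.

```python
import re
from collections import Counter

# Matches the header of a kernel warning, e.g.
#   WARNING: CPU: 10 PID: 7177 at nvidia/nv.c:4946 nvidia_dev_put+0xb3/0xc0 [nvidia]
# Assumption: the pattern is derived only from the headers quoted in this thread.
WARNING_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+) (\S+)")


def count_warnings(lines):
    """Count kernel warnings keyed by (cpu, pid, source location)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            cpu, pid, location, _symbol = match.groups()
            counts[(int(cpu), int(pid), location)] += 1
    return counts


sample = [
    "[   61.801595] WARNING: CPU: 10 PID: 7177 at nvidia/nv.c:4946 nvidia_dev_put+0xb3/0xc0 [nvidia]",
    "[   61.802011] Call Trace:",
    "[   61.803132] WARNING: CPU: 10 PID: 7177 at nvidia/nv.c:4946 nvidia_dev_put+0xb3/0xc0 [nvidia]",
]

for (cpu, pid, location), n in count_warnings(sample).items():
    print(f"CPU {cpu} PID {pid} at {location}: {n} warning(s)")
```

Running it over the full `dmesg` output would show whether the warnings spread across many CPUs or repeat for the same process as its file descriptors are closed.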

I am having the same issue with a Legion Pro 7i with the RTX 4090 Mobile.

Reverting to 560 fixes the issue, with more than a week without a crash running 24/7.

Using 570 with an external monitor makes the system crash a few minutes after login, on both Xorg and Wayland. After the crash it is not possible to switch TTYs.