Frequent crashes with 525.89.02 on OpenSUSE when exiting SteamVR

I am seeing frequent crashes when I exit SteamVR with an HTC Vive setup. The crash happens in the kernel and completely locks up X11, forcing me to hit the reset button. This is with an RTX 3060 running OpenSUSE 15.4 with the latest KDE installed.
Sometimes the crash happens while NeosVR is running, but most of the time it happens when I close SteamVR afterward. I did not have this issue with the previous driver release.

My setup has two 4K monitors connected to the two DisplayPort ports, a 1920x1200 monitor hooked up to one of the HDMI ports, and the HTC Vive connected to the other HDMI port.

Here is the bug report (unfortunately captured while my system was working); the dmesg output from the crash case follows below:
nvidia-bug-report.log.gz (670.2 KB)

Once in a while I am able to ssh in after a crash and collect information. Since I cannot attach more than one file here, this is the relevant section of the dmesg kernel output:
[ 3427.936078] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [irq/338-nvidia:2959]
[ 3427.936090] Modules linked in: rpcsec_gss_krb5 rfcomm tcp_diag inet_diag vhost_net vhost vhost_iotlb tap tun unix_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter overlay xfrm_user xfrm_algo twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc md4 cp210x usbserial ccm af_packet iscsi_ibft iscsi_boot_sysfs uinput snd_seq dmi_sysfs lm75(N) regmap_i2c cmac algif_hash algif_skcipher af_alg bnep uvcvideo binfmt_misc intel_rapl_msr pwc(N) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev joydev snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device
[ 3427.936142] mc btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(POEX) nvidia_modeset(POEX) sch_fq_codel nvidia_uvm(POEX) nls_iso8859_1 nls_cp437 vfat fat intel_rapl_common amd64_edac edac_mce_amd nvidia(POEX) ext4 crc16 mbcache drm_kms_helper snd_hda_codec_realtek ath10k_pci jbd2 snd_hda_codec_generic kvm_amd cec snd_hda_codec_hdmi ledtrig_audio ath10k_core rc_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi syscopyarea ath eeepc_wmi(N) kvm sysfillrect bcache asus_wmi sysimgblt battery sparse_keymap platform_profile irqbypass pcspkr snd_hda_codec video crc64 efi_pstore(N) wmi_bmof fb_sys_fops snd_hda_core mac80211 snd_hwdep snd_pcm snd_timer atlantic igb snd k10temp i2c_piix4 soundcore wil6210 i2c_algo_bit macsec dca libarc4 gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables xfs libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic uas
[ 3427.936220] sr_mod wl(POEN) usb_storage usbhid cdrom sd_mod crc32_pclmul crc32c_intel ghash_clmulni_intel cfg80211 aesni_intel xhci_pci crypto_simd ahci cryptd xhci_pci_renesas mxm_wmi(N) xhci_hcd libahci nvme arcmsr libata nvme_core ccp usbcore sp5100_tco(N) nvme_common t10_pi rfkill wmi button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod br_netfilter bridge stp llc msr ecryptfs efivarfs
[ 3427.936276] Supported: No, Proprietary and Unsupported modules are loaded
[ 3427.936280] CPU: 1 PID: 2959 Comm: irq/338-nvidia Kdump: loaded Tainted: P OE X N 5.14.21-150400.24.46-default #1 SLE15-SP4 98cc77d94566d5eead15db9029d52a2ca42a9eb7
[ 3427.936287] Hardware name: ASUS System Product Name/ROG ZENITH EXTREME, BIOS 2201 07/15/2021
[ 3427.936290] RIP: 0010:_nv024066rm+0xf/0x40 [nvidia]
[ 3427.936541] Code: c0 0f 45 da 41 83 c4 01 41 83 fc 20 75 dd 89 d8 5b 41 5c 41 5d c3 0f 1f 44 00 00 53 8b 9f 9c 09 00 00 83 fb 1f 77 0c 89 df 5b 8c f7 ff ff 0f 1f 40 00 be 00 00 47 06 bf f1 b2 3b 0e 31 c0 e8
[ 3427.936547] RSP: 0018:ffffbcc243357d10 EFLAGS: 00000297
[ 3427.936551] RAX: 0000000000000087 RBX: ffff9c70ea838008 RCX: 0000000000000000
[ 3427.936554] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3427.936557] RBP: ffff9c70ea84dcb0 R08: 0000000000000001 R09: 0000000000000000
[ 3427.936560] R10: ffffffffc2488360 R11: ffffffffffffffff R12: 0000000000000000
[ 3427.936563] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000087
[ 3427.936565] FS: 0000000000000000(0000) GS:ffff9c8f71040000(0000) knlGS:0000000000000000
[ 3427.936569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3427.936571] CR2: 0000213c580a0fe0 CR3: 00000002db782000 CR4: 00000000003506e0
[ 3427.936574] Call Trace:
[ 3427.936578]
[ 3427.936581] ? _nv024094rm+0x9/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.936797] ? _nv024040rm+0x2d/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.937014] ? _nv039241rm+0x108/0x190 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.937408] ? _nv021303rm+0x63/0xb0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.937790] ? _nv021358rm+0x2d5/0x730 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.938170] ? _nv028421rm+0x5e/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.938576] ? _nv011149rm+0x1a1/0x310 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.938981] ? _nv028431rm+0x147/0x1b0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.939387] ? _nv000696rm+0x10b/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.939629] ? irq_forced_thread_fn+0x80/0x80
[ 3427.939636] ? rm_isr_bh+0x1c/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.939877] ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3427.940061] ? irq_thread_fn+0x21/0x60
[ 3427.940066] ? irq_thread+0xee/0x1a0
[ 3427.940069] ? wake_threads_waitq+0x30/0x30
[ 3427.940073] ? irq_thread_check_affinity+0xe0/0xe0
[ 3427.940077] ? kthread+0x156/0x180
[ 3427.940081] ? set_kthread_struct+0x50/0x50
[ 3427.940085] ? ret_from_fork+0x22/0x30
[ 3427.940090]
[ 3455.936256] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [irq/338-nvidia:2959]
[ 3455.936268] Modules linked in: rpcsec_gss_krb5 rfcomm tcp_diag inet_diag vhost_net vhost vhost_iotlb tap tun unix_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter overlay xfrm_user xfrm_algo twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc md4 cp210x usbserial ccm af_packet iscsi_ibft iscsi_boot_sysfs uinput snd_seq dmi_sysfs lm75(N) regmap_i2c cmac algif_hash algif_skcipher af_alg bnep uvcvideo binfmt_misc intel_rapl_msr pwc(N) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev joydev snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device
[ 3455.936319] mc btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(POEX) nvidia_modeset(POEX) sch_fq_codel nvidia_uvm(POEX) nls_iso8859_1 nls_cp437 vfat fat intel_rapl_common amd64_edac edac_mce_amd nvidia(POEX) ext4 crc16 mbcache drm_kms_helper snd_hda_codec_realtek ath10k_pci jbd2 snd_hda_codec_generic kvm_amd cec snd_hda_codec_hdmi ledtrig_audio ath10k_core rc_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi syscopyarea ath eeepc_wmi(N) kvm sysfillrect bcache asus_wmi sysimgblt battery sparse_keymap platform_profile irqbypass pcspkr snd_hda_codec video crc64 efi_pstore(N) wmi_bmof fb_sys_fops snd_hda_core mac80211 snd_hwdep snd_pcm snd_timer atlantic igb snd k10temp i2c_piix4 soundcore wil6210 i2c_algo_bit macsec dca libarc4 gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables xfs libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic uas
[ 3455.936395] sr_mod wl(POEN) usb_storage usbhid cdrom sd_mod crc32_pclmul crc32c_intel ghash_clmulni_intel cfg80211 aesni_intel xhci_pci crypto_simd ahci cryptd xhci_pci_renesas mxm_wmi(N) xhci_hcd libahci nvme arcmsr libata nvme_core ccp usbcore sp5100_tco(N) nvme_common t10_pi rfkill wmi button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod br_netfilter bridge stp llc msr ecryptfs efivarfs
[ 3455.936452] Supported: No, Proprietary and Unsupported modules are loaded
[ 3455.936456] CPU: 1 PID: 2959 Comm: irq/338-nvidia Kdump: loaded Tainted: P OEL X N 5.14.21-150400.24.46-default #1 SLE15-SP4 98cc77d94566d5eead15db9029d52a2ca42a9eb7
[ 3455.936463] Hardware name: ASUS System Product Name/ROG ZENITH EXTREME, BIOS 2201 07/15/2021
[ 3455.936467] RIP: 0010:_nv036276rm+0x37/0x70 [nvidia]
[ 3455.936750] Code: d3 48 8d 55 0f 89 de c6 45 0f 00 e8 f3 ae 64 ff 80 7d 0f 00 41 89 c4 75 11 41 39 5d 10 76 1c 49 8b 45 00 c1 eb 02 44 8b 24 98 <5b> 44 89 e0 41 5c 41 5d 48 83 c5 10 c3 0f 1f 40 00 be 00 00 0d 07
[ 3455.936755] RSP: 0018:ffffbcc243357d38 EFLAGS: 00000212
[ 3455.936759] RAX: ffffbcc256000000 RBX: 0000000000300040 RCX: 0000000000c00100
[ 3455.936762] RDX: ffff9c70ea84dcdf RSI: 0000000000c00100 RDI: ffff9c70ea838008
[ 3455.936765] RBP: ffff9c70ea84dcd0 R08: 0000000000000020 R09: ffff9c70ea84dcf8
[ 3455.936768] R10: ffff9c70ea838008 R11: ffffffffffffffff R12: 00000000000012f0
[ 3455.936771] R13: ffff9c70ea838b20 R14: ffff9c70ea838b20 R15: 0000000000000000
[ 3455.936773] FS: 0000000000000000(0000) GS:ffff9c8f71040000(0000) knlGS:0000000000000000
[ 3455.936777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3455.936780] CR2: 0000213c580a0fe0 CR3: 00000002db782000 CR4: 00000000003506e0
[ 3455.936783] Call Trace:
[ 3455.936786]
[ 3455.936790] ? _nv012681rm+0x16e/0x1a0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.937187] ? _nv039236rm+0x6a/0x90 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.937581] ? _nv023520rm+0x360/0x380 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.937974] ? _nv021358rm+0x13d/0x730 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.938355] ? _nv028421rm+0x5e/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.938761] ? _nv011149rm+0x1a1/0x310 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.939166] ? _nv028431rm+0x147/0x1b0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.939570] ? _nv000696rm+0x10b/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.939812] ? irq_forced_thread_fn+0x80/0x80
[ 3455.939819] ? rm_isr_bh+0x1c/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.940060] ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3455.940243] ? irq_thread_fn+0x21/0x60
[ 3455.940247] ? irq_thread+0xee/0x1a0
[ 3455.940251] ? wake_threads_waitq+0x30/0x30
[ 3455.940254] ? irq_thread_check_affinity+0xe0/0xe0
[ 3455.940258] ? kthread+0x156/0x180
[ 3455.940263] ? set_kthread_struct+0x50/0x50
[ 3455.940266] ? ret_from_fork+0x22/0x30
[ 3455.940272]
[ 3483.936429] watchdog: BUG: soft lockup - CPU#1 stuck for 78s! [irq/338-nvidia:2959]
[ 3483.936440] Modules linked in: rpcsec_gss_krb5 rfcomm tcp_diag inet_diag vhost_net vhost vhost_iotlb tap tun unix_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter overlay xfrm_user xfrm_algo twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc md4 cp210x usbserial ccm af_packet iscsi_ibft iscsi_boot_sysfs uinput snd_seq dmi_sysfs lm75(N) regmap_i2c cmac algif_hash algif_skcipher af_alg bnep uvcvideo binfmt_misc intel_rapl_msr pwc(N) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev joydev snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device
[ 3483.936492] mc btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(POEX) nvidia_modeset(POEX) sch_fq_codel nvidia_uvm(POEX) nls_iso8859_1 nls_cp437 vfat fat intel_rapl_common amd64_edac edac_mce_amd nvidia(POEX) ext4 crc16 mbcache drm_kms_helper snd_hda_codec_realtek ath10k_pci jbd2 snd_hda_codec_generic kvm_amd cec snd_hda_codec_hdmi ledtrig_audio ath10k_core rc_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi syscopyarea ath eeepc_wmi(N) kvm sysfillrect bcache asus_wmi sysimgblt battery sparse_keymap platform_profile irqbypass pcspkr snd_hda_codec video crc64 efi_pstore(N) wmi_bmof fb_sys_fops snd_hda_core mac80211 snd_hwdep snd_pcm snd_timer atlantic igb snd k10temp i2c_piix4 soundcore wil6210 i2c_algo_bit macsec dca libarc4 gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables xfs libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic uas
[ 3483.936568] sr_mod wl(POEN) usb_storage usbhid cdrom sd_mod crc32_pclmul crc32c_intel ghash_clmulni_intel cfg80211 aesni_intel xhci_pci crypto_simd ahci cryptd xhci_pci_renesas mxm_wmi(N) xhci_hcd libahci nvme arcmsr libata nvme_core ccp usbcore sp5100_tco(N) nvme_common t10_pi rfkill wmi button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod br_netfilter bridge stp llc msr ecryptfs efivarfs
[ 3483.936625] Supported: No, Proprietary and Unsupported modules are loaded
[ 3483.936628] CPU: 1 PID: 2959 Comm: irq/338-nvidia Kdump: loaded Tainted: P OEL X N 5.14.21-150400.24.46-default #1 SLE15-SP4 98cc77d94566d5eead15db9029d52a2ca42a9eb7
[ 3483.936636] Hardware name: ASUS System Product Name/ROG ZENITH EXTREME, BIOS 2201 07/15/2021
[ 3483.936639] RIP: 0010:_nv030345rm+0x2/0x230 [nvidia]
[ 3483.937034] Code: 48 8b 7d 20 48 be 00 00 00 00 20 00 00 00 ba 01 00 00 00 e8 70 b7 07 00 48 8b 4d 20 be 3f 00 00 00 e9 29 ff ff ff 66 90 41 57 <41> 56 49 89 f7 41 55 41 54 41 89 ce 53 48 83 ed 20 83 fa 07 48 89
[ 3483.937039] RSP: 0018:ffffbcc243357d78 EFLAGS: 00000246
[ 3483.937043] RAX: ffff9c70bdf38808 RBX: ffff9c70ea838008 RCX: 0000000000000000
[ 3483.937047] RDX: 0000000000000003 RSI: ffff9c70bdf38808 RDI: ffff9c70ea838008
[ 3483.937050] RBP: ffff9c70ea84dcf0 R08: 000000000000000b R09: ffff9c70ea84dcfc
[ 3483.937053] R10: ffffffffc2488360 R11: ffffffffffffffff R12: 0000000000000000
[ 3483.937056] R13: ffff9c70bdf47008 R14: ffff9c70ea83bac8 R15: 0000000000000003
[ 3483.937058] FS: 0000000000000000(0000) GS:ffff9c8f71040000(0000) knlGS:0000000000000000
[ 3483.937061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3483.937064] CR2: 0000213c580a0fe0 CR3: 00000002db782000 CR4: 00000000003506e0
[ 3483.937067] Call Trace:
[ 3483.937071]
[ 3483.937074] ? _nv021303rm+0x3a/0xb0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.937457] ? _nv021358rm+0x2d5/0x730 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.937838] ? _nv028421rm+0x5e/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.938243] ? _nv011149rm+0x1a1/0x310 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.938647] ? _nv028431rm+0x147/0x1b0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.939052] ? _nv000696rm+0x10b/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.939294] ? irq_forced_thread_fn+0x80/0x80
[ 3483.939300] ? rm_isr_bh+0x1c/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.939542] ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3483.939725] ? irq_thread_fn+0x21/0x60
[ 3483.939730] ? irq_thread+0xee/0x1a0
[ 3483.939733] ? wake_threads_waitq+0x30/0x30
[ 3483.939737] ? irq_thread_check_affinity+0xe0/0xe0
[ 3483.939741] ? kthread+0x156/0x180
[ 3483.939745] ? set_kthread_struct+0x50/0x50
[ 3483.939749] ? ret_from_fork+0x22/0x30
[ 3483.939754]
[ 3511.936598] watchdog: BUG: soft lockup - CPU#1 stuck for 104s! [irq/338-nvidia:2959]
[ 3511.936610] Modules linked in: rpcsec_gss_krb5 rfcomm tcp_diag inet_diag vhost_net vhost vhost_iotlb tap tun unix_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter overlay xfrm_user xfrm_algo twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc md4 cp210x usbserial ccm af_packet iscsi_ibft iscsi_boot_sysfs uinput snd_seq dmi_sysfs lm75(N) regmap_i2c cmac algif_hash algif_skcipher af_alg bnep uvcvideo binfmt_misc intel_rapl_msr pwc(N) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev joydev snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device
[ 3511.936658] mc btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(POEX) nvidia_modeset(POEX) sch_fq_codel nvidia_uvm(POEX) nls_iso8859_1 nls_cp437 vfat fat intel_rapl_common amd64_edac edac_mce_amd nvidia(POEX) ext4 crc16 mbcache drm_kms_helper snd_hda_codec_realtek ath10k_pci jbd2 snd_hda_codec_generic kvm_amd cec snd_hda_codec_hdmi ledtrig_audio ath10k_core rc_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi syscopyarea ath eeepc_wmi(N) kvm sysfillrect bcache asus_wmi sysimgblt battery sparse_keymap platform_profile irqbypass pcspkr snd_hda_codec video crc64 efi_pstore(N) wmi_bmof fb_sys_fops snd_hda_core mac80211 snd_hwdep snd_pcm snd_timer atlantic igb snd k10temp i2c_piix4 soundcore wil6210 i2c_algo_bit macsec dca libarc4 gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables xfs libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic uas
[ 3511.936730] sr_mod wl(POEN) usb_storage usbhid cdrom sd_mod crc32_pclmul crc32c_intel ghash_clmulni_intel cfg80211 aesni_intel xhci_pci crypto_simd ahci cryptd xhci_pci_renesas mxm_wmi(N) xhci_hcd libahci nvme arcmsr libata nvme_core ccp usbcore sp5100_tco(N) nvme_common t10_pi rfkill wmi button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod br_netfilter bridge stp llc msr ecryptfs efivarfs
[ 3511.936784] Supported: No, Proprietary and Unsupported modules are loaded
[ 3511.936787] CPU: 1 PID: 2959 Comm: irq/338-nvidia Kdump: loaded Tainted: P OEL X N 5.14.21-150400.24.46-default #1 SLE15-SP4 98cc77d94566d5eead15db9029d52a2ca42a9eb7
[ 3511.936794] Hardware name: ASUS System Product Name/ROG ZENITH EXTREME, BIOS 2201 07/15/2021
[ 3511.936797] RIP: 0010:_nv023874rm+0x0/0x10 [nvidia]
[ 3511.937099] Code: 66 2e 0f 1f 84 00 00 00 00 00 b8 56 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <31> c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 56 00 00 00 c3
[ 3511.937105] RSP: 0018:ffffbcc243357ce0 EFLAGS: 00000246
[ 3511.937109] RAX: ffffffffc29c7c90 RBX: ffff9c70ea838008 RCX: 0000000000000000
[ 3511.937112] RDX: 0000000000000000 RSI: 0000000000c00180 RDI: ffff9c70ea838008
[ 3511.937115] RBP: ffff9c70ea84dc90 R08: 0000000000c00180 R09: 0000000000000001
[ 3511.937118] R10: ffff9c70ea838008 R11: ffffffffffffffff R12: 0000000000000000
[ 3511.937120] R13: 0000000000000000 R14: 0000000000c00180 R15: ffff9c70ea838b20
[ 3511.937123] FS: 0000000000000000(0000) GS:ffff9c8f71040000(0000) knlGS:0000000000000000
[ 3511.937126] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3511.937128] CR2: 0000213c580a0fe0 CR3: 00000002db782000 CR4: 00000000003506e0
[ 3511.937131] Call Trace:
[ 3511.937135]
[ 3511.937138] ? _nv023873rm+0x72/0x120 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.937516] ? _nv012682rm+0x4a/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.937892] ? _nv039242rm+0x65/0x70 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.938267] ? _nv039241rm+0xb3/0x190 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.938643] ? _nv021303rm+0x63/0xb0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.939006] ? _nv021358rm+0x2d5/0x730 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.939368] ? _nv028421rm+0x5e/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.939751] ? _nv011149rm+0x1a1/0x310 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.940136] ? _nv028431rm+0x147/0x1b0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.940521] ? _nv000696rm+0x10b/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.940763] ? irq_forced_thread_fn+0x80/0x80
[ 3511.940769] ? rm_isr_bh+0x1c/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.941011] ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3511.941224] ? irq_thread_fn+0x21/0x60
[ 3511.941229] ? irq_thread+0xee/0x1a0
[ 3511.941232] ? wake_threads_waitq+0x30/0x30
[ 3511.941236] ? irq_thread_check_affinity+0xe0/0xe0
[ 3511.941239] ? kthread+0x156/0x180
[ 3511.941243] ? set_kthread_struct+0x50/0x50
[ 3511.941247] ? ret_from_fork+0x22/0x30
[ 3511.941252]
[ 3539.936763] watchdog: BUG: soft lockup - CPU#1 stuck for 130s! [irq/338-nvidia:2959]
[ 3539.936776] Modules linked in: rpcsec_gss_krb5 rfcomm tcp_diag inet_diag vhost_net vhost vhost_iotlb tap tun unix_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter overlay xfrm_user xfrm_algo twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic libdes camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xcbc md4 cp210x usbserial ccm af_packet iscsi_ibft iscsi_boot_sysfs uinput snd_seq dmi_sysfs lm75(N) regmap_i2c cmac algif_hash algif_skcipher af_alg bnep uvcvideo binfmt_misc intel_rapl_msr pwc(N) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev joydev snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device
[ 3539.936829] mc btusb btrtl btbcm btintel bluetooth ecdh_generic nvidia_drm(POEX) nvidia_modeset(POEX) sch_fq_codel nvidia_uvm(POEX) nls_iso8859_1 nls_cp437 vfat fat intel_rapl_common amd64_edac edac_mce_amd nvidia(POEX) ext4 crc16 mbcache drm_kms_helper snd_hda_codec_realtek ath10k_pci jbd2 snd_hda_codec_generic kvm_amd cec snd_hda_codec_hdmi ledtrig_audio ath10k_core rc_core snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi syscopyarea ath eeepc_wmi(N) kvm sysfillrect bcache asus_wmi sysimgblt battery sparse_keymap platform_profile irqbypass pcspkr snd_hda_codec video crc64 efi_pstore(N) wmi_bmof fb_sys_fops snd_hda_core mac80211 snd_hwdep snd_pcm snd_timer atlantic igb snd k10temp i2c_piix4 soundcore wil6210 i2c_algo_bit macsec dca libarc4 gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables xfs libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic uas
[ 3539.936905] sr_mod wl(POEN) usb_storage usbhid cdrom sd_mod crc32_pclmul crc32c_intel ghash_clmulni_intel cfg80211 aesni_intel xhci_pci crypto_simd ahci cryptd xhci_pci_renesas mxm_wmi(N) xhci_hcd libahci nvme arcmsr libata nvme_core ccp usbcore sp5100_tco(N) nvme_common t10_pi rfkill wmi button l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod br_netfilter bridge stp llc msr ecryptfs efivarfs
[ 3539.936963] Supported: No, Proprietary and Unsupported modules are loaded
[ 3539.936967] CPU: 1 PID: 2959 Comm: irq/338-nvidia Kdump: loaded Tainted: P OEL X N 5.14.21-150400.24.46-default #1 SLE15-SP4 98cc77d94566d5eead15db9029d52a2ca42a9eb7
[ 3539.936974] Hardware name: ASUS System Product Name/ROG ZENITH EXTREME, BIOS 2201 07/15/2021
[ 3539.936978] RIP: 0010:_nv021364rm+0x0/0xa0 [nvidia]
[ 3539.937375] Code: 4d 89 65 60 0f 84 3d ff ff ff be 00 00 b6 09 bf 57 9e 96 0e 31 c0 e8 9f 47 c2 ff bf b6 09 00 00 e8 f5 cf 42 00 e9 1d ff ff ff <48> 83 ec 08 48 85 d2 74 47 0f b7 12 8d 42 a6 66 83 f8 3f 77 1b 48
[ 3539.937381] RSP: 0018:ffffbcc243357dd0 EFLAGS: 00000202
[ 3539.937385] RAX: ffffffffc2491c00 RBX: ffff9c70bdf47198 RCX: 0000000000000000
[ 3539.937388] RDX: ffff9c70ea84dd4e RSI: ffff9c70bdf47008 RDI: ffff9c70ea838008
[ 3539.937391] RBP: ffff9c70ea84dd40 R08: ffffffffc513e0b0 R09: 0000000000000020
[ 3539.937394] R10: ffff9c70ea838008 R11: ffffffffffffffff R12: ffff9c70ea838008
[ 3539.937396] R13: ffff9c70ea84dd73 R14: 0000000000000029 R15: ffff9c70ea838008
[ 3539.937399] FS: 0000000000000000(0000) GS:ffff9c8f71040000(0000) knlGS:0000000000000000
[ 3539.937403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3539.937406] CR2: 0000213c580a0fe0 CR3: 00000002db782000 CR4: 00000000003506e0
[ 3539.937409] Call Trace:
[ 3539.937412]
[ 3539.937416] ? _nv028421rm+0x5e/0xc0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.937825] ? _nv011149rm+0x1a1/0x310 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.938232] ? _nv028431rm+0x147/0x1b0 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.938638] ? _nv000696rm+0x10b/0x140 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.938883] ? irq_forced_thread_fn+0x80/0x80
[ 3539.938890] ? rm_isr_bh+0x1c/0x60 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.939133] ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia e79ac18622c7f49b9560afba132ea3b0b223dd26]
[ 3539.939318] ? irq_thread_fn+0x21/0x60
[ 3539.939322] ? irq_thread+0xee/0x1a0
[ 3539.939326] ? wake_threads_waitq+0x30/0x30
[ 3539.939330] ? irq_thread_check_affinity+0xe0/0xe0
[ 3539.939333] ? kthread+0x156/0x180
[ 3539.939338] ? set_kthread_struct+0x50/0x50
[ 3539.939342] ? ret_from_fork+0x22/0x30
[ 3539.939347]
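
To make the repeated reports above easier to compare, here is a minimal Python sketch that reduces a dmesg capture to one entry per soft lockup (timestamp, CPU, stuck time, thread, and the RIP symbol). The function name `summarize_lockups` and the `SAMPLE` constant are my own for illustration, and the regular expressions assume the line layout shown in this log; other kernels may format these lines differently.

```python
import re

# Matches lines such as:
# [ 3427.936078] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [irq/338-nvidia:2959]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches lines such as:
# [ 3427.936290] RIP: 0010:_nv024066rm+0xf/0x40 [nvidia]
RIP_RE = re.compile(r"\[\s*\d+\.\d+\]\s+RIP: \w+:(?P<sym>[^\s+]+)\+")

# Four lines copied from the log above, used as sample input.
SAMPLE = """\
[ 3427.936078] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [irq/338-nvidia:2959]
[ 3427.936290] RIP: 0010:_nv024066rm+0xf/0x40 [nvidia]
[ 3455.936256] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [irq/338-nvidia:2959]
[ 3455.936467] RIP: 0010:_nv036276rm+0x37/0x70 [nvidia]
"""


def summarize_lockups(dmesg_text):
    """Return one dict per soft-lockup report found in dmesg_text."""
    reports = []
    for line in dmesg_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            reports.append({
                "ts": float(m.group("ts")),
                "cpu": int(m.group("cpu")),
                "secs": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
                "rip": None,  # filled in by the next RIP line, if any
            })
            continue
        m = RIP_RE.search(line)
        # Attach the first RIP line that follows each lockup header.
        if m and reports and reports[-1]["rip"] is None:
            reports[-1]["rip"] = m.group("sym")
    return reports
```

Passing the full text of a dmesg capture to `summarize_lockups()` gives a list that can be printed or diffed between crashes. In the log above, the thread and PID stay the same across all five reports while the RIP symbol changes each time.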

One thing I have seen is a stuck IRQ, where the Nvidia driver IRQ consumes 100% of the CPU. I will try to capture this the next time it happens. The problem is that when this lockup occurs, I usually have no access to the system: there is no way to reach a console locally, and I usually cannot log in remotely to get a capture.
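
Since I usually lose access once the lockup starts, one option is to leave a small logger running before starting SteamVR. The following is a rough Python sketch, not a tested diagnostic tool: it samples `/proc/interrupts` at a fixed interval and appends the interrupt lines with the largest count increase to a file, syncing after every sample so the last entries have a chance of surviving a hard reset. The function names, the default log path `irq_rates.log`, and the one-second interval are all hypothetical choices of mine.

```python
import os
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Rows such as ERR: carry fewer counters than there are CPUs.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest(before, after, top=3):
    """Return the `top` entries with the largest count increase."""
    deltas = []
    for irq, (count, description) in after.items():
        previous = before.get(irq, (0, description))[0]
        deltas.append((count - previous, irq, description))
    deltas.sort(reverse=True)
    return deltas[:top]


def monitor(src="/proc/interrupts", log_path="irq_rates.log",
            interval=1.0, samples=None):
    """Append the busiest interrupt lines to log_path once per interval.

    Runs until interrupted when samples is None.
    """
    with open(src) as f:
        before = parse_interrupts(f.read())
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            time.sleep(interval)
            with open(src) as f:
                after = parse_interrupts(f.read())
            for delta, irq, description in busiest(before, after):
                log.write(f"{time.time():.0f} irq={irq} +{delta} {description}\n")
            # Push the data to disk in case the machine has to be reset.
            log.flush()
            os.fsync(log.fileno())
            before = after
            taken += 1
```

Calling `monitor()` with no arguments keeps logging until it is interrupted. If the IRQ really is stuck, I would expect the entry for IRQ 338 (the number in the `irq/338-nvidia` thread name in the log above) to dominate the last few samples, assuming the logger itself still gets scheduled during the lockup.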