Red Hat Enterprise Linux 8.8 with 10 RTX 4090 cards: soft lockup in _nv041444rm [nvidia] during restart

pcp-pmie[6473]: Some CPU busy executing in system mode 98%sys[cpu140]@localhost.localdomain
kernel: watchdog: BUG: soft lockup - CPU#140 stuck for 22s! [Xorg:5726]
kernel: Modules linked in: xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set uio_pci_generic uio vfio_pci vfio_virqfd vfio_iommu_type1 vfio cuse nf_tables nfnetlink rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc vfat fat intel_rapl_msr intel_rapl_common intel_sdsi pmt_crashlog pmt_telemetry pmt_class i10nm_edac nfit iTCO_wdt iTCO_vendor_support libnvdimm snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp snd_hda_intel kvm_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm irqbypass snd_hwdep crct10dif_pclmul crc32_pclmul snd_seq ghash_clmulni_intel snd_seq_device ipmi_ssif rapl intel_cstate snd_pcm snd_timer ses snd enclosure intel_uncore isst_if_mmio
kernel: isst_if_mbox_pci pcspkr idxd scsi_transport_sas isst_if_common soundcore intel_vsec idxd_bus joydev i2c_i801 cdc_ether mei_me usbnet mii mei i2c_ismt acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter knem(OE) xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvidia_drm(POE) nvidia_modeset(POE) sr_mod cdrom sd_mod sg mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) uas usb_storage ast(OE) nvidia(POE) drm_vram_helper mlx5_core(OE) crc32c_intel drm_kms_helper mlxfw(OE) psample pci_hyperv_intf tls drm_ttm_helper megaraid_sas syscopyarea ttm sysfillrect sysimgblt fb_sys_fops mlxdevm(OE) nvme mlx_compat(OE) drm ahci nvme_core libahci igb(OE) dca libata i2c_algo_bit t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod xpmem(OE) fuse
kernel: CPU: 140 PID: 5726 Comm: Xorg Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.10.1.el8_8.x86_64 #1
kernel: RIP: 0010:_nv041444rm+0x55/0xa0 [nvidia]
kernel: Code: c0 49 d3 e0 31 f6 83 e0 fb 48 c1 e7 03 44 8d 48 07 31 c0 66 90 48 8b 54 f3 10 48 8b 0c 3a 4c 21 c1 48 89 ca 44 89 d1 48 d3 ea <89> f1 48 83 c6 01 48 d3 e2 48 09 d0 49 39 f1 75 da 5b 48 83 c5 10
kernel: RSP: 0018:ff48b8fe4ff83900 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
kernel: RAX: 0000000000000000 RBX: ff14a8aadabbe988 RCX: 000000000000001d
kernel: RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000018cd1860
kernel: RBP: ff14a8ab364f2e10 R08: 0000000020000000 R09: 0000000000000007
kernel: R10: 000000000000001d R11: 00000009fb21c500 R12: 0000000ba7bfa4c5
kernel: R13: ff14a8aadabbe988 R14: ff14a8aa98971938 R15: 0000000000000000
kernel: FS: 00007faaf96e7b40(0000) GS:ff14a8e97fb00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000555c46404578 CR3: 000000018e296006 CR4: 0000000000771ee0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: ? _nv041432rm+0x58/0xb0 [nvidia]
kernel: ? _nv041435rm+0x293/0x370 [nvidia]
kernel: ? _nv035451rm+0x102/0x490 [nvidia]
kernel: ? _nv035280rm+0x252/0x2e0 [nvidia]
kernel: ? _nv035503rm+0x361/0x700 [nvidia]
kernel: ? _nv025696rm+0x8e/0x1e0 [nvidia]
kernel: ? _nv025944rm+0x48/0x60 [nvidia]
kernel: ? _nv000706rm+0xac5/0x1f10 [nvidia]
kernel: ? rm_init_adapter+0xcd/0xf0 [nvidia]
kernel: ? nv_start_device+0x195/0x8b0 [nvidia]
kernel: ? nv_open_device+0x91/0x190 [nvidia]
kernel: ? nvidia_open+0x215/0x5f0 [nvidia]
kernel: ? kobj_lookup+0xf1/0x160
kernel: ? nvidia_frontend_open+0x53/0xa0 [nvidia]
kernel: ? chrdev_open+0xcb/0x1e0
kernel: ? cdev_default_release+0x20/0x20
kernel: ? do_dentry_open+0x133/0x350
kernel: ? path_openat+0x55b/0x1580
kernel: ? _cond_resched+0x15/0x30
kernel: ? _cond_resched+0x15/0x30
kernel: ? do_filp_open+0x93/0x100
kernel: ? getname_flags+0x4a/0x1e0
kernel: ? __check_object_size+0xac/0x173
kernel: ? __alloc_fd+0x44/0x150
kernel: ? do_sys_openat2+0x211/0x2b0
kernel: ? do_sys_open+0x4b/0x80
kernel: ? do_syscall_64+0x5b/0x1b0
kernel: ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
abrt-dump-journal-oops[4739]: abrt-dump-journal-oops: Found oopses: 1
abrt-dump-journal-oops[4739]: abrt-dump-journal-oops: Creating problem directories