Dual RTX 3090 cards become undetected after running applications for a few hours

We are running PyTorch programs on Ubuntu 20.04 with dual RTX 3090 cards.
After a few hours of running, the graphics cards may become undetected.
nvidia-smi then reports the message below. What is wrong?
Any assistance is appreciated.

nvidia-smi
Unable to determine the device handle for GPU 0000:17:00.0: Unknown Error

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal

lspci | grep -v nvi
00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 07)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 07)
00:05.2 System peripheral: Intel Corporation Sky Lake-E RAS (rev 07)
00:05.4 PIC: Intel Corporation Sky Lake-E IOAPIC (rev 07)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 07)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 07)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 07)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:14.2 Signal processing controller: Intel Corporation 200 Series PCH Thermal Subsystem
00:15.0 Signal processing controller: Intel Corporation 200 Series PCH Serial IO I2C Controller #0
00:15.1 Signal processing controller: Intel Corporation 200 Series PCH Serial IO I2C Controller #1
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 RAID bus controller: Intel Corporation SATA Controller [RAID mode]
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.2 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1f.0 ISA bridge: Intel Corporation X299 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
02:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
03:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
16:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07)
16:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 07)
16:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 07)
16:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 07)
16:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:09.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0a.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0a.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:0f.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:10.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:10.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 07)
16:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
16:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
17:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)
17:00.1 Audio device: NVIDIA Corporation Device 1aef (rev ff)
64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07)
64:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 07)
64:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 07)
64:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 07)
64:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0a.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 07)
64:0a.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 07)
64:0a.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 07)
64:0b.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 07)
64:0b.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 07)
64:0b.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 07)
64:0b.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 07)
64:0c.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0c.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0c.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0c.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0c.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
64:0c.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 07)
64:0c.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 07)
64:0c.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 07)
64:0d.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 07)
64:0d.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 07)
64:0d.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 07)
64:0d.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 07)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
65:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
b2:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D (rev 07)
b2:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 07)
b2:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 07)
b2:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 07)
b2:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 07)
b2:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 07)
b2:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 07)
b2:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 07)
b2:15.1 Performance counters: Intel Corporation Sky Lake-E DDRIO Registers (rev 07)
b2:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 07)
b2:16.1 Performance counters: Intel Corporation Sky Lake-E DDRIO Registers (rev 07)
b2:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 07)
b2:16.5 Performance counters: Intel Corporation Sky Lake-E DDRIO Registers (rev 07)
b2:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 07)
b2:17.1 Performance counters: Intel Corporation Sky Lake-E DDRIO Registers (rev 07)
b3:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
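
In the listing above, the GPU at 17:00.0 reports "rev ff" while the one at 65:00.0 reports "rev a1". A revision of ff typically means the device's PCI configuration space reads back as all ones, i.e. the device has stopped responding on the bus. Below is a minimal sketch of a Python helper that scans lspci text for NVIDIA functions in that state, so the check can be logged periodically during long runs. The function name find_unresponsive_nvidia and the SAMPLE text are hypothetical, chosen for illustration; the parser assumes the default lspci line format shown above (optionally with a four-digit domain prefix).

```python
import re

# Matches default lspci lines such as:
#   17:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)
# An optional "0000:" domain prefix is tolerated (assumption: lspci -D style).
_LSPCI_LINE = re.compile(
    r"^(?P<addr>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[^:]+):\s+"
    r"(?P<name>.*?)"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def find_unresponsive_nvidia(lspci_text):
    """Return PCI addresses of NVIDIA functions whose revision reads as ff."""
    hits = []
    for line in lspci_text.splitlines():
        match = _LSPCI_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unexpected lines
        if "NVIDIA" in match.group("name") and match.group("rev") == "ff":
            hits.append(match.group("addr"))
    return hits


# Hypothetical sample: a few lines copied from the lspci output in this post.
SAMPLE = """\
16:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07)
17:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev ff)
17:00.1 Audio device: NVIDIA Corporation Device 1aef (rev ff)
65:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
65:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
"""

if __name__ == "__main__":
    for addr in find_unresponsive_nvidia(SAMPLE):
        print("NVIDIA function not responding:", addr)
```

To use it on a live system, the text of `lspci` could be passed in instead of SAMPLE, for example from a cron job that records the time at which a card first drops off the bus.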

nvidia-bug-report.sh
nvidia-bug-report.log.gz (1.2 MB)