GPU sporadically disconnecting

This is a recurrent problem, and it happens under both high and low load. To recover the GPU I need to reboot the system. OS: Kubuntu 18.04, 2 GPUs: RTX 2070, CUDA 10.2.

I have a logger service that runs $ nvidia-smi dmon -s mu -d 5 -o TD in the background. Here is the log:

Mar 11 23:21:49 pipemon-desktop monitor_gpus[18877]: #Date Time gpu fb bar1 sm mem enc dec
Mar 11 23:21:49 pipemon-desktop monitor_gpus[18877]: #YYYYMMDD HH:MM:SS Idx MB MB % % % %
Mar 11 23:21:49 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:49 0 7553 15 20 0 0 0
Mar 11 23:21:49 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:49 1 12 4 0 0 0 0
Mar 11 23:21:54 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:54 0 7553 15 22 1 0 0
Mar 11 23:21:54 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:54 1 12 4 0 0 0 0
Mar 11 23:21:59 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:59 0 7553 15 22 7 0 0
Mar 11 23:21:59 pipemon-desktop monitor_gpus[18877]: 20200311 23:21:59 1 12 4 0 0 0 0
Mar 11 23:22:04 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:04 0 7553 15 23 1 0 0
Mar 11 23:22:04 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:04 1 12 4 0 0 0 0
Mar 11 23:22:09 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:09 0 7553 15 24 2 0 0
Mar 11 23:22:09 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:09 1 12 4 0 0 0 0
Mar 11 23:22:14 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:14 0 7553 15 11 1 0 0
Mar 11 23:22:14 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:14 1 12 4 0 0 0 0
Mar 11 23:22:19 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:19 0 7553 15 24 2 0 0
Mar 11 23:22:19 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:19 1 12 4 0 0 0 0
Mar 11 23:22:24 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:24 0 7553 15 19 1 0 0
Mar 11 23:22:24 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:24 1 12 4 0 0 0 0
Mar 11 23:22:29 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:29 0 7553 15 20 2 0 0
Mar 11 23:22:29 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:29 1 12 4 0 0 0 0
Mar 11 23:22:34 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:34 0 7553 15 20 0 0 0
Mar 11 23:22:34 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:34 1 12 4 0 0 0 0
Mar 11 23:22:39 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:39 0 7553 15 26 6 0 0
Mar 11 23:22:39 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:39 1 12 4 0 0 0 0
Mar 11 23:22:44 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:44 0 7553 15 22 1 0 0
Mar 11 23:22:44 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:44 1 12 4 0 0 0 0
Mar 11 23:22:49 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:49 0 7553 15 21 1 0 0
Mar 11 23:22:49 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:49 1 12 4 0 0 0 0
Mar 11 23:22:54 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:54 0 7553 15 12 1 0 0
Mar 11 23:22:54 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:54 1 12 4 0 0 0 0
Mar 11 23:22:59 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:59 0 7553 15 24 2 0 0
Mar 11 23:22:59 pipemon-desktop monitor_gpus[18877]: 20200311 23:22:59 1 12 4 0 0 0 0
Mar 11 23:23:04 pipemon-desktop monitor_gpus[18877]: 20200311 23:23:04 0 7553 15 19 1 0 0
Mar 11 23:23:04 pipemon-desktop monitor_gpus[18877]: 20200311 23:23:04 1 12 4 0 0 0 0
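In case it helps anyone compare the samples leading up to the failure, here is a minimal Python sketch for turning these log lines into records. It assumes the exact layout shown above (the monitor_gpus tag followed by the nine dmon columns); the names Sample and parse_dmon_line are made up for illustration. Rows containing non-numeric placeholders, and the two header rows starting with #, are skipped.

```python
import re
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One dmon row, using the column order from the '#Date Time gpu fb bar1 sm mem enc dec' header."""
    date: str
    time: str
    gpu: int
    fb_mb: int
    bar1_mb: int
    sm_pct: int
    mem_pct: int
    enc_pct: int
    dec_pct: int


# Assumption: lines look like the journal output above, i.e.
# "<syslog prefix> monitor_gpus[PID]: YYYYMMDD HH:MM:SS <7 integers>".
_ROW = re.compile(
    r"monitor_gpus\[\d+\]:\s+(\d{8})\s+(\d{2}:\d{2}:\d{2})((?:\s+\d+){7})\s*$"
)


def parse_dmon_line(line: str) -> Optional[Sample]:
    """Return a Sample for a data row, or None for headers and unparseable rows."""
    m = _ROW.search(line)
    if m is None:
        return None
    values = [int(v) for v in m.group(3).split()]
    return Sample(m.group(1), m.group(2), *values)


if __name__ == "__main__":
    row = ("Mar 11 23:23:04 pipemon-desktop monitor_gpus[18877]: "
           "20200311 23:23:04 0 7553 15 19 1 0 0")
    print(parse_dmon_line(row))
```

Feeding it the last GPU 0 row before the crash gives a record with fb_mb=7553 and sm_pct=19, which makes it easy to check that nothing unusual shows up in memory or utilization right before the Xid errors.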

And here are the high-priority error logs from journalctl:
Mar 11 23:23:08 pipemon-desktop kernel: NVRM: Xid (PCI:0000:07:00): 74, pid=1394, NVLink: fatal error detected on link 0(0x10000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Mar 11 23:23:08 pipemon-desktop kernel: NVRM: Xid (PCI:0000:08:00): 79, pid=1394, GPU has fallen off the bus.
Mar 11 23:23:09 pipemon-desktop kernel: NVRM: Xid (PCI:0000:07:00): 38, pid=1394, 0020 0000c597 00000100 00000000
Mar 11 23:23:09 pipemon-desktop kernel: xhci_hcd 0000:08:00.2: PCI post-resume error -19!
Mar 11 23:23:09 pipemon-desktop kernel: xhci_hcd 0000:08:00.2: HC died; cleaning up
Mar 11 23:23:09 pipemon-desktop kernel: NVRM: Xid (PCI:0000:07:00): 61, pid=3246, 0cec(3098) 00000000 00000000
Mar 11 23:23:15 pipemon-desktop kernel: nvidia-gpu 0000:08:00.3: i2c timeout error ffffffff
Mar 11 23:23:17 pipemon-desktop kernel: nvidia-gpu 0000:08:00.3: i2c timeout error ffffffff
Mar 11 23:23:17 pipemon-desktop kernel: nvidia-gpu 0000:08:00.3: i2c stop failed -110
Mar 11 23:23:17 pipemon-desktop kernel: ucsi_ccg 1-0008: i2c_transfer failed -110
Mar 11 23:23:41 pipemon-desktop kernel: NVRM: Xid (PCI:0000:07:00): 38, pid=3246, 0008 0000902d 00000000 00000000
Mar 11 23:26:34 pipemon-desktop kernel: INFO: task Xorg:1625 blocked for more than 120 seconds.
Mar 11 23:26:34 pipemon-desktop kernel: Tainted: P OE 5.3.0-40-generic #32~18.04.1-Ubuntu
Mar 11 23:26:34 pipemon-desktop kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 23:28:35 pipemon-desktop kernel: INFO: task Xorg:1625 blocked for more than 241 seconds.
Mar 11 23:28:35 pipemon-desktop kernel: Tainted: P OE 5.3.0-40-generic #32~18.04.1-Ubuntu
Mar 11 23:28:35 pipemon-desktop kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 23:28:35 pipemon-desktop kernel: INFO: task nvidia-smi:18898 blocked for more than 120 seconds.
Mar 11 23:28:35 pipemon-desktop kernel: Tainted: P OE 5.3.0-40-generic #32~18.04.1-Ubuntu
Mar 11 23:28:35 pipemon-desktop kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 11 23:28:35 pipemon-desktop kernel: INFO: task python:29552 blocked for more than 120 seconds.
Mar 11 23:28:35 pipemon-desktop kernel: Tainted: P OE 5.3.0-40-generic #32~18.04.1-Ubuntu
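
To see at a glance which device reported which Xid code and in what order, something like the following Python sketch could be run over the journal output. It assumes the NVRM line format seen in the log above ("Xid (PCI:<address>): <code>, pid=<pid>, <detail>"); other driver versions may format these lines differently. The names XidEvent and xid_events are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class XidEvent(NamedTuple):
    pci: str      # PCI address as printed by the driver, e.g. "0000:07:00"
    code: int     # Xid number, e.g. 79
    pid: int      # pid field from the log line
    detail: str   # remainder of the message, unmodified


# Assumption: matches the format seen in this log only.
_XID = re.compile(
    r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+), pid=(\d+), (.*)$"
)


def xid_events(lines: Iterable[str]) -> List[XidEvent]:
    """Collect Xid events in the order they appear; non-matching lines are ignored."""
    events = []
    for line in lines:
        m = _XID.search(line.rstrip("\n"))
        if m:
            events.append(
                XidEvent(m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))
            )
    return events


if __name__ == "__main__":
    import sys
    # Example: journalctl -k | python xid_events.py
    for ev in xid_events(sys.stdin):
        print(ev.pci, ev.code, ev.pid, ev.detail)
```

For the log above this lists Xid 74 on 0000:07:00 first, then Xid 79 on 0000:08:00, followed by 38, 61 and 38 again on 0000:07:00.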