PCIe problem on DGX Station V100: Ubuntu freezes several times a day (the mouse still moves but clicks do nothing)

Thanks for your help.
The log now looks like this:

Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989180] INFO: task gnome-shell:2722 blocked for more than 120 seconds.
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989186] Tainted: P OEL 4.15.0-65-generic #74-Ubuntu
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989191] gnome-shell D 0 2722 2456 0x00000000
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989195] Call Trace:
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989203] __schedule+0x24e/0x880
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989208] ? ttwu_do_wakeup+0x1e/0x140
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989211] schedule+0x2c/0x80
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989214] rwsem_down_write_failed+0x1ea/0x360
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989220] ? __wake_up_common+0x73/0x130
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989224] call_rwsem_down_write_failed+0x17/0x30
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989227] ? call_rwsem_down_write_failed+0x17/0x30
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989231] down_write+0x2d/0x40
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989457] os_acquire_rwlock_write+0x3b/0x50 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.989795] _nv038381rm+0xc/0x30 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.990051] ? _nv039329rm+0x18d/0x1d0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.990297] ? _nv041056rm+0x45/0xd0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.990633] ? _nv041001rm+0x142/0x2b0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.990883] ? _nv039291rm+0x15a/0x2e0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.991128] ? _nv039292rm+0x5b/0x90 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.991370] ? _nv039292rm+0x31/0x90 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.991616] ? _nv012677rm+0x1d/0x30 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.991862] ? _nv039307rm+0xb0/0xb0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.992117] ? _nv012699rm+0x54/0x70 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.992389] ? _nv011412rm+0xc4/0x120 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.992644] ? _nv000657rm+0x63/0x70 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.992890] ? _nv000580rm+0x2c/0x40 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.993245] ? _nv000694rm+0x86c/0xc80 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.993616] ? rm_ioctl+0x54/0xb0 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.993805] ? nvidia_ioctl+0x2dc/0x840 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.993998] ? nvidia_frontend_unlocked_ioctl+0x42/0x50 [nvidia]
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.994007] ? do_vfs_ioctl+0xa8/0x630
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.994014] ? __sys_recvmsg+0x80/0x90
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.994020] ? SyS_ioctl+0x79/0x90
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.994028] ? do_syscall_64+0x73/0x130
Jan 6 11:58:06 ovsdl-DGX-Station kernel: [ 1571.994035] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jan 6 11:58:18 ovsdl-DGX-Station kernel: [ 1584.341120] watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [gnome-shell:3201]
Jan 6 11:58:18 ovsdl-DGX-Station kernel: [ 1584.341123] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc aufs overlay iptable_filter nls_iso8859_1 binfmt_misc snd_hda_codec_hdmi intel_rapl sb_edac x86_pkg_temp_thermal coretemp kvm snd_hda_codec_realtek eeepc_wmi irqbypass crct10dif_pclmul snd_hda_codec_generic snd_seq_midi asus_wmi crc32_pclmul snd_seq_midi_event sparse_keymap ghash_clmulni_intel video intel_wmi_thunderbolt wmi_bmof mxm_wmi pcbc input_leds joydev snd_rawmidi snd_hda_intel aesni_intel aes_x86_64 crypto_simd snd_seq glue_helper snd_hda_codec cryptd snd_hda_core intel_cstate snd_hwdep intel_rapl_perf snd_pcm snd_seq_device snd_timer mei_me