Ubuntu 16.04 with 4 x 1080 Ti hits this error about once a week: "Unable to determine the device handle for GPU 0000:05:00.0: GPU is lost"

Hi, this error happens about once a week, and we have to reboot the computer to recover the GPU. We can't figure out what is wrong with this machine. We have generated a log using nvidia-bug-report.sh. Could it be a problem with the cooling fan, the power cord, the PCIe slot, the system RAM, or the processor itself?
Hope someone can help us with this issue.
Thanks very much.

Attachment: nvidia-bug-report.log.gz (2.6 MB)