Unable to determine the device handle for GPU 0000:19:00.0: Unknown Error

OS: Ubuntu 18.04.4 LTS
Driver Version: 515.57
GPUs: 3 x RTX8000

Last week, while I was running deep learning experiments on this machine, the GPUs crashed frequently, even though the temperature stayed below 81 degrees Celsius during training. Now, when I run nvidia-smi, I get the error: Unable to determine the device handle for GPU 0000:19:00.0: Unknown Error. This is the output of nvidia-debugdump --list:

Found 3 NVIDIA devices
Error: nvmlDeviceGetHandleByIndex(): Unknown Error
FAILED to get details on GPU (0x0): Unknown Error

Here is the full bug report:
nvidia-bug-report.log.gz (312.0 KB)

I have no idea how to solve this problem. Can somebody help me? Thanks a lot!

This is XID 79 (GPU has fallen off the bus). The cause is either a lack of power or overheating.
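
If you want to check for Xid events yourself, here is a minimal Python sketch that scans kernel log text for NVIDIA Xid lines and reports the PCI address and Xid code. It assumes the driver writes lines of the form "NVRM: Xid (PCI:...): <code>, ...". The function name find_xid_events and the sample log are made up for illustration; the sample is not taken from the attached bug report.

```python
import re

# Matches driver lines such as:
#   NVRM: Xid (PCI:0000:19:00): 79, pid=1234, GPU has fallen off the bus.
XID_PATTERN = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


# Illustrative sample only (hypothetical timestamps and pid).
SAMPLE_LOG = """\
[ 100.000000] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.57
[ 200.000000] NVRM: Xid (PCI:0000:19:00): 79, pid=1234, GPU has fallen off the bus.
"""

if __name__ == "__main__":
    for pci, code in find_xid_events(SAMPLE_LOG):
        print(f"GPU {pci}: Xid {code}")
```

You could pass it the text of your dmesg output or the unpacked bug report log instead of SAMPLE_LOG. If the same PCI address keeps showing up with code 79, that GPU's power cabling, PSU capacity, and cooling are the first things to look at.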