Suddenly lost one of four GPUs

I am running ESXi 7.0 on a server with four Quadro RTX 8000 GPUs, all assigned to an Ubuntu VM via PCI passthrough. One day I found that the 3rd GPU had disappeared from nvidia-smi, and CUDA programs could no longer find it either (see the enumeration sketch below the log). The kernel log only shows:

kernel: NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x25:0x65:1417)
kernel: NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 2
kernel: NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x25:0x65:1417)
kernel: NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 2
kernel: NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x25:0x65:1417)
kernel: NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 2
kernel: NVRM: GPU 0000:86:00.0: RmInitAdapter failed! (0x25:0x65:1417)
kernel: NVRM: GPU 0000:86:00.0: rm_init_adapter failed, device minor number 2
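
For reference, my CUDA test is essentially the following enumeration check (a minimal sketch, not my exact program): with the 3rd GPU gone it reports only three devices, and 0000:86:00.0 is missing from the list.

// Minimal sketch: enumerate visible CUDA devices and print their PCI IDs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("CUDA devices visible: %d\n", count);  // expected 4, now shows 3
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        // PCI domain:bus:device, e.g. 0000:86:00 for the failed GPU
        std::printf("  [%d] %s @ %04x:%02x:%02x.0\n", i, prop.name,
                    prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}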

The configurations of the host and the VM should be fine, since the setup had been running normally for months. I have also double-checked the host BIOS settings, such as Above 4G Decoding enabled and UEFI boot, and the VM settings, such as pciPassthru.use64bitMMIO = "TRUE" and pciPassthru.64bitMMIOSizeGB = "256" (256 GB should comfortably cover the BARs of four 48 GB cards). I have tested NVIDIA drivers from 460 to 515 and CUDA versions from 11.3 to 11.7. Every suggested configuration I could find on Google has been tried, but none of them work.

Today I booted the server from an external disk with Ubuntu 22.04 installed, to rule out any interference from ESXi, but the same errors as above still appear on bare metal. The bug report from this external Ubuntu install is attached.

Could this strange error be caused by a hardware fault? Or can anyone help me figure it out?
nvidia-bug-report.log.gz (1.1 MB)