nvidia-smi lost one of four cards

I have 4 NVIDIA A10 24GB cards installed in a machine running Ubuntu 18.04.5 LTS with kernel version 4.15.0-128-generic. After a reboot, I found that one of these cards had dropped out of nvidia-smi, although it is still detectable in lspci. Repeated reboots do not solve the problem. I have collected the nvidia-bug-report:
nvidia-bug-report.log.gz (1.5 MB)

I don’t know how to fix this. If anyone can help me, I’d really appreciate it.

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:17:00.0 Off |                    0 |
|  0%   39C    P8     9W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10          On   | 00000000:25:00.0 Off |                    0 |
|  0%   40C    P8     9W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10          On   | 00000000:D9:00.0 Off |                    0 |
|  0%   39C    P8     9W / 150W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

sudo lspci | grep -i nvidia
17:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)
25:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)
c5:00.0 3D controller: NVIDIA Corporation Device 2236 (rev ff)
d9:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)

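The 4th GPU (c5:00.0, the one lspci shows as rev ff instead of rev a1) is off: it no longer responds on the PCIe bus, which is why the driver cannot see it. Please power down the system, disconnect power, wait some time, reconnect power and boot. If the GPU doesn’t come alive again, it’s broken.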
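
If it helps, here is a rough sketch for re-checking after the power cycle. It only looks for NVIDIA lines that lspci reports as "(rev ff)". The function name dead_gpus is made up for this example, and the sample text is just the lspci output from the post, so treat it as an illustration rather than an official diagnostic.

```shell
#!/bin/sh
# dead_gpus (hypothetical helper name): reads lspci-style lines on stdin and
# prints the bus address of every NVIDIA device shown as "(rev ff)",
# i.e. a device whose PCI config space reads back as all ones.
dead_gpus() {
    awk 'tolower($0) ~ /nvidia/ && /\(rev ff\)/ { print $1 }'
}

# Sample input: the lspci lines quoted in the post above.
sample='17:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)
25:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)
c5:00.0 3D controller: NVIDIA Corporation Device 2236 (rev ff)
d9:00.0 3D controller: NVIDIA Corporation Device 2236 (rev a1)'

printf '%s\n' "$sample" | dead_gpus    # prints c5:00.0

# On the affected machine (not run here) the same check would be:
#   lspci | dead_gpus
# and these show the driver's side of the story:
#   nvidia-smi -L | wc -l                  # number of GPUs the driver lists
#   sudo dmesg | grep -i -e NVRM -e Xid    # driver messages, if any were logged
```

If the function prints nothing after the cold boot and nvidia-smi -L lists four GPUs again, the card has recovered. If c5:00.0 still shows up, reseating the card or trying it in another slot would be the next thing to rule out before concluding that it is dead.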