Unable to determine the device handle for GPU 0000:XX:00.0: Unknown Error

Hello, I have a server with eight H100 GPUs. When I run nvidia-smi, this error shows up.
I then ran nvidia-smi -L and got this:

GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-6d68d915-00d5-f0f8-b1ef-5aeb15f53561)
GPU 1: NVIDIA H100 80GB HBM3 (UUID: GPU-33b92d06-4a26-19b0-2260-56334e20631c)
GPU 2: NVIDIA H100 80GB HBM3 (UUID: GPU-85aadbdb-50d4-242e-7950-e5955bbecda1)
GPU 3: NVIDIA H100 80GB HBM3 (UUID: GPU-93cfd6db-ee4c-a8a0-91c1-8195ac788108)
Unable to determine the device handle for gpu 0000:9B:00.0: Unknown Error
GPU 5: NVIDIA H100 80GB HBM3 (UUID: GPU-cde9c06f-3a70-ddd2-d1ca-65a6de91650a)
GPU 6: NVIDIA H100 80GB HBM3 (UUID: GPU-ef754bec-c0b9-af2b-fddc-21d793423402)
GPU 7: NVIDIA H100 80GB HBM3 (UUID: GPU-3b9e6356-b8ea-e21a-367d-cbac63ba8bdc)
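
A listing like the one above can be checked automatically, which is handy when the failure only shows up hours after a reboot. The following is a minimal sketch, not an official tool: the function name `split_gpu_listing` and both regular expressions are assumptions written against the exact line shapes pasted in this post.

```python
import re

# Healthy entry, e.g. "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-6d68...)".
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-f-]+)\)$")

# Failure entry, e.g.
# "Unable to determine the device handle for gpu 0000:9B:00.0: Unknown Error".
ERROR_LINE = re.compile(
    r"^Unable to determine the device handle for gpu\s*([0-9A-Fa-f:.]+): (.+)$",
    re.IGNORECASE,
)


def split_gpu_listing(text):
    """Return (healthy_indices, failures) parsed from `nvidia-smi -L` text.

    failures is a list of (pci_address_lowercase, error_message) tuples.
    Lines that match neither pattern are ignored.
    """
    healthy, failures = [], []
    for raw in text.splitlines():
        line = raw.strip()
        ok = GPU_LINE.match(line)
        if ok:
            healthy.append(int(ok.group(1)))
            continue
        bad = ERROR_LINE.match(line)
        if bad:
            failures.append((bad.group(1).lower(), bad.group(2)))
    return healthy, failures
```

The text could come from something like `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)`, passing stdout and stderr together since I am not sure which stream carries the error line. The lowercase PCI address it returns can then be compared with the address in the kernel log.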

I also tried nvidia-debugdump --list. Here is the output:

Found 8 NVIDIA devices
        Device ID:              0
        Device name:            NVIDIA H100 80GB HBM3
        GPU internal ID:        1654823010963

        Device ID:              1
        Device name:            NVIDIA H100 80GB HBM3
        GPU internal ID:        1654823051277

        Device ID:              2
        Device name:            NVIDIA H100 80GB HBM3
        GPU internal ID:        1654823048663

        Device ID:              3
        Device name:            NVIDIA H100 80GB HBM3
        GPU internal ID:        1654823051883

Error: nvmlDeviceGetHandleByIndex(): Unknown Error
FAILED to get details on GPU (0x4): Unknown Error

I rebooted the server last night and nvidia-smi worked fine. This morning I ran nvidia-smi again and the error was back. How can I recover from the error without rebooting?
Here is my nvidia-bug-report.log.
nvidia-bug-report.log.gz (4.8 MB)

NVRM: Xid (PCI:0000:9b:00): 79
Xid 79 means the GPU has fallen off the bus. Please check the power connectors and monitor the GPU temperatures.
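
To find such Xid lines without reading the whole bug report by hand, the kernel log can be scanned with a small script. This is only a sketch under assumptions: the helper name `find_xid_events` is made up for illustration, and the pattern is based on the single `NVRM: Xid (PCI:...)` line quoted here, so other driver versions may format the line differently.

```python
import re

# Matches e.g. "NVRM: Xid (PCI:0000:9b:00): 79" anywhere in a log line.
# Any text after the Xid number (pid, description) is left unparsed.
XID_LINE = re.compile(r"NVRM: Xid \(PCI:([0-9A-Fa-f:.]+)\): (\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_address_lowercase, xid_number) tuples.

    log_text is plain kernel log text, for example the output of `dmesg`
    or the relevant part of an unpacked nvidia-bug-report.log.
    """
    events = []
    for line in log_text.splitlines():
        match = XID_LINE.search(line)
        if match:
            events.append((match.group(1).lower(), int(match.group(2))))
    return events
```

Grouping the result by PCI address shows whether the same device keeps reporting errors, which is useful to know before reseating or replacing hardware.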

OK, I will check the power connectors and monitor the GPU temperatures.
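
For the monitoring itself, one option is to poll nvidia-smi in CSV mode and log the readings over time. Below is a rough sketch, assuming the query fields `index`, `pci.bus_id`, `temperature.gpu` and `power.draw` are available on this driver; the helper names and the 80 C limit are hypothetical placeholders, not values recommended by NVIDIA.

```python
import csv
import io

# Command whose stdout parse_readings() expects (run it with subprocess).
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,pci.bus_id,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """Convert a CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_readings(csv_text):
    """Parse the CSV output of QUERY_CMD into a list of dicts."""
    readings = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or unexpected lines
        index, bus_id, temp, power = (field.strip() for field in row)
        if not index.isdigit():
            continue  # not a per-GPU data row
        readings.append({
            "index": int(index),
            "bus_id": bus_id,
            "temp_c": _to_float(temp),
            "power_w": _to_float(power),
        })
    return readings


def hot_gpus(readings, limit_c=80.0):
    """Return indices of GPUs at or above limit_c (hypothetical threshold)."""
    return [r["index"] for r in readings
            if r["temp_c"] is not None and r["temp_c"] >= limit_c]
```

Running this from a loop or a cron job and appending the readings to a file gives a temperature and power history to compare against the time the GPU drops out. Note that a GPU that has already failed may stop reporting values, so the history leading up to the failure is the interesting part.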