Unable to determine the device handle for GPU0000:06:00.0: Unknown Error

nvidia-smi
Unable to determine the device handle for GPU0000:06:00.0: Unknown Error

lspci | grep NVIDIA
06:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
06:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
07:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
0d:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
0d:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
0e:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
0e:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
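A small sketch for reading the lspci output above. It is not an NVIDIA or pciutils tool. It pulls out each NVIDIA VGA controller's bus address and revision and flags any whose revision reads back as `ff`. An all-ones value usually means config-space reads to the device are failing. The function names are hypothetical, and the regex assumes the default lspci line format shown here, without the PCI domain prefix that `lspci -D` adds.

```python
import re
from typing import List, Tuple

# Matches lines in the default lspci format, e.g.
# 06:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
LSPCI_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) VGA compatible controller: "
    r"NVIDIA Corporation .*\(rev (?P<rev>[0-9a-f]{2})\)$"
)


def nvidia_vga_revisions(lspci_output: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: (bus address, revision) for each NVIDIA VGA controller."""
    found = []
    for line in lspci_output.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m:
            found.append((m.group("addr"), m.group("rev")))
    return found


def unreadable(revisions: List[Tuple[str, str]]) -> List[str]:
    """Addresses whose revision reads as ff (all ones), a common sign the
    device is no longer answering config-space reads."""
    return [addr for addr, rev in revisions if rev == "ff"]


# The lspci output pasted in this post.
sample = """\
06:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
06:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
07:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
0d:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
0d:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
0e:00.0 VGA compatible controller: NVIDIA Corporation Device 2204 (rev a1)
0e:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
"""

revs = nvidia_vga_revisions(sample)
print(revs)
print(unreadable(revs))
```

On this output, all four controllers report `a1` and none are flagged. This matches what lspci shows.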

nvidia-debugdump --list
Found 4 NVIDIA devices
Error: nvmlDeviceGetHandleByIndex(): Unknown Error
FAILED to get details on GPU (0x0): Unknown Error

dmesg |grep NVRM
[23027.545128] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 530.30.02 Wed Feb 22 04:11:39 UTC 2023
[1722067.120483] NVRM: GPU at PCI:0000:06:00: GPU-eb5e2cad-fa34-7cfb-976a-ab30da2e1f0c
[1722067.120502] NVRM: Xid (PCI:0000:06:00): 79, pid='', name=, GPU has fallen off the bus.
[1722067.120509] NVRM: GPU 0000:06:00.0: GPU has fallen off the bus.
[1722067.120525] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
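A minimal sketch for pulling Xid events out of kernel log text like the dmesg output above. It groups Xid codes by PCI address and lists addresses that logged Xid 79. That code is treated as "GPU has fallen off the bus" only because the log line itself says so. The NVRM address `0000:06:00` is the same device as lspci's `06:00.0`, with the `0000` PCI domain added and no function number. The function names are hypothetical, and the regex assumes the message format shown in this post.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "NVRM: Xid (PCI:0000:06:00): 79, pid='', name=, GPU has fallen off the bus."
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),")
FALLEN_OFF_BUS_XID = 79  # per the message text in the log above


def xid_events(dmesg_text: str) -> Dict[str, List[int]]:
    """Hypothetical helper: group Xid codes by the PCI address they were reported for."""
    events: Dict[str, List[int]] = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            events[m.group("pci")].append(int(m.group("xid")))
    return dict(events)


def fallen_off_bus(events: Dict[str, List[int]]) -> List[str]:
    """PCI addresses that logged Xid 79 at least once."""
    return sorted(pci for pci, codes in events.items() if FALLEN_OFF_BUS_XID in codes)


# Excerpt of the dmesg output pasted in this post.
sample = """\
[23027.545128] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 530.30.02 Wed Feb 22 04:11:39 UTC 2023
[1722067.120483] NVRM: GPU at PCI:0000:06:00: GPU-eb5e2cad-fa34-7cfb-976a-ab30da2e1f0c
[1722067.120502] NVRM: Xid (PCI:0000:06:00): 79, pid='', name=, GPU has fallen off the bus.
[1722067.120509] NVRM: GPU 0000:06:00.0: GPU has fallen off the bus.
"""

events = xid_events(sample)
print(events)
print(fallen_off_bus(events))
```

Here only `0000:06:00` is reported. That matches the one GPU that nvidia-smi cannot get a handle for. On a live system, the output of `dmesg` could be passed in instead of the pasted sample. Reading the kernel log may require root.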

Bug report:
nvidia-bug-report.log.gz (612.3 KB)

How did this happen, and what can be done about it?

This post lists a range of possible causes and checks to run.

But lspci still shows all four of my GPUs as rev a1.