GPU has fallen off the bus

This error has been bugging me for months. It has now gotten so bad that I can’t work anymore.
The machine has 4 GPUs. I reseated the GPUs about 6 months ago; it may have helped a bit, or it may not have. I don’t think the power supply is the issue, because the machine is connected directly to the university’s power supply (and there is little I can do to change that anyway).

uname -a
Linux lambda-quad 4.15.0-130-generic #134-Ubuntu SMP Tue Jan 5 20:46:26 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi
Unable to determine the device handle for GPU 0000:05:00.0: GPU is lost. Reboot the system to recover this GPU

lspci | grep -i nvidia

05:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
05:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
09:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
0a:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
0a:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)

nvidia-bug-report.log.gz (2.6 MB)

In the logs, it’s always the same GPU that fails, and uncorrected PCIe bus errors are reported.
Please check for a system BIOS update first. If that doesn’t help, either the GPU or the slot (mainboard) may be broken. To test, swap two GPUs between their slots and observe whether the failing PCI ID stays the same or changes. If the same PCI ID keeps failing, the slot is the likely culprit; if the failure moves with the card to the other slot, the GPU itself is.
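
If it helps to keep track of which PCI ID is failing across reboots, the check could be sketched roughly like this in Python. This is only an illustration, not an NVIDIA tool: the function names `failing_pci_ids` and `interpret_swap` are made up for this post, and the two failure phrases are simply the ones that appear in this thread, so other driver versions may word their messages differently.

```python
import re
from collections import Counter

# Matches a PCI address such as 0000:05:00.0; the ".0" function suffix is optional
# and is dropped so that all functions of one card count as the same device.
PCI_ADDR = re.compile(r"\b([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2})(?:\.[0-7])?")

# Assumption: these phrases are taken from this thread only.
FAILURE_PHRASES = ("GPU is lost", "fallen off the bus")


def failing_pci_ids(lines):
    """Count PCI addresses mentioned on lines that report a lost GPU."""
    counts = Counter()
    for line in lines:
        if any(phrase in line for phrase in FAILURE_PHRASES):
            for match in PCI_ADDR.finditer(line):
                counts[match.group(1).lower()] += 1
    return counts


def interpret_swap(failing_before, failing_after, partner_slot):
    """Interpret the swap test.

    failing_before: PCI address that failed before the swap
    failing_after:  PCI address that fails after the swap
    partner_slot:   PCI address of the slot the suspect card was moved to
    """
    if failing_after == failing_before:
        return "same PCI ID still fails: slot or mainboard suspected"
    if failing_after == partner_slot:
        return "failure moved with the card: GPU suspected"
    return "inconclusive: a different device is failing"


if __name__ == "__main__":
    # The only sample line used here is the nvidia-smi message quoted above.
    sample = [
        "Unable to determine the device handle for GPU 0000:05:00.0: "
        "GPU is lost. Reboot the system to recover this GPU",
    ]
    print(failing_pci_ids(sample))
    print(interpret_swap("0000:05:00", "0000:06:00", "0000:06:00"))
```

The same function could be fed saved nvidia-smi output or kernel log lines from before and after the swap; whether the kernel log uses exactly these phrases on a given driver version is something to verify on the machine itself.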