Unable to determine the device handle for GPU

Hi,

I have four NVIDIA GeForce RTX 2080 Ti cards, and GPU number 1 often falls off the bus. When I run nvidia-smi, I get the following message: Unable to determine the device handle for GPU 0000:0B:00.0: Unknown Error.

I checked the temperature and power draw before the failure and saw nothing out of the ordinary.
The nvidia-bug-report.log is here: nvidia-bug-report.log (2.3 MB)

The GPU giving this error is always the same one, even though sometimes it is not even running anything when it fails.

My OS is openSUSE Leap 15.2.

Hope you can help, thanks in advance.

It’s fallen off the bus and completely turned off. Please try reseating it in its slot, and check whether it works reliably in another system.
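
To confirm that it is always the same card dropping out, it may help to scan the kernel log for the driver's "fallen off the bus" messages and pull out the PCI address each time. Below is a minimal sketch in Python, not an official tool: the function name find_bus_drops is hypothetical, the sample lines are illustrative rather than taken from the attached log, and the exact wording of the driver's messages may differ between versions.

```python
import re

# PCI address such as 0000:0B:00.0 or 0000:0b:00 (the ".function" part is optional).
PCI_ADDR = re.compile(
    r"\b([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}(?:\.[0-7])?)"
)


def find_bus_drops(log_text):
    """Return the PCI address found on each line that reports a GPU
    falling off the bus (None if such a line carries no address)."""
    hits = []
    for line in log_text.splitlines():
        # Assumption: the driver's message contains this phrase.
        if "fallen off the bus" in line.lower():
            match = PCI_ADDR.search(line)
            hits.append(match.group(1).lower() if match else None)
    return hits


# Illustrative sample only, not real output from this system.
sample = (
    "NVRM: Xid (PCI:0000:0b:00): 79, GPU has fallen off the bus.\n"
    "NVRM: GPU 0000:0b:00.0: GPU has fallen off the bus.\n"
    "usb 1-1: new high-speed USB device\n"
)
drops = find_bus_drops(sample)
print(drops)
```

In practice you would pass it the text of your kernel log (for example the output of dmesg saved to a file) instead of the sample string. If every hit points at 0000:0b:00, that supports a problem with that specific card or slot rather than with the driver.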