GPU randomly lost

Hi there,

We have an Ubuntu 20.04 machine with two GPUs, an NVIDIA GeForce RTX 2080 Ti and an NVIDIA GeForce RTX 3060 Ti, and from time to time one of the GPUs disappears.
nvidia-bug-report.log.gz (327.7 KB)

root@cpuhl00005:~# nvidia-smi
Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error

root@cpuhl00005:~# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev ff)
01:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev ff)
01:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev ff)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev ff)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 2489 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 228b (rev a1)

We can fix the issue temporarily by reinstalling the NVIDIA drivers, but after some time it reappears.

Any help is appreciated

Best Regards

The 2080 Ti keeps shutting down with Xid 79 (GPU has fallen off the bus). Please check temperatures and power.
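
To see how often and on which card the Xid shows up, something like the following rough Python sketch could help. It assumes the kernel log contains the driver's usual "NVRM: Xid (PCI:...): <code>" lines. The function name count_xids and the SAMPLE lines are made up for illustration and are not taken from the attached bug report.

```python
import re
from collections import Counter

# Matches the kernel log lines the NVIDIA driver writes for Xid events, e.g.
#   NVRM: Xid (PCI:0000:01:00): 79, pid=..., GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9A-Fa-f:.]+)\): (\d+)")


def count_xids(log_lines):
    """Return a Counter keyed by (pci_address, xid_code).

    Hypothetical helper: log_lines is any iterable of strings,
    such as an open log file or the captured output of dmesg.
    """
    counts = Counter()
    for line in log_lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Illustrative sample lines only (assumed format, not real data from this machine).
SAMPLE = [
    "[ 1234.5] NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.",
    "[ 1300.0] usb 1-1: new high-speed USB device number 5 using xhci_hcd",
    "[ 9876.5] NVRM: Xid (PCI:0000:01:00): 79, pid=4321, GPU has fallen off the bus.",
]

for (pci, xid), n in sorted(count_xids(SAMPLE).items()):
    print(f"{pci}: Xid {xid} seen {n} time(s)")
```

On the real machine you would pass it the actual log instead of SAMPLE, for example the lines of /var/log/kern.log (assuming the default Ubuntu logging setup). While the card is still visible, nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,power.limit --format=csv reports the temperature and power readings to compare against the times the Xid appears.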

Hi, have you solved this problem?