I was using 4 GPUs for deep learning.
One day the message “Unable to determine the device handle for GPU 0000:0A:00.0: GPU is lost. Reboot the system to recover this GPU” popped up,
and after rebooting I couldn’t find my 4th GPU…
I’ve attached the system log generated by nvidia-bug-report.sh. Please help…
nvidia-bug-report.log.gz (717.8 KB)