We have a deep learning machine with 4 x RTX 2080 Ti GPUs running Ubuntu 16.04.6. Recently, one or more GPUs have started disappearing from and reappearing in the nvidia-smi output. This happens on the order of seconds, so the cards are constantly flickering in and out of the listing.
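To narrow down which cards drop out and when, a small polling script along these lines could log the changes. This is only a rough sketch: it assumes `nvidia-smi` is on the PATH and that `--query-gpu=pci.bus_id --format=csv,noheader` prints one PCI bus ID per line. The helper names and the one-second interval are illustrative, not anything from the driver tooling.

```python
import re
import subprocess
import time
from datetime import datetime

# nvidia-smi query flags; the interval and helper names below are illustrative.
QUERY = ["nvidia-smi", "--query-gpu=pci.bus_id", "--format=csv,noheader"]

# Assumed bus ID shape: domain:bus:device.function, e.g. 00000000:01:00.0
BUS_ID = re.compile(r"[0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f]")


def parse_bus_ids(output):
    """Return the set of PCI bus IDs found in the CSV output, skipping other lines."""
    ids = set()
    for line in output.splitlines():
        line = line.strip()
        if BUS_ID.fullmatch(line):
            ids.add(line)
    return ids


def diff_visible(previous, current):
    """Return (lost, gained) bus IDs between two polls, each as a sorted list."""
    return sorted(previous - current), sorted(current - previous)


def poll_once():
    """Run nvidia-smi once; a failed call is treated as 'no GPUs visible'."""
    try:
        out = subprocess.check_output(
            QUERY, stderr=subprocess.DEVNULL, universal_newlines=True
        )
    except (subprocess.CalledProcessError, OSError):
        return set()
    return parse_bus_ids(out)


def main(interval_s=1.0):
    previous = poll_once()
    print("{} initial: {}".format(datetime.now().isoformat(), sorted(previous)))
    while True:
        time.sleep(interval_s)
        current = poll_once()
        lost, gained = diff_visible(previous, current)
        if lost or gained:
            # Timestamped so the entries can be lined up with dmesg / syslog.
            print("{} lost={} gained={}".format(
                datetime.now().isoformat(), lost, gained))
        previous = current


if __name__ == "__main__":
    main()
```

Logging by bus ID rather than index should make it easier to tell whether it is always the same physical slot that drops out, since indices can shift when a card vanishes.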
Any suggestions?
nvidia-bug-report.log.gz (2.6 MB)