Hello,
With the latest version of the NVIDIA drivers on Ubuntu 18.04 and an up-to-date system, GPU 0 on my workstation (which has four 2080 Ti cards) has issues and cannot be used. The three other cards have no problems.
In nvidia-smi, it is marked with an error (nvidia-smi takes a few seconds to print its result instead of returning instantly as usual), and it cannot be used for any task (such as deep learning). The issue happens whether or not the monitor is connected to this GPU. When the monitor is connected to this GPU, nothing is displayed and the screen remains black after boot.
I am not sure whether it’s a hardware problem or a driver problem. I reinstalled the system and tried two kernel versions (HWE kernel 5.0.0.23 and 5.0.0.31), and the issue happens in both cases. Since all our other systems with the same configuration work perfectly, I suspect a hardware issue, but I would like to understand it better and learn how I could diagnose this kind of issue myself in the future.
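One check that might help separate a hardware fault from a driver fault is to look for NVRM Xid messages in the kernel log and see which PCI bus ID they point to. Below is a minimal Python sketch of that idea, not a definitive diagnostic. It assumes the driver logs lines of the form `NVRM: Xid (PCI:<bus id>): <code>, ...`; the names `find_xid_events` and `read_kernel_log` are hypothetical helpers made up for this illustration, and it assumes `dmesg` is readable by the current user.

```python
import re
import subprocess

# Assumed log format: "NVRM: Xid (PCI:0000:xx:00): <code>, <message>"
XID_PATTERN = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_bus_id, xid_code) tuples found in log_text."""
    return [(m.group(1), int(m.group(2))) for m in XID_PATTERN.finditer(log_text)]


def read_kernel_log():
    """Return the kernel ring buffer as text (hypothetical helper; runs dmesg)."""
    result = subprocess.run(
        ["dmesg"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, also works on Python 3.6
    )
    return result.stdout
```

Calling `find_xid_events(read_kernel_log())` would list each reported Xid code together with the bus ID of the card that raised it, which can then be compared with the bus IDs that nvidia-smi shows for each GPU. An empty list only means that no matching lines were found in the current log, not that the card is healthy.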
I’ll attach the nvidia-bug-report result to this post, hoping it will be useful. Thank you for your time.
Best regards,
– Gauthier
nvidia-bug-report.log.gz (99 KB)