GPU sporadically crashing with "NVLink fatal error detected on link 0" (Xid 74)

nvidia-bug-report.log.gz (428.4 KB)

This error occurs under both high and low load, so it does not seem to be temperature-related.

System: Ubuntu 18.04, CUDA 10.2, GPUs: two NVIDIA GeForce RTX 2070.

Did you already try to reseat the nvlink bridge, i.e. pull it off and plug it back in?

No, I didn't have to. The issue was solved when I switched to a sturdier desk. The old desk seems to have been very susceptible to vibrations, even those that traveled through the floor.
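
For anyone who wants to check how often the error shows up before and after a hardware change like this, here is a minimal Python sketch that counts Xid 74 events in kernel log text. The regular expression assumes the usual `NVRM: Xid (PCI:...): <code>, ...` line shape, which may differ between driver versions. The function name `count_xid_events` and the sample lines are made up for illustration and are not taken from the attached bug report.

```python
import re
from collections import Counter

# Assumed shape of an NVRM Xid kernel log line; the exact text
# can differ between driver versions.
XID_PATTERN = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<xid>\d+)")


def count_xid_events(log_text, xid=74):
    """Count Xid events with the given code, grouped by PCI device string."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match and int(match.group("xid")) == xid:
            counts[match.group("device")] += 1
    return counts


# Illustrative sample lines (hypothetical, not from a real log).
SAMPLE_LOG = """\
[ 1234.5] NVRM: Xid (PCI:0000:01:00): 74, pid=1234, NVLink: fatal error detected on link 0
[ 2345.6] NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.
[ 3456.7] NVRM: Xid (PCI:0000:02:00): 74, pid=1234, NVLink: fatal error detected on link 0
"""

if __name__ == "__main__":
    for device, n in count_xid_events(SAMPLE_LOG).items():
        print(f"{device}: {n} Xid 74 event(s)")
```

To use it on a real machine, pass in the text from `dmesg` or `journalctl -k` instead of `SAMPLE_LOG`. Comparing the counts over similar time spans before and after the change gives a rough idea of whether it helped.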