Hello all,
I have a problem with my second GPU, and Googling has brought me here. In short: my second water-cooled GTX 1080 is lost by nvidia-smi on Ubuntu (or at least marked as lost) after anywhere from a few seconds to a few minutes on the Ubuntu desktop.
What I have already done:
- reseated both cards
- replugged the PCIe power cables (I found that the bottom one was not plugged in all the way)
The card does not seem to be overheating, as nvidia-smi is reporting around 20 degrees Celsius. I was wondering if something else (like the memory or the power delivery phases) could be overheating.
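For anyone who wants to compare readings between the two cards, here is a minimal sketch of how the values could be pulled from nvidia-smi and parsed. The helper names (`parse_gpu_csv`, `query_gpus`) are made up for illustration, and the sketch assumes the `index`, `name`, `temperature.gpu` and `power.draw` query fields are available with the installed driver. It only shows the core temperature that nvidia-smi reports, so it cannot rule out hot memory or power delivery phases.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; assumed to be supported by the driver in use.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "power.draw"]


def parse_gpu_csv(text):
    """Turn 'csv,noheader,nounits' output into a list of dicts, one per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue
        if len(rec) == len(QUERY_FIELDS):
            rows.append(dict(zip(QUERY_FIELDS, (f.strip() for f in rec))))
        else:
            # Lines that are not plain CSV (e.g. an error message) are kept raw.
            rows.append({"raw": ", ".join(rec)})
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (empty if it is missing)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Running this in a loop while the desktop is up might show whether the second card's readings change just before it drops out.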
I believe this is not a novel problem, but I have no idea what else to tell you, so please let me know what other information would help. I went through the nvidia-bug-report.log, but I could not find anything interesting on my own. Any help would be appreciated.
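Since the log is long, a minimal sketch of one way to filter it for GPU error lines follows. The search strings ("NVRM: Xid", "fallen off the bus", "GPU is lost") are an assumption about what relevant driver messages typically look like, not something confirmed for this particular log, and the function names are made up for illustration.

```python
import gzip
import re
import sys

# Assumed patterns for driver error lines; adjust to whatever the log actually contains.
PATTERN = re.compile(r"NVRM: Xid|fallen off the bus|GPU is lost", re.IGNORECASE)


def find_gpu_errors(lines):
    """Return the lines (without trailing newline) that match PATTERN."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_file(path):
    """Scan a plain or gzip-compressed log file for matching lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return find_gpu_errors(fh)


# Usage: python scan_log.py nvidia-bug-report.log
if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in scan_file(sys.argv[1]):
        print(hit)
```

If nothing matches, that does not prove the log is clean; it only means these particular strings do not appear.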
bug-report.gz (252.9 KB)
Edit: my machine did run (for several days or weeks) with the power cable of the second card not plugged in all the way. I did not notice when the second GPU failed, since I wasn’t using it during this period.