I installed a GTX 1060 in my server, mainly for Plex transcoding and maybe some data science.
The card works fine. I use it in Docker, so I installed the NVIDIA Docker runtime. I can start the hello-world container, which runs fine.
I can also run nvidia-smi and see the correct output.
But after some time, the attached monitor stops working (no signal) and nvidia-smi hangs. If I try to update the Plex container (via docker-compose), that also hangs, and the nvidia-runtime process uses 100% CPU. Only a reboot helps after that, and even the reboot doesn't go cleanly: when I log in via SSH and issue a reboot, I get disconnected and all services on the server stop, but the machine never actually restarts, so I end up having to press the reset button.
At first I had the monitor on the i7-7700K's integrated GPU, with the 1060 running headless. The effect was similar: it broke my Xorg process, but at least the monitor still got a signal. Now I have connected the monitor to the 1060.
I know that's not a lot to go on… I'm a bit lost as to where to start looking for logs or how to debug this.
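The only idea I have so far is to go through the kernel log from the boot where it hung. Below is a rough sketch of what I would run. It assumes the NVIDIA kernel driver prefixes its messages with `NVRM:` and reports GPU faults as `Xid` events, and that the journal is persistent so the previous boot is still available. The function names are made up for illustration.

```python
import re
import subprocess

# Assumption: GPU faults show up in the kernel log as lines like
#   NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.
XID = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def find_gpu_errors(log_text):
    """Return (nvrm_lines, xid_codes) found in the given kernel log text."""
    nvrm_lines = [line for line in log_text.splitlines() if "NVRM:" in line]
    xid_codes = []
    for line in nvrm_lines:
        match = XID.search(line)
        if match:
            xid_codes.append(int(match.group(2)))
    return nvrm_lines, xid_codes


def read_previous_boot_kernel_log():
    """Read kernel messages from the previous boot via journalctl."""
    # -k: kernel messages only, -b -1: previous boot (needs a persistent journal)
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    lines, codes = find_gpu_errors(read_previous_boot_kernel_log())
    for line in lines:
        print(line)
    print("Xid codes seen:", sorted(set(codes)))
```

If this prints nothing, either the journal isn't persistent or the driver never got to log anything before the hang, which would be useful to know as well.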
The question mark in the title is there because I don’t know what actually causes the GPU to hang.
Any tips?