I’ve just built a new PC with a 3090, but whenever the GPU is put under any significant load, it often crashes.
Sometimes the screen freezes and/or goes black and the machine becomes unresponsive; other times it freezes for a few seconds, then becomes really slow, then becomes unresponsive. Occasionally the screen freezes but the computer stays responsive enough for me to SSH into the machine and run a few commands.
One of the times that it froze, I was able to log in and run sudo nvidia-bug-report.sh
which hung:
nvidia-bug-report.log.gz (120.6 KB)
(I also ran sudo nvidia-bug-report.sh --safe-mode --extra-system-data
which was able to complete, but I’m not allowed to link more than 1 file per post…)
I’ve run into this problem with all of the NVIDIA drivers available on Ubuntu 22.10, and I also ran into it on Ubuntu 22.04. I don’t think it is due to overheating, because it has crashed when the GPU was below 60 °C and also when I set the power limit to only 250 W.
I’ve noticed that putting pressure on the card while it is running (e.g. with a slightly too large GPU stand) seems to change the behavior, so I thought it could be the PCIe slot. However, the GPU still crashes when I move it to another PCIe slot on the motherboard.
Does anyone have any ideas what could be causing this? Is the GPU just bad?