For the past several months, I’ve had intermittent problems with the GPU “falling off the bus”, almost always during videoconferencing and sometimes only after days of use. I haven’t been able to reproduce the problem consistently in Linux, and so far I haven’t been able to replicate it in Windows by running resource-intensive tasks (I don’t use Windows regularly enough for work to stumble upon it). Unfortunately, my machine has only one suitable PCIe slot and I don’t have another GPU to swap in to quickly rule out a hardware issue. The bug report is attached below. Any help in narrowing down or fixing what’s going on here would be greatly appreciated.
nvidia-bug-report.log.gz (189.4 KB)
There’s something weird going on with your system: you created the log right after the GPU shut down, but lspci shows it’s on again. On previous boots, the system clock jumps backwards, e.g. you
boot at Apr 18 19:13:32
and the crash occurs Apr 18 16:20:26
reboot Apr 18 15:59:41
crash Apr 18 15:12:15
reboot Apr 18 19:13:32
crash Apr 18 16:20:26
So either you have a flux capacitor built into your system or there’s something wrong with the PSU or even the mainboard. My first guess would be that the PSU is failing.
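If you want to check for these clock jumps yourself, here is a rough Python sketch of the comparison I did by eye. It takes the boot and crash timestamps in the order they appear in the log and flags every entry that is earlier than the one before it. The function names (`parse_stamp`, `find_backward_jumps`) and the fixed year are my own assumptions for illustration: the log timestamps carry no year, and parsing the month name assumes an English locale.

```python
from datetime import datetime

def parse_stamp(stamp, year=2000):
    # Log timestamps have no year, so a placeholder year is assumed here.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def find_backward_jumps(events, year=2000):
    # events: (label, timestamp) pairs in the order they appear in the log.
    # Returns every adjacent pair where the later entry has an earlier time.
    jumps = []
    for (prev_label, prev_stamp), (label, stamp) in zip(events, events[1:]):
        if parse_stamp(stamp, year) < parse_stamp(prev_stamp, year):
            jumps.append((prev_label, prev_stamp, label, stamp))
    return jumps

# Timestamps quoted above, in log order.
events = [
    ("reboot", "Apr 18 15:59:41"),
    ("crash", "Apr 18 15:12:15"),
    ("reboot", "Apr 18 19:13:32"),
    ("crash", "Apr 18 16:20:26"),
]

for prev_label, prev_stamp, label, stamp in find_backward_jumps(events):
    print(f"{label} at {stamp} is earlier than the preceding {prev_label} at {prev_stamp}")
```

With the four timestamps above, both crashes get flagged because each is stamped earlier than the boot that preceded it.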
Oh no, I’d missed that when skimming the logs. That’s pretty surprising to me since the system is only a bit more than a year old and nothing has come up running diagnostics from BIOS. Do you have any advice on how to verify this?
Since the 1650 is bus-powered only, I’d simply start by reseating it in its PCIe slot, even more than once. Maybe it’s causing a short circuit.
If that doesn’t help, it gets difficult: you can only test one part after another by checking each in another system.