I could use some help troubleshooting the unexpected system crashes on my HP dv6-7214nr laptop that has a GeForce GT 650M. It powers off completely and I don’t really know how to identify why. I do not suspect any thermal issue as it sometimes crashes without any applications open and all the air vents have been thoroughly cleaned. Here’s some system info and a link to my nvidia-bug-report.log.gz. Any help would be much appreciated!
user@localhost:~$ uname -a
Linux localhost 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
First, is your system compatible with the basic nouveau driver? And if so, do the crashes happen there as well? That is, does it also happen without the NVIDIA driver installed?
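If you're not sure which driver is actually bound to the GPU at any given time, something like the following might help. It's only a rough sketch: `gpu_driver_in_use` is a helper name I made up, and it assumes the usual `lspci -k` layout, where each device stanza has an indented "Kernel driver in use:" line.

```shell
# Hypothetical helper: print the kernel driver bound to each VGA/3D controller.
# Expects the output of `lspci -k` on stdin.
gpu_driver_in_use() {
    awk '
        # A non-indented line starts a new device stanza; note whether it is a GPU.
        /^[0-9a-f]/ { is_gpu = ($0 ~ /VGA compatible controller|3D controller/) }
        # Inside a GPU stanza, print only the driver name.
        is_gpu && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Usage on the affected machine:
#   lspci -k | gpu_driver_in_use
```

On a laptop with both an integrated and a discrete GPU this should print one line per GPU, so you can confirm whether the discrete card is on nouveau or nvidia before and after a crash.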
I’d begin to suspect either hardware failure, or something kernel related, if the system hard crashes at random times, even without any load.
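One thing that may narrow it down is looking at the kernel log from the boot that ended in the crash. Below is a minimal sketch, assuming the journal is persistent so the previous boot is still available. `filter_suspect_kernel_lines` is a name I made up, and the pattern list is only a guess at what tends to show up for GPU or thermal trouble. A sudden power-off often leaves nothing in the log at all, so an empty result doesn't rule anything out.

```shell
# Hypothetical helper: keep only kernel log lines that commonly accompany
# GPU driver or thermal problems. The pattern list is illustrative, not exhaustive.
filter_suspect_kernel_lines() {
    grep -E -i 'NVRM|Xid|nouveau|thermal|critical temperature|machine check|hardware error'
}

# Usage (previous boot's kernel messages, last 40 matching lines):
#   journalctl -k -b -1 --no-pager | filter_suspect_kernel_lines | tail -n 40
```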
To test if it’s the kernel, you could, for example, try a newer kernel version (stable/5.4) and see if the problem persists. If that does solve it, then it’s almost guaranteed to be something to do with the kernel side of it all.
I’m not sure if Ubuntu’s (non-LTS) Live-CD comes with the NVIDIA proprietary drivers, but if it does, then that’d be an option to try. If not, then maybe creating a bootable USB would be an option, if not outright installing it to the drive directly.
My laptop works just fine, i.e., no crashes, with the basic nouveau driver that ships with the fresh Ubuntu installation. This led me to believe it isn’t a hardware failure but something related to the kernel. Last night I tried the latest kernel available from the default Ubuntu 18.04.3 repositories using apt-get, i.e., Kernel 5.3, but got the same crashes. I did this by inserting the following between steps 3 and 4 above:
I’ll now try the NVIDIA proprietary drivers provided from Ubuntu instead of directly from NVIDIA. Do those work equally well with CUDA using docker-ce?
Btw, did I get something wrong or isn’t 18.04.3 part of LTS?
I don’t use Ubuntu (or even CUDA at the moment) myself, so I’m not sure about that part.
And indeed, 18.04.3 is the latest LTS release. I only mentioned a non-LTS release as an example of a Live-CD with a newer kernel. I apologise for the confusion.
Taking a quick look at the logs, there isn’t anything that immediately stands out. Though, admittedly, I’ve only taken a quick look.
Still, if nouveau works fine and doesn’t crash the system, and trying a newer kernel doesn’t solve it, then it does seem like it could well be driver related. However, I’m not sure what else to try at the moment. Sorry.
Hopefully someone else with some real experience tracking down issues like these will take notice of the post, and provide some real help.
Again, sorry for the confusion, and the lack of help.
Quick update. I just tried installing Windows 10 Pro and didn’t experience any sudden power off. I installed Windows mainly so I could run the UEFI hardware diagnostic suite from HP. Everything passed, also without any sudden power off.