Ubuntu 18.04.3 with Kernel 5.x Suddenly Powers Off with GeForce GT 650M Laptop

I could use some help troubleshooting the unexpected system crashes on my HP dv6-7214nr laptop that has a GeForce GT 650M. It powers off completely and I don’t really know how to identify why. I do not suspect any thermal issue as it sometimes crashes without any applications open and all the air vents have been thoroughly cleaned. Here’s some system info and a link to my nvidia-bug-report.log.gz. Any help would be much appreciated!

user@localhost:~$ uname -a
Linux localhost 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

https://www.dropbox.com/s/z1bmrt2vueviz96/nvidia-bug-report.log.gz?dl=0

Before I forget, I installed the NVIDIA drivers from a fresh Ubuntu 18.04.3 installation as follows:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda-drivers

I removed step 3 from above (sudo apt-get dist-upgrade), but the system likewise crashed.

user@localhost:~$ uname -a
Linux localhost 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:12:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

https://www.dropbox.com/s/tnbm7gqlnpbc752/nvidia-bug-report2.log.gz?dl=0

First, is your system compatible with the basic nouveau driver? If so, do the crashes happen there as well? In other words, do they also happen without the NVIDIA driver installed?
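To confirm which driver is actually bound to the GPU before and after each test, something like the sketch below may help. The function name gpu_drivers_in_use is made up for illustration; it only filters text in the format that lspci -k prints, so it can also be run against saved output.

```shell
# Hypothetical helper: print the kernel driver bound to each VGA/3D controller.
# Reads `lspci -k` style text on stdin.
gpu_drivers_in_use() {
  awk '
    /VGA compatible controller|3D controller/ { in_gpu = 1; next }
    /^[0-9a-f]/ { in_gpu = 0 }   # any other device entry starts at column 0
    in_gpu && /Kernel driver in use:/ { print $NF }
  '
}

# On the live system:
#   lspci -k | gpu_drivers_in_use
```

On a hybrid-graphics laptop this would normally list the integrated GPU's driver as well as nouveau or nvidia for the GeForce.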

I’d begin to suspect either hardware failure, or something kernel related, if the system hard crashes at random times, even without any load.
To test if it’s the kernel, you could, for example, try a newer kernel version (stable/5.4) and see if the problem persists. If that does solve it, then it’s almost guaranteed to be something to do with the kernel side of it all.

I’m not sure if Ubuntu’s (non-LTS) Live-CD comes with the NVIDIA proprietary drivers, but if it does, then that’d be an option to try. If not, then maybe creating a bootable USB would be an option, if not outright installing it to the drive directly.

For the bootable USB option: https://tutorials.ubuntu.com/tutorial/tutorial-create-a-usb-stick-on-ubuntu

My laptop works just fine, i.e., no crashes, with the basic nouveau driver that ships with the fresh Ubuntu installation. This led me to believe it isn’t a hardware failure but something related to the kernel or the driver. Last night I tried the latest kernel available from the default Ubuntu 18.04.3 repositories using apt-get, i.e., Kernel 5.3, by inserting the following between steps 3 and 4 above, but the crashes were the same:

sudo apt-get -y install linux-generic-hwe-18.04-edge linux-headers-generic-hwe-18.04-edge linux-image-generic-hwe-18.04-edge

user@localhost:~$ uname -a
Linux localhost 5.3.0-24-generic #26~18.04.2-Ubuntu SMP Tue Nov 26 12:34:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

https://www.dropbox.com/s/2g6p7z67jrqetb5/nvidia-bug-report3.log.gz?dl=0

I’ll now try the NVIDIA proprietary drivers provided from Ubuntu instead of directly from NVIDIA. Do those work equally well with CUDA using docker-ce?
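For the Ubuntu-packaged route, the ubuntu-drivers tool can list the driver packages it considers suitable for the detected hardware. The sketch below assumes its output marks one line with the word "recommended" (as in "driver : nvidia-driver-NNN - distro non-free recommended"); the helper name recommended_driver is hypothetical.

```shell
# Hypothetical helper: pick the package name from the line that
# `ubuntu-drivers devices` flags as recommended (assumed output format).
recommended_driver() {
  awk '/^driver/ && /recommended/ { print $3 }'
}

# On the live system:
#   ubuntu-drivers devices | recommended_driver
#   sudo ubuntu-drivers autoinstall   # installs the recommended driver
```

This says nothing about whether CUDA inside docker-ce works equally well with that package; it only shows which driver Ubuntu itself would choose.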

Btw, did I get something wrong or isn’t 18.04.3 part of LTS?

I don’t use Ubuntu (or even CUDA at the moment) myself, so I’m not sure about that part.
And indeed, 18.04.3 is the latest LTS release. That was just for the example, of trying a Live-CD with a newer kernel. I apologise for the confusion.

Taking a quick look at the logs, there isn’t anything that immediately stands out. Though, admittedly, I’ve only taken a quick look.
Still, if nouveau works fine and doesn’t crash the system, and trying a newer kernel doesn’t solve it, then it does seem like it could well be driver related. However, I’m not sure what else to try at the moment. Sorry.

Hopefully someone else with some real experience tracking down issues like these will take notice of the post, and provide some real help.
Again, sorry for the confusion, and the lack of help.

This time I tried replacing the final install step above (sudo apt-get -y install cuda-drivers) with the following, but still with the same crash result:

sudo apt-get -y install cuda-drivers=418.87.01-1

I tried this because I couldn’t find my GeForce GT 650M’s 0FD1 Device-ID in the cuda-drivers=440.33.01-1 package.

Note: these are the commands I used to check the Device-ID and the available driver packages:

lspci -nn
apt-cache policy cuda-drivers
apt-cache show nvidia-driver-440
apt-cache show nvidia-driver-418
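To pull just the NVIDIA Device-ID out of the lspci -nn output for comparison, a small filter like the one below could be used. It relies only on NVIDIA's PCI vendor ID being 10de; the function name nvidia_device_ids is made up for illustration.

```shell
# Hypothetical helper: print the 4-hex-digit device ID of every NVIDIA
# (vendor 10de) entry found in `lspci -nn` style text on stdin.
nvidia_device_ids() {
  sed -nE 's/.*\[10[dD][eE]:([0-9a-fA-F]{4})\].*/\1/p' | tr '[:lower:]' '[:upper:]'
}

# On the live system:
#   lspci -nn | nvidia_device_ids
```

The ID is printed in upper case so it can be compared directly with the 0FD1 spelling used above.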

Quick update: I just tried installing Windows 10 Pro and didn’t experience any sudden power-offs. I installed Windows mainly so I could run the UEFI hardware diagnostic suite from HP; everything passed, also without any sudden power-off.