390.132 Ubuntu 18.04.02 GPU has fallen off the bus (confirmed on 20.04)

Hardware : Alienware m18x r2 2 x GTX 675m. 24GB mem
OS : Linux 5.3.0-51-generic #44~18.04.2-Ubuntu SMP
OS : Linux 5.4.0-29-generic #33-Ubuntu SMP
Driver 390.132

Frequency of fault : constant. System unusable.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.132                Driver Version: 390.132                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 675M    Off  | 00000000:01:00.0 N/A |                  N/A |
|  0%   63C   P12    N/A /  N/A |     32MiB /  1977MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 675M    Off  | 00000000:07:00.0 N/A |                  N/A |
|  0%   57C   P12    N/A /  N/A |      0MiB /  1985MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

Nvidia bug reports available here:

Please check for a BIOS update first.
In what kind of situation does this happen: under high load or low load, i.e. idle?
Please try setting the kernel parameter
intel_idle.max_cstate=1
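
On Ubuntu the parameter is normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot. To confirm that it actually took effect after rebooting, a minimal sketch along these lines can check the running kernel's command line. The helper names has_kernel_param and current_cmdline are made up for illustration:

```python
def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whole token in a kernel command line."""
    return f"{name}={value}" in cmdline.split()


def current_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling has_kernel_param(current_cmdline(), "intel_idle.max_cstate", "1") should return True once the parameter is active.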
Please monitor temperatures in nvidia-smi while running a Unigine demo.
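
For logging the temperatures rather than watching them by eye, something like the following rough sketch could be used. It assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits) reports temperature.gpu on this driver and GPU; older cards report some fields as unsupported, so non-numeric values are mapped to None. The function names are only illustrative:

```python
import subprocess

# Query one "index, temperature" line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(csv_text):
    """Map GPU index -> temperature in degrees C (None if not reported)."""
    temps = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, temp = [field.strip() for field in line.split(",", 1)]
        # Unsupported fields come back as text such as "[Not Supported]" or "N/A".
        temps[int(index)] = int(temp) if temp.isdigit() else None
    return temps


def read_temperatures():
    """Run nvidia-smi once and return the parsed temperatures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temperatures(result.stdout)
```

Calling read_temperatures() every few seconds while the demo runs gives a simple temperature log to attach to the report.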

BTW, with those gpus, you can use the current 440 driver for graphics.

Unfortunately, attempting to install anything newer than 390 results in the nvidia driver installer stating that my card is not compatible and that the latest compatible driver is 390. The confusion is understandable, though, since Nvidia’s own download site states that 410 is the latest. But alas, it isn’t!

My BIOS is already the latest. Checked, thanks!

I have been working on this problem for two weeks now and have tried quite a lot of things. I started at 18.04.01 with kernel 4.18 (which had been stable for two years) and gdm3/GNOME desktop, and have since moved to 18.04.02/4 with lightdm/MATE desktop.

I’m at the stage where it probably needs a backup and a complete clean reinstall. I still have to decide between 18.04 and 20.04.

One of my concerns is that the nvidia report log contains references to driver 390.116, which I don’t ever recall installing.

So maybe I have got mixed up drivers?!?!?
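
One way to check for leftovers from another driver release is to list every distinct version number that appears on nvidia-related lines of a log or package listing. This is only a rough sketch: the regular expression assumes versions look like 390.132, and list_driver_versions is a made-up helper name.

```python
import re

# Assumption: driver versions look like "390.132" (three digits, dot, 2-3 digits).
VERSION_RE = re.compile(r"\b(\d{3}\.\d{2,3})\b")


def list_driver_versions(text):
    """Return the sorted distinct version strings found on lines mentioning nvidia."""
    versions = set()
    for line in text.splitlines():
        if "nvidia" in line.lower():
            versions.update(VERSION_RE.findall(line))
    return sorted(versions)
```

More than one version in the result would suggest that packages or kernel modules from an older release are still installed.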

Sorry, my fault. Despite being a GTX 6xx, this is still a Fermi device, so 390 is the correct and last driver supporting it.
You don’t have any driver problems; the XID 79 you’re getting is not driver related but hardware related. The GPU simply shuts itself down. On notebooks, this is often related to BIOS/kernel flaws. Other possibilities are overheating or simply that the GPU is broken.
So a simple reinstall won’t yield anything. Since this was not always happening, you should first check for overheating, then check an earlier kernel, e.g. by installing 16.04 and not updating, so that it runs an older kernel. If neither changes or clears the situation, the GPU is broken.
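
When comparing kernels, it helps to pull the Xid events out of the kernel log so it is obvious whether XID 79 still shows up. The sketch below assumes the usual "NVRM: Xid (PCI:...): <code>, ..." message layout. The sample line in the comment is illustrative, not taken from the reporter's logs, and find_xid_events is a made-up name.

```python
import re

# Assumed message layout, e.g.:
#   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \((?P<device>[^)]*)\): (?P<code>\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_device, xid_code) tuples found in kernel log text."""
    return [
        (match.group("device"), int(match.group("code")))
        for match in XID_RE.finditer(log_text)
    ]
```

Feeding it the output of dmesg or journalctl -k after each test boot shows whether a given kernel still produces code 79.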

Probably not in this instance. The machine dual boots into Windows 10 with 390 and works flawlessly all day long. As for overheating, temps are all very low, < 60! Since the driver and BIOS haven’t changed in two years and work flawlessly under Win 10, I suspect a kernel update regression. So I agree with you and I’m going to go back to an older distro.

Can confirm: with a fresh install of MATE 20.04, allowing the installer to add third-party drivers, the system builds correctly with support for the nvidia driver, boots, and shows a login screen. Login is completely successful and the desktop is displayed correctly. I can successfully open a terminal window and confirm nvidia-smi is working, and can even open nvidia-settings.
However, the GPU locks up instantly when just about any other window is opened.
Even though I can successfully open a terminal window, if I try to use any of the menus the system goes to a black screen.
The only access after this point is via ssh, and rebooting is not possible because the Xorg task cannot be killed.
So a power cycle is required.