GPU "falling off the bus" at suspend resulting in crash on unsuspend nvidia 440-5.4.0.37

I had two or three days of suspend working perfectly, but now it consistently crashes the computer.

Suspending from a console (tty3) without X running works, but the crash happens when I switch back to X (with Ctrl-Alt-1) after resuming.

I see errors on the console that relate to the PCI bus, and dmesg contains a reference to “GPU has fallen off the bus”.
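For anyone who wants to check their own kernel log for the same symptom, here is a minimal sketch. It assumes the log text has already been captured (for example from the output of `dmesg`), and the function name `find_gpu_bus_errors` is only illustrative. It filters for the phrase quoted above and for NVRM Xid reports in general.

```python
import re

# Matches the phrase quoted in this thread, or any line tagged as an NVRM Xid report.
PATTERN = re.compile(r"fallen off the bus|NVRM: Xid", re.IGNORECASE)


def find_gpu_bus_errors(log_text):
    """Return the lines of log_text that mention a GPU bus drop or an Xid."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


# Possible usage (reading the kernel log may need root on some systems):
#   import subprocess
#   log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   print("\n".join(find_gpu_bus_errors(log)))
```

An empty result after a failed resume would suggest the hang has a different cause than the one discussed here.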

The nvidia bug report is attached: nvidia-bug-report.log (1.3 MB)

The following update may be implicated.

Start-Date: 2020-06-12  06:44:50
Commandline: /usr/bin/unattended-upgrade
Install: linux-modules-nvidia-440-5.4.0-37-generic:amd64 (5.4.0-37.41, automatic)
Upgrade: linux-modules-nvidia-440-generic-hwe-20.04:amd64 (5.4.0-26.30+2, 5.4.0-37.41)
End-Date: 2020-06-12  06:45:06

It seems you also recently updated the system BIOS. Was this at the same time the Xid 79 appeared?
Please check if the kernel parameter
intel_idle.max_cstate=1
prevents the gpu falling off the bus.

I did update the BIOS, but that was after these crashes started.

Just before I started getting the problem, I’d had a problem with bluetooth, and I’d had to restart the system several times. The bluetooth problem seemed to be corrected, but on the first suspend after the bluetooth started working, it wouldn’t come out of sleep, so initially I thought it was a bluetooth issue. But looking at the output of dmesg, it seems more likely to be a GPU issue.

James

Ok. Since the 440.82 is already more than 2 months old, I suspect you got the update because the kernel got updated. Please check in grub menu if you can boot an older kernel to check if this is a kernel regression.

I added the boot parameter, but there was no change.

Here is the dmesg log: dmcrash.txt (92.0 KB)

Will try with an older kernel and report back.

With an older kernel I had exactly the same crash.

I only had two kernels (this is a new computer): 5.4.0-26 and 5.4.0-37. Same crash/hang on resume with both.

In all cases I can resume to a non-graphical terminal, but it hangs when switching to gdm.

I get the same errors in dmesg:
dmcrash.txt (90.3 KB)

Update:

Using the kernel parameter pcie_aspm=off, following the suggestion at https://askubuntu.com/questions/868321/gpu-has-fallen-off-the-bus-nvidia, I have been able to suspend and resume the computer.
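In case it helps others make the parameter permanent, here is a rough sketch of the edit involved. It assumes an Ubuntu-style setup where kernel parameters live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub`. The helper name `add_kernel_param` is hypothetical, and it only transforms text, it does not write the file.

```python
import re


def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        # Rebuild the line: prefix, space-separated parameters, closing quote.
        return match.group(1) + " ".join(params) + match.group(3)

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )


# Example: 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' becomes
# 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"'.
```

On Ubuntu the edited file only takes effect after running `sudo update-grub` and rebooting. Keeping a backup copy of `/etc/default/grub` before changing it is a sensible precaution.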

I’m somewhat concerned about the side effects of this. Is the only effect slightly lower battery life?

Yes, the only side effect is slightly higher power consumption; otherwise it is safe to disable. Many notebooks have it disabled by the BIOS anyway.

I have been having the same problem, but pcie_aspm=off did not fix it (in fact, I think it may be ignored entirely). Are there other possible fixes?
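One way to tell whether the parameter is really being ignored is to check that it reached the running kernel at all. On Linux the active command line can be read from `/proc/cmdline`. The sketch below parses such a line, and the function name `kernel_params` is only illustrative.

```python
def kernel_params(cmdline):
    """Parse a kernel command line into a dict of name -> value (None for bare flags)."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so values such as root=UUID=... stay intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Possible usage on the affected machine:
#   with open("/proc/cmdline") as f:
#       print(kernel_params(f.read()).get("pcie_aspm"))
```

If `pcie_aspm` is missing from the result, the parameter never made it into the boot entry that was actually used, which would point at the bootloader configuration rather than the driver.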