Freezes and hard resets on Ubuntu 20.04, kernel=5.11.40 drivers=470.82.00

Hi,

I wanted to report that since Ubuntu auto-updates installed kernel 5.11.40 and the nvidia 470.82.00 drivers, I have been experiencing problems on a machine that was working fine with the 470.63.00 drivers. I’m experiencing two different symptoms:

  • When running two 4k displays, I was seeing complete freezes that never recovered, and they would happen less than ten minutes into running a 3d application like gazebo.
  • When running only one 4k display, I was seeing random resets (a completely random reboot) after about ten minutes of running a 3d application like gazebo.

I also experienced a hard reset when running a memtest, but I don’t think that memory is the issue since this happened immediately after rebooting after auto-upgrades to these drivers. I tried rolling back kernel to no avail, but running with nouveau drivers has worked thus far and I have been able to run on one display for well over an hour with no problems.

I’d like to grab a dump of information when it happens, but nothing of value seems to be present in either the kernel log or the system journal.
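For anyone else trying to capture something after a reset, here is a minimal sketch of pulling the previous boot's messages out of the journal. It assumes journalctl is available and that the journal is persistent (otherwise the previous boot is not kept). The function names and the output file name are made up for illustration, and on my machine this kind of dump has not shown anything useful so far.

```python
import shutil
import subprocess


def previous_boot_cmd(priority="warning", kernel_only=True):
    """Build a journalctl command for the boot before the current one."""
    # -b -1 selects the previous boot, -p filters by priority (and worse).
    cmd = ["journalctl", "--no-pager", "-b", "-1", "-p", priority]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, similar to dmesg
    return cmd


def dump_previous_boot(path="previous-boot.log"):
    """Write the previous boot's messages to a file.

    Returns False if journalctl is missing or exits with an error.
    The default file name is only an example.
    """
    if shutil.which("journalctl") is None:
        return False
    result = subprocess.run(previous_boot_cmd(), capture_output=True, text=True)
    with open(path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode == 0
```

Running dump_previous_boot() right after the machine comes back up should leave whatever the kernel managed to log before the reset in previous-boot.log.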

Here is a dump of lshw: computer description: Desktop Computer product: ROG STRIX G15DK_G15DK - Pastebin.com

Thanks for any help or pointers. If nothing else, maybe this will serve as an issue that may help someone who is having these problems but not posting and informing nvidia.

Thanks!

Forgot to add, lspci shows:

0b:00.0 VGA compatible controller: NVIDIA Corporation Device 2484 (rev a1)
Which should correspond to an ASUS GeForce RTX 3070

Please check your system logs for NVIDIA(GPU-0): WAIT entries.

I am experiencing freezes too,
Display freezes: (EE) NVIDIA(GPU-0): WAIT.

For me the log entries matched the freezes perfectly.

Do you also have these log entries?
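If it helps, a quick way to look for those entries is to scan a log file for the pattern. This is only a rough sketch: the function names are hypothetical, and which log to scan depends on your setup. In the usage example, /var/log/Xorg.0.log is my assumption for where the X server log lives, so adjust the path for your system.

```python
import re

# Matches entries such as "(EE) NVIDIA(GPU-0): WAIT" for any GPU index.
WAIT_RE = re.compile(r"NVIDIA\(GPU-\d+\): WAIT")


def find_wait_entries(lines):
    """Return the lines that contain an NVIDIA(GPU-n): WAIT entry."""
    return [line.rstrip("\n") for line in lines if WAIT_RE.search(line)]


def scan_log(path):
    """Scan one log file; the caller supplies the path for their system."""
    with open(path, errors="replace") as fh:
        return find_wait_entries(fh)
```

For example, scan_log("/var/log/Xorg.0.log") returns the matching lines, or an empty list if there are none.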

@MatthijsBurgh Thanks for replying.
I do not see that in my journald output or my kernel logs, however 5 minute freezes after boot sounds familiar… we eventually started seeing hard resets, and I have moved away from the nvidia drivers ever since.

The last drivers to work for me were 470.63, if that helps at all… I haven’t rolled back to them.

Hope this helps.

Your problem description sounds more like a hardware issue; a spontaneous reset is only initiated by the mainboard. Maybe the PSU is failing?

I am having the same experience. Ubuntu 20.04 and nvidia-driver-495 for an RTX 970.

The setup crashed on the 5.11.0-27 kernel, worked OK on the 5.11.0-38 kernel and then began spontaneously rebooting on the 5.11.0-40 kernel.

The unit passed a memtest, and I ran gpu_burn for two minutes and it passed that too. There are no crash logs or kern.log errors… the internet points to the PSU.

I think that 5.11.0-38 might contain a fix…although I am not sure what it would be if it does.