Arbitrary Crashes / Segfaults with RTX 3070 on current driver-455 on Ubuntu 20.04 kernel 5.4.0-58-generic

@generix thank you for asking. Unfortunately it didn’t improve. I tried to narrow it down, but running Chrome as the only application seems to be enough to make it crash. I haven’t been able to narrow it down to a specific open page yet, though.

In the meantime I installed the WSL2 Linux subsystem in Windows so I can work there as well, and as expected I have not had a single crash there.

I was wondering: does an Xid 62 error guarantee that there is definitely a thermal problem, or is that just one possibility when this specific error is logged? If it were guaranteed, it could point to buggy GPU fan control under Linux, with the fan throttled below an rpm it should not drop below. On the other hand, I don’t know whether the fan has to spin at all when power consumption is down to 23 W as mentioned above, and I usually don’t get above that while working.
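
To see which Xid codes were actually logged and how often, something like the following could be used. It is only a rough sketch: it assumes the driver writes kernel lines of the form "NVRM: Xid (PCI:0000:01:00): 62, ..." and that the kernel log is readable through journalctl. The names count_xids and read_kernel_log are made up for this example.

```python
import re
import subprocess
from collections import Counter

# Assumed line shape: "... NVRM: Xid (PCI:0000:01:00): 62, pid=..., ..."
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(log_text):
    """Return a Counter mapping each Xid code (int) to how often it appears."""
    return Counter(int(m.group(2)) for m in XID_RE.finditer(log_text))


def read_kernel_log():
    """Return the current boot's kernel messages (may need extra privileges)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Usage on the affected machine:
#   for code, n in sorted(count_xids(read_kernel_log()).items()):
#       print(f"Xid {code}: {n} occurrence(s)")
```

The count only tells you what was logged, not why. As the reply below notes, a code like 62 does not pin down a single cause.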

XID 62 is one of the more inconclusive errors that can be logged. It might be anything: a broken GPU, a general BIOS/PCIe incompatibility, or thermal problems. I guess thermal problems would only arise in high-load situations and not in your plain desktop usage.
You could check whether setting PCIe gen to 3 in the BIOS or upgrading the BIOS changes anything.
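
To confirm which PCIe generation the card is actually negotiating before and after a BIOS change, a small sketch along these lines might help. It assumes nvidia-smi is on the PATH and uses its pcie.link.gen.current, pcie.link.gen.max and pcie.link.width.current query fields. The function names are made up for this example.

```python
import subprocess

QUERY = "pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current"


def parse_pcie_line(line):
    """Parse one 'csv,noheader' row such as '3, 4, 16' into a dict of ints."""
    gen_now, gen_max, width = (int(v.strip()) for v in line.split(","))
    return {"gen_current": gen_now, "gen_max": gen_max, "width_current": width}


def query_pcie():
    """Return one dict per GPU reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_pcie_line(row) for row in out.splitlines() if row.strip()]


# Usage on the affected machine:
#   print(query_pcie())
```

Keep in mind that the current link generation can drop while the GPU is idle to save power, so a low value at idle is not necessarily a fault. Compare readings taken under some load.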

Hi,

I’m having problems with my NVIDIA 3070 too, on Windows and on Ubuntu.
When I’m playing on Windows the game breaks and I have to log out or restart the PC. Besides that, sometimes when I start the PC, I get this screen tearing.

On Ubuntu, the video stops and freezes. Last time I logged into the PC via SSH and found these messages in syslog:

Feb 25 15:45:04 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe453d, 0x00008054, 0x000088b8)
Feb 25 15:45:06 ronneesley-pc-super systemd[11330]: tracker-extract.service: Succeeded.
Feb 25 15:45:07 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe5c78, 0x00008054, 0x000088ec)
Feb 25 15:45:14 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe5c78, 0x00008054, 0x000088ec)
Feb 25 15:45:17 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe453e, 0x00008054, 0x00008920)
Feb 25 15:45:24 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe453e, 0x00008054, 0x00008920)
Feb 25 15:45:26 ronneesley-pc-super tracker-store[11431]: OK
Feb 25 15:45:26 ronneesley-pc-super systemd[11330]: tracker-store.service: Succeeded.
Feb 25 15:45:27 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe5c79, 0x00008054, 0x00008954)
Feb 25 15:45:34 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe5c79, 0x00008054, 0x00008954)
Feb 25 15:45:37 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe453f, 0x00008054, 0x00008988)
Feb 25 15:45:44 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe453f, 0x00008054, 0x00008988)
Feb 25 15:45:47 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe5c7a, 0x00008054, 0x000089bc)
Feb 25 15:45:54 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe5c7a, 0x00008054, 0x000089bc)
Feb 25 15:45:57 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe4540, 0x00008054, 0x000089f0)
Feb 25 15:46:04 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (1-S, 17, 0xe4540, 0x00008054, 0x000089f0)
Feb 25 15:46:07 ronneesley-pc-super /usr/libexec/gdm-x-session[3044]: (WW) NVIDIA(0): WAIT (2-S, 17, 0xe5c7b, 0x00008054, 0x00008a24)

My NVIDIA driver is 460.39, so the problem persists, @aplattner. I noticed your request for the bug report, so here is my log:
nvidia-bug-report.log.gz (357.8 KB)

Although the NVIDIA card crashes on Windows while I’m playing, on Ubuntu the crashes happen during normal use, like editing text.

I already moved the NVIDIA card to another PCIe slot and changed the HDMI cable, but the problem doesn’t go away.

Finally, I noticed other people with the same kind of problem at:

I hope this helps other people with this kind of problem by telling them that they are not alone.

Thanks.

The logs contained only one XID 79; possible reasons are overheating, lack of power, or general GPU failure.
Since the GPU was idle while the logs were taken, yet at 50°C, you should check fans and airflow, and also monitor the temperature using nvidia-settings or nvidia-smi.
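
For the monitoring part, a minimal sketch built on nvidia-smi could look like this. It assumes nvidia-smi is on the PATH and queries its temperature.gpu, fan.speed and power.draw fields. The helper names and the 5-second interval in the usage comment are only illustrative.

```python
import subprocess

FIELDS = "temperature.gpu,fan.speed,power.draw"


def to_float(value):
    """Convert a field to float; return None for placeholders like '[N/A]'."""
    try:
        return float(value.strip())
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'csv,noheader,nounits' row such as '50, 0, 23.10'."""
    temp, fan, power = (to_float(v) for v in line.split(","))
    return {"temp_c": temp, "fan_pct": fan, "power_w": power}


def read_sample():
    """Query the first GPU once and return its parsed readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


# Usage on the affected machine (poll every 5 seconds until interrupted):
#   import time
#   while True:
#       print(read_sample())
#       time.sleep(5)
```

Logging these values to a file while working normally would show whether the temperature climbs or the fan stays at 0% shortly before a freeze.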