Random Xid 61 and Xorg lock-up

Updated the post with your suggested changes, with the exception of MSI. The user marvinreza reported being on an MSI X570 board with this problem.

Please note, I have a completely different CPU (i7-3770)…

Possibly this guy has the same problem on Windows 10:

Wrong statement… :/

Not entirely true: I’ve seen many reports, including my own, where the card is a 20xx.

I’ve reported the issue in this topic in 3rd post on the 1st page, it was back in August 2019.

Since then I’ve replaced the Ryzen 3700 with a 3900; it changed nothing.

But now the issue in my case is, I hope, gone. I’ve reached 40+ days of uptime twice in the last 3 months without it. In that time I’ve been playing games, watching videos, editing videos and pictures, and so on. I’ve stressed the system a lot; I even mined some cryptocoins to keep the GPU at 100% for longer periods.

What helped? I don’t know :) Probably the BIOS update, plus perhaps some system updates (Debian 10).

At this moment I’m using:

  • NVIDIA 440.36 driver (standalone binary).
  • BIOS Version 1201 for ROG STRIX X570-E GAMING (there are 2 newer versions but I’ve not tried them since my system seems stable).

If the problem returns I will report back here.

I don’t know if it’s related, but my Ryzen 3900 now idles at 2.2 GHz per core according to /proc/cpuinfo. Before the BIOS update it was much higher all the time.
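For anyone who wants to compare idle clocks before and after a BIOS update, here is a minimal sketch that pulls the per-core "cpu MHz" values out of /proc/cpuinfo text. The function name and the sample text are made up for illustration; on a real system you would pass in the contents of /proc/cpuinfo instead.

```python
import re


def core_frequencies_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical core found in the text."""
    pattern = re.compile(r"^cpu MHz\s*:\s*([\d.]+)", re.MULTILINE)
    return [float(match.group(1)) for match in pattern.finditer(cpuinfo_text)]


# Illustrative sample only (not taken from a real machine in this thread).
# On Linux, use: open("/proc/cpuinfo").read()
sample = (
    "processor\t: 0\n"
    "cpu MHz\t\t: 2200.000\n"
    "processor\t: 1\n"
    "cpu MHz\t\t: 2195.312\n"
)

freqs = core_frequencies_mhz(sample)
if freqs:
    print(f"{len(freqs)} cores, min {min(freqs):.0f} MHz, max {max(freqs):.0f} MHz")
```

Note that these values are a snapshot, so sampling them a few times while the machine is idle gives a fairer picture than a single reading.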

@amuederjoe Are you seeing Xid 61 on Intel arch?

Hmm, maybe I’m not affected by this bug. I’m very sorry, it seems I read too fast :/
My log shows Xid 62 (internal micro-controller halt), not Xid 61 (internal micro-controller breakpoint/warning).

[ 103.736704] NVRM: GPU at PCI:0000:01:00: GPU-aafe2c58-7930-a2f7-c6e4-62eddc0cc969
[ 103.736708] NVRM: GPU Board Serial Number:
[ 103.736714] NVRM: Xid (PCI:0000:01:00): 62, pid=701, 203c(3090) 00000000 00000000

I’m also seeing Xid 61 errors in the logs, but since my computer is from 2007, could there be compatibility issues with the old motherboard?

Only the nouveau driver works with my 1660 Ti. I have tried all the other Nvidia drivers in Manjaro and I get stuck on a black screen on every boot. My older GTX 1050 works just fine.

Before I dive into the logs any further, do you think the 1660 Ti would even work with these specs, or am I literally trying to revive a dead horse :)

Intel Q6850 processor
Gigabyte GA-P35T-DQ6 Motherboard
4GB DDR3 RAM @1333MHz
Corsair 650W power supply

I’ve just ordered a non Nvidia Turing based video card today.

In my last post, I described running my desktop in a vncserver instance so that all applications ran without hardware acceleration, minimizing use of the video card. The only things using the video card were the Ubuntu MATE 18.04 desktop and vncviewer connecting to localhost:5901.

I had Xid 61 happen again on Friday, Jan 10th. What’s interesting this time is that the error happened and I didn’t notice. At 16:22 the error was logged while I was interacting with the desktop, but no slowdown happened. I attribute this to vncviewer not using much, if any, hardware acceleration. I locked the screen at 16:30(ish), and when I returned a few minutes later I wasn’t able to wake the screen. That’s when I logged in remotely via ssh and looked at kern.log to verify Xid 61. When I attempted to wake the screen, I got a “Lost display notification” message in the log.

[Fri Jan 10 16:22:52 2020] NVRM: GPU at PCI:0000:08:00: GPU-9c1e2d3f-5bf1-9e58-dbcb-9350c03802bb
[Fri Jan 10 16:22:52 2020] NVRM: GPU Board Serial Number:
[Fri Jan 10 16:22:52 2020] NVRM: Xid (PCI:0000:08:00): 61, pid=1790, 0cec(3098) 00000000 00000000
[Fri Jan 10 16:35:49 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
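Since several of us are grepping kern.log for these lines by hand, here is a rough sketch of pulling the Xid events out of kernel log text. The regex is written against the lines pasted in this thread and is an assumption about the format, not an official parser; the function name is made up. The two descriptions in the table are the ones quoted in this thread.

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:08:00): 61, pid=1790, 0cec(3098) 00000000 00000000
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<detail>.*)"
)

# Only the two codes whose meaning is mentioned in this thread.
XID_NAMES = {
    61: "Internal micro-controller breakpoint/warning",
    62: "Internal micro-controller halt",
}


def find_xid_events(lines):
    """Return (pci_bus, xid_number, detail) for every NVRM Xid line."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("bus"), int(match.group("xid")), match.group("detail").strip())
            )
    return events


log = [
    "[Fri Jan 10 16:22:52 2020] NVRM: GPU Board Serial Number:",
    "[Fri Jan 10 16:22:52 2020] NVRM: Xid (PCI:0000:08:00): 61, pid=1790, 0cec(3098) 00000000 00000000",
    "[Fri Jan 10 16:35:49 2020] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
]

for bus, xid, detail in find_xid_events(log):
    print(bus, xid, XID_NAMES.get(xid, "unknown"), detail)
```

Feeding it the output of `dmesg` or the contents of kern.log line by line should work the same way, as long as the NVRM message format matches what is shown here.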

I have a vanilla Ubuntu MATE install; the only change I made, to address visual tearing, was to add ForceFullCompositionPipeline=On to X’s config.

Does anyone know what “Internal micro-controller breakpoint/warning” means? Using my imagination, this sounds like a developer has put a breakpoint in the code and we happen to be hitting that code. If that’s the case, it shouldn’t be too hard to find by searching the source code.

Asus Prime X570 Pro / Ryzen 3900X / Asus RTX 2060 Super / Ubuntu Mate 18.04

I was hit by this again after 3.5 days of uptime, and once again it occurred while using a browser. This time I was switching tabs. I let the computer sit while I got some food. The screensaver activated, and being that it uses OpenGL it was of course horribly slow. However when I unlocked it I noticed that my main browser (Opera) was not affected by the slowdown anymore. Neither was the Unity editor I had running. My second browser (Vivaldi) and Discord client were sluggish as hell though.

Hmm, maybe this is browser related. FWIW, Google Chrome is also pegging a full CPU core when this occurs. Granted, both the DM and Chrome use the GPU, but Chrome could still be what triggers the problem. If I kill Chrome the problem doesn’t go away, but that doesn’t mean Chrome didn’t initiate it.

Had another incident last night after an uptime of 7.5 days. I have a lot more diagnostic data to dump this time. I ran strace on both Xorg and Google Chrome to see if they printed anything interesting, and I didn’t see anything bizarre other than lots of yields and async calls that were not ready. I then ran nvidia-smi, which might be a lot more interesting. I’ve attached the log; a number of values error out when fetched (under normal conditions they work). After a forced restart of lightdm, nvidia-smi stopped working altogether. The forced restart also allowed me to capture a stack trace (attached) generated in the nvidia kernel module, which may be of use. I’ve also attached my Xorg log, which contains messages about a client bug related to some event 17 as well as a bunch of DP disconnect messages (which I’m not sure are normal).

@amrits maybe take a look at some of this stuff and see if anything stands out to you?

FYI: I also attached nvidia-smi output from a normal run.
nvidia-smi.log (5.99 KB)
xorg.0.log (64.6 KB)
journal.log (3.55 KB)
nvidia-smi-norm.log (6.35 KB)

Had another lock-up. @amrits was that information of any use to you? I will be switching off the RTX card for now and going back to an older card. The lock-ups are too disruptive. I suspect that this issue is actually triggered by some unique code path in the render pipeline used by hardware acceleration in browsers (games and media as well). I’d be interested to know if anyone has had a lock-up that was not triggered while an application using hardware acceleration was running. I mainly develop in a terminal and use a browser for music/documentation purposes, so nothing is ever too demanding when I lock up.

The Xid occurred yet again after 2 days and 21 hours of uptime. And once again I was doing stuff in Opera.
Since it’s the weekend and I don’t expect amrits to be able to investigate right now, I did a few experiments in the spirit of “turn it off and on again” to see if I could clear the issue by reinitializing the hardware.

Experiment 1: Power cycle monitors. This has previously cleared the problem of a DP monitor not being correctly recognized; what if the entire issue was being caused by a stuck state in the hardware controlling the monitor? It triggered some GpuWatchdog segfaults from Opera and caused the browser to recover, but other processes were still affected.

GpuWatchdog[2545]: segfault at 0 ip 000056179a5ba805 sp 00007f6ac2aa7490 error 6 in opera-beta[5617966bc000+6438000]
Code: ba 03 00 00 00 b9 04 00 00 00 41 b8 01 00 00 00 e8 70 df ab fe 48 89 c7 48 89 05 ee 2f c3 02 48 8b 07 be 01 00 00 00 ff 50 30 04 25 00 00 00 00 37 13 00 00 c6 05 d9 2f c3 02 01 80 7d 87 00

Experiment 2: Put the system in suspend. It took a lot longer than usual to enter suspend mode, with Xorg eating 100% of one CPU core. Eventually it succeeded and when I woke it back up the Xid symptoms were gone! However I was now missing one of my monitors (crtc configuration errors as before) and power cycling it didn’t bring it back.

Experiment 3: Put the system in hibernation. This worked as expected, and to be safe I also used the physical power switches to power off both the computer and monitors entirely. After powering everything back up and resuming from the hibernation image, both monitors were working again and OpenGL applications were running at full speed.

As an update, after switching to a GT 1030 card for display (as reported by others) I haven’t had one instance of XID-61 and no display problems to think of.

I still plan on running some kind of CUDA workload on the RTX 2060 Super (I left the card installed.) I’ll report back if I’m able to reproduce the issue when not having it drive a display.

Asus Prime X570 Pro / Ryzen 3900X / Asus RTX 2060 Super / Ubuntu Mate 18.04 / MSI GT 1030

After the hibernate trick it took six days until the Xid occurred again. This time the X server locked up hard instead of the sluggish operation from earlier instances. Ssh worked but an attempt to reboot gracefully hung as well so I pressed the reset button. I did not fully power down the system.

Less than four hours later I had another Xid incident. This time Xid 61 was soon followed by 8:

Jan 25 18:57:45 muskrat kernel: NVRM: GPU at PCI:0000:09:00: GPU-5a2b009d-c14b-46e3-9865-3de04b4b0435
Jan 25 18:57:45 muskrat kernel: NVRM: GPU Board Serial Number:
Jan 25 18:57:45 muskrat kernel: NVRM: Xid (PCI:0000:09:00): 61, pid=2363, 0cec(3098) 00000000 00000000
Jan 25 18:57:56 muskrat kernel: NVRM: Xid (PCI:0000:09:00): 8, pid=2363, Channel 00000018
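To compare how quickly Xid 8 follows Xid 61 across different reports, something like the following sketch could be run over syslog-style lines. It assumes the classic "Mon DD HH:MM:SS host ..." prefix seen in this thread; because that prefix carries no year, the year is passed in as a parameter. Function names are hypothetical.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (?P<xid>\d+),"
)


def xid_timeline(lines, year):
    """Return (timestamp, xid) pairs; the year is an assumption supplied by the caller."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            when = datetime.strptime(
                f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, int(match.group("xid"))))
    return events


def seconds_from_61_to_8(events):
    """For each Xid 8, seconds elapsed since the most recent preceding Xid 61."""
    gaps = []
    last_61 = None
    for when, xid in events:
        if xid == 61:
            last_61 = when
        elif xid == 8 and last_61 is not None:
            gaps.append((when - last_61).total_seconds())
            last_61 = None
    return gaps


log = [
    "Jan 25 18:57:45 muskrat kernel: NVRM: Xid (PCI:0000:09:00): 61, pid=2363, 0cec(3098) 00000000 00000000",
    "Jan 25 18:57:56 muskrat kernel: NVRM: Xid (PCI:0000:09:00): 8, pid=2363, Channel 00000018",
]

print(seconds_from_61_to_8(xid_timeline(log, 2020)))
```

An Xid 61 with no later Xid 8 simply produces no entry, which matches the earlier incidents where only the sluggishness showed up.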

This also caused Xorg to lock up hard. Perhaps because I had not power cycled the hardware? I’ll have to see whether hibernating instead of suspending for the night brings any improvement, though it may cause other problems with my NFS setup.

As another shot in the dark I limited the number of CPU cores to 6. No idea if this will have any effect.

Hello,

Since I built a new PC I have had the same problem.
It’s not a special setup.

Gigabyte Aorus x570 elite latest F11 BIOS
AMD Ryzen 3600
MSI Nvidia RTX 2060
Dual screen connected by 2x Displayport
Ubuntu 19.10
nvidia-driver-440 440.48.02-0ubuntu0~~19.10.1

The following is found in the logs.
It has happened only twice in the last 3 weeks.

Jan 26 14:33:27 galahad kernel: [144971.020843] NVRM: GPU at PCI:0000:09:00: GPU-8bc30071-cb5b-c7cf-af07-11706f852ea8
Jan 26 14:33:27 galahad kernel: [144971.020846] NVRM: GPU Board Serial Number:
Jan 26 14:33:27 galahad kernel: [144971.020850] NVRM: Xid (PCI:0000:09:00): 61, pid=2052, 0cec(3098) 00000000 00000000
Jan 26 14:34:26 galahad /usr/lib/gdm3/gdm-x-session[2971]: (WW) NVIDIA: Wait for channel idle timed out.
Jan 26 14:34:29 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x0000fbb4, 0x0000fbbc)
Jan 26 14:34:29 galahad kernel: [145033.328209] NVRM: Xid (PCI:0000:09:00): 8, pid=2052, Channel 00000020
Jan 26 14:34:30 galahad kernel: [145034.596558] show_signal_msg: 2 callbacks suppressed
Jan 26 14:34:30 galahad kernel: [145034.596561] GpuWatchdog[6539]: segfault at 0 ip 0000564b1813aded sp 00007fc2306e1480 error 6 in chrome[564b141ff000+7171000]
Jan 26 14:34:30 galahad kernel: [145034.596567] Code: 48 c1 c9 03 48 81 f9 af 00 00 00 0f 87 c9 00 00 00 48 8d 15 a9 5a 9c fb f6 04 11 20 0f 84 b8 00 00 00 be 01 00 00 00 ff 50 30 04 25 00 00 00 00 37 13 00 00 c6 05 c1 6d a4 03 01 80 7d 8f 00
Jan 26 14:34:30 galahad systemd[2002]: Starting Notification regarding a crash report…
Jan 26 14:34:30 galahad update-notifier-crash[837]: /usr/bin/whoopsie
Jan 26 14:34:30 galahad systemd[2002]: update-notifier-crash.service: Succeeded.
Jan 26 14:34:30 galahad systemd[2002]: Started Notification regarding a crash report.
Jan 26 14:34:35 galahad systemd[2002]: Starting Notification regarding a crash report…
Jan 26 14:34:35 galahad update-notifier-crash[844]: /usr/bin/whoopsie
Jan 26 14:34:35 galahad systemd[2002]: update-notifier-crash.service: Succeeded.
Jan 26 14:34:35 galahad systemd[2002]: Started Notification regarding a crash report.
Jan 26 14:34:36 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x0000fbb4, 0x0000fbbc)
Jan 26 14:34:39 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x0000fbb4, 0x0000fbc4)
Jan 26 14:34:46 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x0000fbb4, 0x0000fbc4)
Jan 26 14:34:49 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x0000fbb4, 0x0000fbcc)
Jan 26 14:34:56 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x0000fbb4, 0x0000fbcc)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) Backtrace:
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x55698d1ebacc]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7fe90b34559f]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 2: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x71f39) [0x7fe90a32a4d9]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x709d7) [0x7fe90a327237]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 4: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x73ca5) [0x7fe90a32e095]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 5: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x73e79) [0x7fe90a32e4a9]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 6: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x69e9f) [0x7fe90a31a4bf]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 7: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x6e504) [0x7fe90a322f04]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 8: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x6a887) [0x7fe90a31b897]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 9: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x70b2a) [0x7fe90a32738a]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 10: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x8eaa3) [0x7fe90a363a13]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 11: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x66479) [0x7fe90a312af9]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 12: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x45d55e) [0x7fe90ab0127c]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) 13: ? (?+0x0) [0x0]
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) Segmentation fault at address 0x80
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: Fatal server error:
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) Caught signal 11 (Segmentation fault). Server aborting
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: Please consult the The X.Org Foundation support
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: #011 at http://wiki.x.org
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: for help.
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE) Please also check the log file at “/var/log/Xorg.1.log” for additional information.
Jan 26 14:35:08 galahad /usr/lib/gdm3/gdm-x-session[2971]: (EE)
Jan 26 14:35:15 galahad kernel: [145079.453375] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Jan 26 14:35:23 galahad kernel: [145087.453459] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.

I can confirm the same problem. The system locks up with ultra-sluggish input and high Xorg CPU usage.

OS : Ubuntu 18.04
MB : Gigabyte X570 AORUS ULTRA
CPU : AMD Ryzen 9 3900X
GPU : Palit 2080 TI
driver : nvidia-driver-440

Jan 30 13:39:51 xxx kernel: [ 2740.694918] NVRM: GPU at PCI:0000:0a:00: GPU-5a995746-9836-2529-7692-2a9d80e4fb6c
Jan 30 13:39:51 xxx kernel: [ 2740.694921] NVRM: GPU Board Serial Number: 
Jan 30 13:39:51 xxx kernel: [ 2740.694925] NVRM: Xid (PCI:0000:0a:00): 61, pid=1900, 0cec(3098) 00000000 00000000

It seems to be getting more frequent; I just had a couple of freezes a few hours apart. It doesn’t happen in response to any particular stress or application. It has happened with just Firefox open, or with a single VirtualBox VM.

A new Nvidia driver, version 440.59, has been released. If/when someone installs it, it would be great to confirm whether we are still seeing these issues.

And another one…

Feb  4 14:23:33 shinpan kernel: [16214.818715] NVRM: Xid (PCI:0000:0a:00): 61, pid=1905, 0cec(3098) 00000000 00000000
Feb  4 14:23:33 shinpan kernel: [16214.818713] NVRM: GPU Board Serial Number: 
Feb  4 14:23:33 shinpan kernel: [16214.818711] NVRM: GPU at PCI:0000:0a:00: GPU-5a995746-9836-2529-7692-2a9d80e4fb6c

This time I ssh’d in and saw that, alongside Xorg’s high CPU usage, this process was also pegging the CPU:

irq/173-nvidia
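If you can still get in over ssh during a lock-up, a quick way to record which processes are pegging the CPU is to save the output of `ps -eo pid,pcpu,comm` and filter it. Below is a minimal sketch of the filtering part; the function name, the 50% threshold, and the sample PIDs and percentages are made up for illustration. Keep in mind that ps reports CPU usage averaged over the life of the process, so a tool like top may show a sudden spike more clearly.

```python
def busiest_processes(ps_output, threshold=50.0):
    """Parse `ps -eo pid,pcpu,comm` text; return (pid, cpu%, name) at or above threshold."""
    hot = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, pcpu, name = parts
        try:
            usage = float(pcpu)
        except ValueError:
            continue
        if usage >= threshold:
            hot.append((int(pid), usage, name.strip()))
    return sorted(hot, key=lambda entry: entry[1], reverse=True)


# Illustrative sample; the PIDs and percentages are invented.
sample = (
    "  PID %CPU COMMAND\n"
    " 1905 99.0 Xorg\n"
    "  812 97.5 irq/173-nvidia\n"
    " 2001  1.2 bash\n"
)

for pid, usage, name in busiest_processes(sample):
    print(pid, usage, name)
```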

@jm4games I’m not seeing 440.59 in the graphics-drivers PPA yet.

NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.48.02  Tue Jan 14 06:22:51 UTC 2020
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
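When reporting which driver a lock-up happened on, the version can be read from the "NVRM version" line shown above (the format of /proc/driver/nvidia/version). Here is a small sketch that extracts it and compares it against another release such as 440.59; the function names are hypothetical and the regex is based only on the line pasted here.

```python
import re


def driver_version(version_text):
    """Return the driver version string from 'NVRM version' text, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", version_text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn '440.48.02' into (440, 48, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


text = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.48.02  "
    "Tue Jan 14 06:22:51 UTC 2020\n"
    "GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)\n"
)

installed = driver_version(text)
if installed is not None:
    older = version_tuple(installed) < version_tuple("440.59")
    print(installed, "is older than 440.59" if older else "is 440.59 or newer")
```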