Random Xid 61 and Xorg lock-up

Hi guys,

Regarding the persistence thing, yep, I always do nvidia-smi -pm ENABLED.

I did try lower frequencies at first but found that there was a bit of stuttering in videos and lagginess in the desktop shell. I haven’t been monitoring the overall power consumption of the machine, but the GPU only went up by 5-10°C, so I figured that even though the clock was highish the card still wasn’t drawing too much current. Also, it’s getting cold now where I live and waste heat from my PC is heating the room, so there is no overall energy wastage as my main heat source is a giant resistor in the form of an electric storage heater…

All that being said, I bit the bullet and upgraded to Ubuntu 20.10 today and am now running the nvidia 455.28 driver that comes packaged with Ubuntu 20.10. I’m leaving the GPU clocking in the adaptive state so it’s free to wander up and down the full frequency range. It’ll be interesting to see how stable things are over the next few days - I’ll report back on my findings.

So much for me talking about not experimenting on my main work machine :-)

Juan

@Uli1234 I should have mentioned that I do indeed enable Persistence Mode (per your suggestion in prior posts). Thanks so much for all of your help on this issue!

I’d suggest to do it the other way around: start with 1300 and work your way DOWN. Or, rather start with something like 800 to begin with.

@t.platzer I should have clarified that I would work my way up if Xid 61 errors continue to occur. On the other hand, if stability is immediately demonstrated (at 1300), I’ll start dropping down until I see Xid 61 again. Hopefully I can get as low as you have!

As a side note, my gpu fans rarely ever spin up (and usually only for a second or two), so I’m not terribly concerned about heat. I assume (perhaps naively) that this is due to the gpu being in an external enclosure.

Well now. That didn’t take long to happen with 455.28.

Oct 19 18:54:04 yaffle kernel: [26627.344744] NVRM: Xid (PCI:0000:05:00): 8, pid=3047, Channel 00000018

…followed by the YouTube video stuttering in a loop, then the mouse stopped working, and then the PC locked up.

Back to keeping the GPU at a fixed frequency for the time being!

For me the Xid 61 is gone with driver v450.80.02 on Ubuntu 20.10. I’m now getting Xid 8 more frequently (every 2-3 days), though. Enabling persistence mode and locking the frequencies still prevents it.


Now that the 30xx generation is out, I wonder if it has the same issues. Should I buy a 30xx? If the 30xx does not have these issues, is there a possibility to exchange the 20xx for a 30xx? And if the 30xx does not have the issue, is it partly a hardware problem?

I recently got an RTX 2080 and started experiencing similar apparent crashes on my Ryzen 3700x machine – rather getting worse than better with the latest drivers (450.80.02) provided by Linux Mint. The clock frequency trick didn’t help, and the desktop would get stuck even after the display went to sleep or I turned the screen off and back on. In my case it turns out that the fullscreen compositor is to blame. Refreshing it with a hotkey or from another machine brings the UI back to life:

nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceCompositionPipeline=On }"

Just noticed there’s driver version 455.23.04 available for Mint. Updating to that seems to fix the compositor issue, at least based on very quick testing.

edit: another jam today, solved by refreshing the compositor.

Just updated to Ubuntu 20.10. Still on nvidia 450.80.02, and still using: sudo nvidia-smi -pm ENABLED; sudo nvidia-smi -lgc 800,2000
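
For anyone who wants those two commands in a reusable form, here is a minimal sketch of a wrapper. The function name lock_gpu_clocks and its argument checks are made up for illustration and are not part of nvidia-smi. The values 800 and 2000 are simply the ones from the post above, so substitute clocks that your own card supports. It has to run as root.

```shell
#!/bin/sh
# Sketch: enable persistence mode, then lock the GPU clock range.
# lock_gpu_clocks is a hypothetical helper name, not an nvidia-smi feature.
lock_gpu_clocks() {
    min_mhz=$1
    max_mhz=$2

    # Both arguments must be non-empty and purely numeric.
    for value in "$min_mhz" "$max_mhz"; do
        case $value in
            ''|*[!0-9]*)
                echo "usage: lock_gpu_clocks MIN_MHZ MAX_MHZ" >&2
                return 1
                ;;
        esac
    done

    # Refuse an inverted range before touching the GPU.
    if [ "$min_mhz" -gt "$max_mhz" ]; then
        echo "lock_gpu_clocks: MIN_MHZ must not exceed MAX_MHZ" >&2
        return 1
    fi

    # Persistence mode first, then the clock lock.
    nvidia-smi -pm 1 &&
        nvidia-smi -lgc "$min_mhz,$max_mhz"
}
```

Calling lock_gpu_clocks 800 2000 from a root shell should have the same effect as the one-liner. To undo the lock, nvidia-smi -rgc resets the GPU clocks.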

Hello,

My config:

MOBO: Asus Z170 Pro Gaming
CPU: Intel Core i7 6700K
GPU: GeForce GTX 1650 Super (with driver 455.38)
OS: Ubuntu 20.04.1 LTS (with kernel 5.4.0-54)
DE: Gnome 3.36.3

I recently upgraded my GC from a GTX 750 Ti to a GTX 1650 Super and since then, I’ve been struggling with the famous “XID 61” error too…

It happens randomly, sometimes once a day, sometimes much more. It seems to happen when using Chrome (open a tab, typing in the search bar, etc.) or a Chromium related application (VS Code, GitKraken, Slack, etc.).

For example, here is the log from one of my recent crashes (the error was caught by code, which is the binary executable for the “VS Code” application):

Nov 19 00:08:53 localhost kernel: [ 3416.843254] NVRM: GPU at PCI:0000:01:00: GPU-89c59685-2db5-e28c-d31f-2a87067036d2
Nov 19 00:08:53 localhost kernel: [ 3416.843256] NVRM: GPU Board Serial Number: 
Nov 19 00:08:53 localhost kernel: [ 3416.843260] NVRM: Xid (PCI:0000:01:00): 61, pid=1789, 0cec(3098) 00000000 00000000
Nov 19 00:09:24 localhost kernel: [ 3448.376808] show_signal_msg: 1578 callbacks suppressed
Nov 19 00:09:24 localhost kernel: [ 3448.376810] GpuWatchdog[33454]: segfault at 0 ip 000055741426d439 sp 00007f7064723680 error 6 in code[557410c3d000+57ee000]
Nov 19 00:09:24 localhost kernel: [ 3448.376814] Code: 00 79 09 48 8b 7d c0 e8 45 3d c0 fe c7 45 c0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 c0 48 8d 7d c0 e8 97 50 9d fc <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
Nov 19 00:09:40 localhost update-notifier-crash[36628]: code
Nov 19 00:09:41 localhost /usr/lib/gdm3/gdm-x-session[2505]: (WW) NVIDIA: Wait for channel idle timed out.
Nov 19 00:09:44 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000089bc, 0x000089c4)
Nov 19 00:09:51 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x000089bc, 0x000089c4)
Nov 19 00:09:56 localhost /usr/lib/gdm3/gdm-x-session[2505]: (II) event4  - Logitech M545/M546: SYN_DROPPED event - some input events have been lost.
Nov 19 00:10:07 localhost kernel: [ 3490.788236] GpuWatchdog[36631]: segfault at 0 ip 000055741426d439 sp 00007f7064723680 error 6 in code[557410c3d000+57ee000]
Nov 19 00:10:07 localhost kernel: [ 3490.788240] Code: 00 79 09 48 8b 7d c0 e8 45 3d c0 fe c7 45 c0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 c0 48 8d 7d c0 e8 97 50 9d fc <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE)
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) Backtrace:
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x13c) [0x56050d64811c]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 1: /lib/x86_64-linux-gnu/libpthread.so.0 (funlockfile+0x60) [0x7ff72935041f]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 2: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x721a3) [0x7ff72832ac73]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 3: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x70c17) [0x7ff728327997]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 4: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x73f05) [0x7ff72832e835]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 5: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x740d9) [0x7ff72832ec49]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 6: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x6a0df) [0x7ff72831ac1f]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 7: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x6e744) [0x7ff728323664]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 8: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x6aac7) [0x7ff72831bff7]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 9: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x70d6a) [0x7ff728327aea]
Nov 19 00:10:08 localhost /usr/lib/gdm3/gdm-x-session[2505]: (EE) 10: /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so (nvidiaAddDrawableHandler+0x85407) [0x7ff728350e27]
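
To tally which Xid codes show up in a log like the one above, a small filter along these lines may help. This is only a sketch: the function name xid_codes is hypothetical, and the pattern is based solely on the NVRM lines quoted in this thread, so other driver versions may format them differently.

```shell
#!/bin/sh
# Sketch: print the Xid number from every "NVRM: Xid (...)" line on stdin.
# xid_codes is a hypothetical helper name; lines without an Xid are ignored.
xid_codes() {
    sed -n 's/.*NVRM: Xid (PCI:[^)]*): \([0-9][0-9]*\),.*/\1/p'
}
```

One possible use, assuming the kernel messages are available through journalctl:

    journalctl -k | xid_codes | sort -n | uniq -c

This prints each Xid code together with the number of times it occurred.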

So I landed on this helpful thread and, until a fix is released, I applied the suggested workaround: keep the graphics card above the P5 state, using nvidia-smi -lgc 810,1815 (the clocks depend on the specs of my card).

Instead of manually running this command after every boot, I made it run automatically via a Systemd unit. For those interested in this solution, here are the steps:

  1. Create a shell script somewhere under your PATH (e.g. /usr/local/bin/nvidia-lgc.sh), with executable permissions:

     #!/bin/sh

     nvidia-smi -pm 1
     nvidia-smi -lgc 810,1815

  2. Create a Systemd unit (e.g. /etc/systemd/system/nvidia-lgc.service):

     [Unit]
     After=nvidia-persistenced.service

     [Service]
     ExecStart=/usr/local/bin/nvidia-lgc.sh

     [Install]
     WantedBy=default.target

  3. Enable the service:

     sudo systemctl enable nvidia-lgc

  4. Restart your computer, and voilà! The workaround is now automatically applied.

Note: I configured the Systemd unit to run after the nvidia-persistenced service because, as I understand from this article https://docs.nvidia.com/deploy/driver-persistence/index.html#usage, “nvidia-smi -pm” now relies on this daemon.
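
To confirm after a reboot that the lock was actually applied, something like the sketch below could be used. The helper names clock_in_range and check_lock are hypothetical, and the sketch assumes the driver supports the clocks.gr query field and that only the first GPU matters. The reported clock may snap to the nearest supported value, so treat the result as a rough sanity check.

```shell
#!/bin/sh
# Sketch: check that the graphics clock sits inside the locked range.
# clock_in_range and check_lock are hypothetical helper names.

# Succeeds if CUR ($1) lies within [MIN ($2), MAX ($3)], all in MHz.
clock_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Reads the current graphics clock of the first GPU and compares it
# with the MIN ($1) and MAX ($2) given as arguments.
check_lock() {
    cur=$(nvidia-smi --query-gpu=clocks.gr --format=csv,noheader,nounits |
        head -n 1 | tr -d ' ')
    if clock_in_range "$cur" "$1" "$2"; then
        echo "OK: ${cur} MHz is within $1-$2 MHz"
    else
        echo "WARNING: ${cur} MHz is outside $1-$2 MHz"
        return 1
    fi
}
```

With the values from the script above, check_lock 810 1815 should report OK once the service has run.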

Since I applied this workaround, Xid 61 is gone. No more random crashes, no more trouble in my work!

Sébastien