Random Xid 61 and Xorg lock-up

Two days ago I installed driver 440.82. The system has been much more unstable since then; I can’t go a couple of hours without a reboot.

Happens 1-4 times in a 15h period (to me).
2/3 Displays (1 never tested)
Doesn’t matter which Linux OS or which applications are started.
Totally Random (sometimes after 30 min, sometimes after 14h).

Has to be an X570 chipset, Ryzen CPU and RTX 20xx SUPER graphics card, no matter which BIOS version or graphics card driver…

The desktop freezes and nothing is possible but a hard reset. The Num Lock LED takes 30 seconds to react, the mouse pointer still moves but clicks do nothing, and the keyboard (e.g. shortcuts) doesn’t work. Background services keep working: if I wait 2h before the hard reset, there are system logs for that time period and ssh works. It only affects the desktop environment.
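Since ssh keeps working during a freeze, the kernel log can be checked for the Xid event from another machine. This is only a minimal sketch: it assumes the driver writes lines containing `NVRM: Xid` followed by the error number, and the helper name `count_xid61` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines that report Xid 61.
# Reads log text on stdin, so it works with journalctl, dmesg or a saved file.
# Assumption: Xid lines look like "NVRM: Xid (...): 61, ...".
count_xid61() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the helper from failing in that case.
    grep -c 'NVRM: Xid.*: 61,' || true
}

# Usage over ssh while the desktop is frozen
# (needs permission to read the kernel log):
#   journalctl -k -b | count_xid61
#   dmesg | count_xid61
```

A count above 0 for the current boot would suggest the freeze is the Xid 61 case discussed here rather than something else with similar symptoms.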

Occasionally (~1 day in 10) the issue doesn’t occur, with the same usage.

To reproduce, just take my setup, install Ubuntu/GNOME and let it sit for 24 hours; it will occur.
If not, I will provide the list of installed packages.


Ryzen 9 3900x
GeForce RTX 2070 SUPER
ROG STRIX X570-E GAMING
64 GB Kingston RAM

Ubuntu 20.04
Gnome 3.36.2

BIOS Information:
Vendor: American Megatrends Inc.
Version: 1201
BIOS Revision: 5.14
Release Date: 2019/10/07

I’m also suffering from this with a GTX 1660 Super, for the record (also AMD Ryzen on an X570 chipset).
I can’t confirm it’s more unstable with 440.82; no more, no less. As it is pretty random, it is difficult to say…

Just swapped in my old GTX 1060 to confirm it’s still stable with the GTX series.

I experienced both “nvlddmkm event 14” on Windows and “Xid 61” on Linux running OpenGL applications in a dual-boot setup on my new machine. This has rendered my PC unusable for a couple of months now.

After reading your hint, I went on and disabled SMT in my BIOS. I consistently haven’t experienced any issues since :)

While this is certainly not a permanent solution it still helps a lot and might provide some insight as to what causes the problem. Obviously Nvidia’s Drivers don’t play well with AMD’s Simultaneous Multithreading.

Same issue here with a 3700X / RTX 2060 SUPER / X570 MB setup with numerous freezes.

Today’s freeze was a little different because the desktop was still sort of functional but very slow. I noticed Chrome at 100% CPU. After killing the process, it was X that went to 100%.

Maybe the capture of nvidia-smi is interesting:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 206...  Off  | 00000000:0A:00.0  On |                  N/A |
|ERR!   49C    P5   ERR! / 175W |    681MiB /  7979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1350      G   /usr/lib/xorg/Xorg                           540MiB |
|    0      2407      G   /usr/bin/nvidia-settings                      23MiB |
|    0      2750      G   cinnamon                                     113MiB |
+-----------------------------------------------------------------------------+

I’ve set SMT to disabled now as some people seem to be having luck with that.

Could you try to set MaxPerformanceMode or set your GPU clock to higher frequencies?
For example to minimum 1000MHz, max 2000MHz

nvidia-smi -lgc 1000,2000

Does the freeze still occur?
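To see whether the lock actually took effect, and what the clock was just before a freeze, the graphics clock can be sampled into a file. A rough sketch only: `sample_gpu_clock` and the `NVIDIA_SMI` override variable are names invented here, and the log path in the usage comment is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print a timestamp plus the current graphics clock and
# performance state, so a record survives up to the moment of a freeze.
# NVIDIA_SMI is an invented override for the binary name (defaults to nvidia-smi).
sample_gpu_clock() {
    printf '%s %s\n' "$(date '+%F %T')" \
        "$("${NVIDIA_SMI:-nvidia-smi}" --query-gpu=clocks.gr,pstate --format=csv,noheader)"
}

# Usage: sample every 10 seconds into a file that can be read after the reboot.
#   while sample_gpu_clock >> "$HOME/gpu-clock.log"; do sleep 10; done
```

If the last samples before a freeze show the clock sitting at its idle value, that would fit the idea that low idle frequencies are involved.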

I have been running into this issue, but I’ve had success so far by adding this to my Xfce startup:

nvidia-settings -a "[gpu:0]/GpuPowerMizerMode=1"

I’m now testing the lock-GPU-clocks option while letting PowerMizer go back to its default:

nvidia-smi -lgc 1000,2145

My system is a Ryzen 3900X on an ASRock X570 Taichi (latest BIOS) with an EVGA GeForce RTX 2060 SUPER running Ubuntu 20.04 and driver 440.64.

@Uli1234 This thread is about an incompatibility between AMD Ryzen 3xxx (Zen 2) CPUs and Turing-generation GPUs. In this case you can even leave the system as is and just swap the CPU for a Ryzen 2xxx (Zen+), which makes the issue disappear.
Since you’re running an Intel platform, I suspect you’re running into a different issue with the same symptoms.
Please open a new thread, run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).

My system is still running without issue after several days using

sudo nvidia-smi -lgc 1000,2145

@generix Maybe the issue is not CPU related but has something to do with a power management chip for PCIe on the mainboards. Since OldToby confirmed that locking the GPU frequencies to a higher idle level works, the issue might be the same.

@OldToby Thank you for the feedback! Still no freeze so far? Can anybody else confirm that this works?

This is a serious bug affecting my workflow. I get a total system freeze daily, with Xorg running at 100% on a single CPU thread together with Xid 61 (ssh still works, and other background processes run fine).
My system details:
System: Host: Kernel: 5.6.12-1-MANJARO x86_64 bits: 64 Desktop: i3 4.18.1 Distro: Manjaro Linux
Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-F GAMING v: Rev X.0x serial:
UEFI: American Megatrends v: 1407 date: 04/02/2020
CPU: Topology: 16-Core (2-Die) model: AMD Ryzen 9 3950X bits: 64 type: MT MCP MCM L2 cache: 8192 KiB
Speed: 3014 MHz min/max: 2200/3500 MHz Core speeds (MHz): 1: 3014 2: 2153 3: 2081 4: 3920 5: 2145 6: 2100
7: 3400 8: 2146 9: 2432 10: 2118 11: 2268 12: 1860 13: 2054 14: 2706 15: 2060 16: 2059 17: 2160 18: 2097
19: 2154 20: 2099 21: 3610 22: 2090 23: 2147 24: 1954 25: 2827 26: 2170 27: 2636 28: 1902 29: 1882 30: 1882
31: 2098 32: 2094
Graphics: Device-1: NVIDIA TU104 [GeForce RTX 2080 SUPER] driver: nvidia v: 440.82
Device-2: NVIDIA TU104 [GeForce RTX 2080 SUPER] driver: nvidia v: 440.82
Display: x11 server: X.Org 1.20.8 driver: nvidia resolution: 2560x1440~60Hz

@OldToby Going to try your workaround. Will let you know within 2 days.
Update, day 1: no issues.

It seems we’re seeing this combination of X570, Ryzen 3xxx and RTX 2xxx quite a bit. It’s pretty random, sometimes 1-2 weeks can pass without issues and other times it’s every couple days.

One thing to note: after the latest chromium update, I don’t get a full freeze anymore, just a very slow system. Are you using chromium too or another browser?

I tried disabling SMT like some suggested but with no effect. OldToby’s workaround is our hope right now :-)

I have an uptime of 5 days now with this workaround. Fingers crossed…

So I downgraded to my GTX 1060 card and so far I’ve confirmed the GTX series is more stable (9 days so far). What is interesting though is that applications are now a bit more unstable. Chrome will freeze up every couple of days. I have also had tmux freeze up on me once. I’m thinking the GTX cards have the same issue, but the driver is able to handle the failure much more gracefully than with the RTX series (i.e. no Xorg freeze).

@amrits Have you tried your supers yet?

Can someone confirm: does the setting from Toby’s command persist after a reboot?

Hi jm4games,

I gained access to the MSI X570 system and am currently running the following setup.

Ubuntu 20.04
RTX 2070 Super
NV Driver 440.59
Ryzen 3700x

I have been running a few OpenGL demos simultaneously for a week but have had no luck in recreating the issue.
I will now try running compton as per your suggestion and update my test results.

The nvidia-smi settings are lost after a reboot if you just typed the commands into the console.
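One way to re-apply the lock at every boot might be a oneshot systemd unit. This is a sketch under assumptions, not a tested recipe: it assumes a systemd-based distro with nvidia-smi at /usr/bin/nvidia-smi, and the helper name `write_lgc_unit` and the unit file name are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: write a oneshot systemd unit that re-applies the
# GPU clock lock at boot.
# $1 = output path, $2 = minimum clock in MHz, $3 = maximum clock in MHz.
# Assumption: nvidia-smi is installed at /usr/bin/nvidia-smi.
write_lgc_unit() {
    cat > "$1" <<EOF
[Unit]
Description=Lock NVIDIA GPU clocks (Xid 61 workaround)

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -lgc $2,$3

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root); the unit name is an arbitrary choice:
#   write_lgc_unit /etc/systemd/system/nvidia-lock-clocks.service 1000 2145
#   systemctl daemon-reload
#   systemctl enable --now nvidia-lock-clocks.service
```

To go back to the default behaviour, disabling the unit and running `sudo nvidia-smi -rgc` should reset the clocks.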

@amrits:
You can try forcing the GPU to its idle frequency. On my system the issue then appeared within minutes.

sudo nvidia-smi -lgc 300,300

After a reboot the setting is lost.

@elialbert: In my experience, turning persistence mode on or off didn’t matter. I tried both variants. Persistence mode is useful if you have more than one GPU in your system.

I have observed the problem when running Electron apps and Firefox. I use Riot, MS Teams, VS Code, and Unity Hub. Maybe you can try running those and see if the issue pops up. I am also using a compton-based compositor (picom).