Regular Xorg freezes on KDE Manjaro

lencho · April 18, 2020, 9:46am

Hi,
I’m a Manjaro KDE user on a new PC with a GTX1660 Super, running latest Nvidia driver.
It works fine, but occasionally, and very unexpectedly, I get a freeze of the graphical interface. I cannot figure out the cause: it happens in very different situations, with the PC idle or while I’m working on something.
This is not a freeze of the system, just of the display. I can still hear the sound of the film being played for instance, but the image does not refresh. Once I was doing a video transcoding and even though the display was frozen, I left it running and it finished successfully (I had to reboot to check it).
When it happens, at first I can still get some reaction: I can move the mouse a bit or try to close a window, and the display refreshes maybe once every 5, 10 or 20 seconds. But then it gets really stuck.
This gave me time to check the system monitor, and every time it’s the same: two processes, Xorg and irq/106-nvidia, are each running at full throttle on their own core.
For the rest, performance report is fine, all other cores are free, there is plenty of free memory.
Only a reboot solves it, until it happens again.

Any help in investigating this would be appreciated!
For the record, a strange thing I can see in the log is the following type of error:

[   280.154] (EE) client bug: timer event2 debounce: scheduled expiry is in the past (-2ms), your system is too slow

Not sure this has to do with what I’m seeing.
Thank you,

generix · April 18, 2020, 2:59pm

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).

lencho · April 18, 2020, 3:09pm

Right, I had tried actually, but it wasn’t accepted. Here it is, I removed the .gz extension
nvidia-bug-report.log (217.2 KB)

generix · April 18, 2020, 8:24pm

Unfortunately, no errors logged. Did you run it right after the crash? If not, please wait until it crashes again and run it right after.

lencho · April 20, 2020, 6:44pm

Here is a log taken not after a crash but after other issues that may be related.
X11 often fails to start after the login screen. It seems like it’s loading, but really slowly, and then I end up on a black screen with just the mouse pointer. I need to reboot, because just going back to the login screen and logging in again gives the same result.
Also, after this happens, my desktop icons are all over the place, some even outside the monitor area. So I suspect there is a failure in recognizing the resolution or the display in general (but then why would I get the mouse pointer?).
nvidia-bug-report2.log (572.2 KB)

generix · April 20, 2020, 9:57pm

Nothing noteworthy in the logs. Did you already check if just the display connection is flawed, by using a different cable/connector/monitor? Do you use any kind of adapter/converter on it?

gsakhel · April 21, 2020, 7:11pm

Hi, I think I am having a similar issue using Ubuntu 19.10 with Kernel 5.3.0-46-generic on 1080 ti with nvidia-driver-435. Randomly over the last couple weeks, my desktop display freezes and I’m unable to use the mouse or keyboard. SSH works fine and I am able to remotely run the NVIDIA reporting tool as well as poweroff the host. It sometimes happens just after I enter my drive encryption password at boot, sometimes during a video-game, but seems to happen more often when using a browser. In all cases my syslog shows the message:

Apr 21 13:00:43 hostpc kernel: [ 3584.430531] NVRM: GPU at PCI:0000:01:00: GPU-bd7638f6-40d1-2ddd-0a8f-5ffbddd256b6
Apr 21 13:00:43 hostpc kernel: [ 3584.430561] NVRM: GPU Board Serial Number:
Apr 21 13:00:43 hostpc kernel: [ 3584.430566] NVRM: Xid (PCI:0000:01:00): 79, pid=1499, GPU has fallen off the bus.
Apr 21 13:00:43 hostpc kernel: [ 3584.430568] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.

Perhaps a seat-belt might help? :D

Attached are some logs. Two are from days ago and the other 2 are from the crashes that happened when trying to make this post today.

I’ll upgrade the kernel tomorrow, unless there is something else I can run to help diagnose the root cause.

This case seems to have the same problem: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus - #3 by gsakhel
In this case, upgrading to kernel 5.6 seemed to solve the problem: Display freeze at monitor turn on for a few seconds with NVIDIA 440.59 - #7 by mozo

nvidia-bug-report_days_ago.log (2.2 MB) nvidia-bug-report_days_ago_1.log (2.0 MB) nvidia-bug-report_today_0.log (2.0 MB) nvidia-bug-report_today_1.log (2.1 MB)

zeroepoch · April 21, 2020, 7:38pm

I believe I’m seeing a problem very similar to the others’, with Fedora 31, NVIDIA driver 440.82, and Xorg 1.20.6. My system has a Titan RTX. The display just randomly freezes about once a week and I have to SSH in from another system to reboot. The audio from WebEx continues to work and I can move the mouse, but nothing else responds. Here are what appear to be the relevant lines from the kernel log (from journalctl).

NVRM: GPU at PCI:0000:08:00: GPU-5c7bd6dd-22ca-43c3-871a-ec88ae1cf126
NVRM: GPU Board Serial Number: 0324918077010
NVRM: Xid (PCI:0000:08:00): 61, pid=1586, 0cec(3098) 00000000 00000000
NVRM: Xid (PCI:0000:08:00): 8, pid=1586, Channel 00000018
/usr/libexec/gdm-x-session[1584]: (WW) NVIDIA: Wait for channel idle timed out.
/usr/libexec/gdm-x-session[1584]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00008670, 0x00008678)
GpuWatchdog[2927]: segfault at 0 ip 000055a0e38235b0 sp 00007f0f321a54e0 error 6 in chrome[55a0df4b8000+7347000]
Code: 3d 30 76 fb fa be 01 00 00 00 ba 07 00 00 00 e8 16 06 72 fe 48 8d 3d 18 b4 fc fa be 01 00 00 00 ba 03 00 00 00 e8 00 06 72 fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 56 9e 96 03 01 80 7d 87 00
/usr/libexec/gdm-x-session[1584]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x00008670, 0x00008678)
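
For anyone comparing logs across the reports in this thread, here is a minimal Python sketch that pulls the Xid events out of kernel log text. The regular expression is modelled only on the "NVRM: Xid" lines quoted above, the function name parse_xid_events is hypothetical, and treating the pid field as optional is an assumption, so other driver versions may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the "NVRM: Xid" lines quoted in this thread.
# Treating the pid field as optional is an assumption.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), (?:pid=(?P<pid>\d+), )?(?P<detail>.*)"
)


def parse_xid_events(lines):
    """Return one dict per Xid line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            pid = match.group("pid")
            events.append({
                "pci": match.group("pci"),
                "xid": int(match.group("xid")),
                "pid": int(pid) if pid else None,
                "detail": match.group("detail").strip(),
            })
    return events


# Lines copied from the excerpts posted in this thread.
SAMPLE = [
    "NVRM: GPU Board Serial Number: 0324918077010",
    "NVRM: Xid (PCI:0000:08:00): 61, pid=1586, 0cec(3098) 00000000 00000000",
    "NVRM: Xid (PCI:0000:08:00): 8, pid=1586, Channel 00000018",
    "Apr 21 13:00:43 hostpc kernel: [ 3584.430566] NVRM: Xid "
    "(PCI:0000:01:00): 79, pid=1499, GPU has fallen off the bus.",
]

events = parse_xid_events(SAMPLE)
# How often each Xid number occurs, useful when comparing several logs.
xid_counts = Counter(event["xid"] for event in events)
for event in events:
    print(event["pci"], event["xid"], event["pid"], event["detail"])
```

The same function can be fed the lines of a saved journalctl or dmesg dump instead of SAMPLE, which makes it easier to see whether two reports share the same Xid numbers.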

lencho · April 21, 2020, 8:12pm

To answer the question: nothing special in terms of connection, just a regular DP-DP cable. I could try another one, but it works fine all the time once the display has started correctly.

wchilders · April 23, 2020, 5:44pm

Hi, I think I too am suffering from this issue.

The symptoms are exactly as OP describes. Things start moving slower and slower, until they’re so slow that the system is effectively graphically locked up. Videos and games both seem to be related. I haven’t nailed down a particular way to make the crash occur. In this case, I had middle-clicked in firefox to activate drag scrolling, and I was on twitter. Perhaps an auto-playing video got pulled in by the infinite scroll, or something else happened with firefox’s webrenderer?

The bug largely feels random, sometimes I’ll go a week without seeing it, sometimes I’ll see it multiple times in the same hour – in the latter case, it’s almost always (maybe always) preceded by a video or a game, though not any particular video or game consistently.

The most recent time, I tried and was able to get to a tty (the change between X and the console tty is very slow, but once on the tty things are full speed, including audio). Unfortunately, I wasn’t aware of the nvidia bug report script, so I’ll try to get that next time. I did, however, dump dmesg from the tty. This is the relevant portion:

[Apr23 12:44] NVRM: GPU at PCI:0000:0a:00: GPU-2dd471df-2353-145b-1ac7-ddae77f72306
[  +0.000004] NVRM: GPU Board Serial Number: 
[  +0.000004] NVRM: Xid (PCI:0000:0a:00): 61, pid=1221, 0cec(3098) 00000000 00000000
[Apr23 12:45] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ +11.998307] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Apr23 12:46] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  +8.499499] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  +5.011144] fbcon: Taking over console
[  +0.000086] Console: switching to colour frame buffer device 128x48
[Apr23 12:49] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Apr23 12:50] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ +12.038590] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  +8.498130] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ +36.099874] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Apr23 12:51] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[ +36.088768] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[  +8.499709] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[Apr23 12:55] spotify[13605]: segfault at 4 ip 00007f0e6e66ef07 sp 00007fffe94dd328 error 6 in libnvidia-glcore.so.440.82[7f0e6d601000+1814000]
[  +0.000007] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00
[  +8.069868] spotify[15835]: segfault at 4 ip 00007fad2d0d4f07 sp 00007fff90e12e48 error 6 in libnvidia-glcore.so.440.82[7fad2c067000+1814000]
[  +0.000007] Code: 04 01 00 00 44 89 ab 08 01 00 00 44 89 b3 0c 01 00 00 e9 5b ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 8b 44 24 08 83 c2 1a <c7> 46 04 e4 08 04 20 c1 e2 12 89 4e 08 44 89 46 0c 81 ca 00 0e 00

I’m going to break this down a bit. Based on my firefox history, my last google search (I think right before I went to twitter) was at 12:47. So perhaps whatever this NVRM message at 12:44 is built up to the major issue over the course of those 3 minutes, or maybe it’s unrelated.

WRT the tty switch, this is what a normal tty switch looks like for me in dmesg:

[Apr23 13:16] fbcon: Taking over console
[  +0.000134] Console: switching to colour frame buffer device 128x48

This “Lost display notification” stuff seems to be abnormal, and thus related – I believe both to the crash and the tty switch.
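
To make the comparison between the abnormal and the normal excerpt less manual, here is a rough Python sketch that counts the "Lost display notification" warnings and groups them under the most recent absolute timestamp. It assumes the bracketed timestamp layout seen in the dmesg excerpt above (an absolute stamp like "[Apr23 12:45]" followed by relative "[ +N]" lines), so the grouping is only approximate, and the function name is hypothetical.

```python
import re
from collections import Counter

# Absolute stamp such as "[Apr23 12:45]", as printed in the excerpt above.
ABS_STAMP = re.compile(r"^\[(?P<stamp>[A-Z][a-z]{2}\d{1,2} \d{2}:\d{2})\]")
LOST = "Lost display notification"


def lost_notifications_per_stamp(lines):
    """Count warning lines, keyed by the last absolute stamp seen before them."""
    counts = Counter()
    current = None  # stays None until the first absolute stamp appears
    for line in lines:
        match = ABS_STAMP.match(line)
        if match:
            current = match.group("stamp")
        if LOST in line:
            counts[current] += 1
    return counts


# A few lines copied from the dmesg excerpt above.
SAMPLE = [
    "[Apr23 12:45] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
    "[ +11.998307] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
    "[Apr23 12:46] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
    "[  +8.499499] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.",
    "[  +5.011144] fbcon: Taking over console",
    "[  +0.000086] Console: switching to colour frame buffer device 128x48",
]

per_stamp = lost_notifications_per_stamp(SAMPLE)
total_lost = sum(per_stamp.values())
print(dict(per_stamp), total_lost)
```

A normal tty switch like the one quoted above would give an empty result, so any non-zero count around an "fbcon: Taking over console" line is a cheap signal that the switch happened while the driver was already in trouble.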

I also noticed an interesting phenomenon. If I pulled up top, typically a single program would have abnormally high CPU usage compared to its typical behavior. If I left top, killed that program, and came back, another GPU program would take its place. If I had to guess, programs are getting stuck in a loop trying to render, either outright failing (though not crashing) or just moving extremely slowly.

Spotify is one of the programs that went to the top, and, interestingly, in the dmesg log here you can see it died of a segfault in libnvidia-glcore.so.440.82. Looking at my fish shell history, I pulled the exact timestamp at which I killed spotify:

# Thu 23 Apr 2020 12:55:44 PM EDT
kill -9 13605

So, killing the spotify process resulted in this segfault. This again is abnormal, especially a kill resulting in a segfault inside an nvidia library. When the issue is not occurring, killing any one of spotify’s processes does not result in this error.

Hardware wise, this is a Ryzen 3950X system with a 2080 RTX card, running on KDE Neon (Ubuntu 18.04 LTS provides the base packages, Neon only provides Qt and KDE packages, so at a core system level, 18.04 LTS) using kernel 5.3.0-46-generic with nvidia 440.82.

Hopefully this is at least somewhat useful.

wchilders · April 29, 2020, 4:58pm

I got a nvidia-bug-report.log.gz this time, and I’ve emailed it to the provided email address. Interestingly, this time I was unable to gracefully shut down my system; otherwise the results were very similar, including the Lost display notification.

lencho · April 30, 2020, 9:30pm

So I was about to write I had not had the problem in a week and thought maybe a kernel or driver update fixed it, but then it just happened again!
The freeze left me unable to do anything and I had to hard reboot.
Here is the log made right after the reboot, hope there is something to be seen in there…
Thanks,

nvidia-bug-report3.log (543.0 KB)

amrits · May 5, 2020, 6:51pm

I have filed bug 200614112 internally for tracking purposes.
I will attempt to reproduce the issue and may reach out to you again if more information is required.

lencho · May 6, 2020, 4:07pm

Thank you,
Was there anything worth noting in the log then? Maybe something I can try in the meantime?

amrits · May 8, 2020, 5:14am

The logs don’t contain enough detailed information to root-cause the issue.
It would be great to have concrete and reliable repro steps so that I can try the same.

lencho · May 9, 2020, 12:05pm

Hi, unfortunately, I cannot reliably reproduce it myself. I worked on the computer all day for a week and had no problems, but then sometimes it happens twice in a few hours.
The only regularity I can mention (although it’s not always the case) is that it often happens when playing a video. It can be in a web browser or a video player like VLC, it does not matter. Maybe it was in fullscreen most of the time, I’m not sure.
But I often watch videos (fullscreen or not) and do NOT get the freeze, so I’m not sure why it would sometimes cause it.

lencho · May 9, 2020, 8:17pm

Adding to the previous comment: I just had a freeze while not watching a video and not doing much.
Log attached, this time the log was taken just as the freeze announced itself: ultra laggy system, Xorg and irq-nvidia processes running on a full core each, as usual. System doing OK for the rest.
nvidia-bug-report4.log (1.0 MB)
Not sure if something about what is going on at that time can be seen in there.
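
To put a number on the "full core each" observation without relying on the system monitor, here is a small Python sketch that samples a process twice through /proc/<pid>/stat and reports how much of one core it used in between. It assumes a Linux /proc filesystem; the function names are hypothetical, and the PIDs of Xorg and the irq/...-nvidia kernel thread have to be looked up separately (for example with ps or top).

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces or
    # slashes (e.g. "irq/106-nvidia"), so split after the last ")".
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Share of one core used between two samples, in percent."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_s / interval_s


def sample_cpu(pid, interval_s=1.0):
    """Sample a PID twice and return its CPU usage over the interval (Linux only)."""
    path = f"/proc/{pid}/stat"
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(path) as handle:
        before = parse_cpu_ticks(handle.read())
    time.sleep(interval_s)
    with open(path) as handle:
        after = parse_cpu_ticks(handle.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling sample_cpu() with the PID of Xorg and then with the PID of the nvidia IRQ thread while the lag is building up should return values close to 100 if each is really saturating a core. Logging those values with a timestamp next to the bug report may help correlate the freeze with whatever the driver logs.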

josh.austin · May 13, 2020, 12:47pm

I think some of the reports here referencing Xid 61 might be the same as what we were seeing in this thread:

Random Xid 61 and Xorg lock-up

lencho · May 13, 2020, 3:36pm

Thank you, indeed it seems to be the same issue.
Not really reassuring that it’s been reported by so many people and that it’s been open for months…

amrits · May 25, 2020, 11:40am

I am still trying to recreate the issue locally, but no luck so far.
It would be great to know if someone finds reliable steps to reproduce it.
Also, it would be worth updating the BIOS if it is not up to date.
