CPU: AMD Ryzen 7 3700X 8-Core Processor
GPU: GeForce RTX 2070 SUPER
Driver: nvidia 430.40
Kernel: 5.2.7-arch1-1-ARCH
DE: Xfce4 + xfwm4
This is a newly built machine that works stably under Windows 10 (hours of 3D gaming) so I think HW is fine. The card performs normally under Linux as well, until the issue happens.
What: Before it happens, everything behaves normally and I get decent 3D performance. Suddenly it enters a state in which Xorg can easily hang. Initially (in that state) I'm still able to move my mouse cursor, but if I, for example, move a window around, bring a background window to the foreground, or scroll my text editor, i.e. do anything that requires a window update, the entire desktop freezes, I can no longer move the mouse cursor, and Xorg takes 100% of CPU. The hang lasts from a few seconds to minutes (the duration depends on how much updating is going on, i.e. moving bigger windows freezes X longer than moving smaller windows). After Xorg's CPU usage drops to 0 it sort of "recovers" until I touch any window again, at which point it freezes again.
When: There's no way to tell whether anything in particular triggers this state. It can happen at any time, from sitting idle to when I'm actively using the machine. In order to reproduce the issue and get the attached log, I kept the machine running for days and used it as usual. This morning it finally hung without any warning. I ssh-ed into my machine from my phone and took the log.
I've debugged a little bit. I found that whenever it enters the abnormal state, my dmesg always has the following lines:
[28736.200395] NVRM: GPU at PCI:0000:07:00: GPU-06a0a514-1651-491d-717c-2e1e24b93c99
[28736.200398] NVRM: GPU Board Serial Number:
[28736.200399] NVRM: Xid (PCI:0000:07:00): 61, 0cb5(2d50) 00000000 00000000
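In case it helps anyone watching for the same event, here is a minimal sketch of how the Xid line above could be picked out of dmesg output automatically. The function name `parse_xid` and the regular expression are my own assumptions based only on the line format shown here, not anything taken from the NVIDIA driver; the trailing fields are kept as an opaque string because I don't know what they mean.

```python
import re

# Assumed pattern, derived only from the dmesg line quoted in this post.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+), (.*)")

def parse_xid(line):
    """Return device, Xid number and raw detail string, or None if no Xid is present."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group(1),
        "xid": int(m.group(2)),
        "detail": m.group(3).strip(),  # left uninterpreted on purpose
    }

line = "[28736.200399] NVRM: Xid (PCI:0000:07:00): 61, 0cb5(2d50) 00000000 00000000"
event = parse_xid(line)
print(event)
```

Something like this could be fed from `dmesg --follow` to record the exact time the machine enters the bad state.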
I've searched the docs and there's barely any detailed discussion of Xid 61.
In addition, I've managed to get an strace of Xorg while everything was stuck (Xorg at 100%). I can see that when Xorg is busy, a swarm of SIGALRMs (thousands) is sent by the kernel, roughly one every 5000 us. I don't see any of these when it's not hanging. I hope this information is useful and gives a clue.
714 07:07:26.348280 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.348304 rt_sigreturn({mask=[]}) = 0
714 07:07:26.352926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.352949 rt_sigreturn({mask=[]}) = 1
714 07:07:26.358292 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.358315 rt_sigreturn({mask=[]}) = 0
714 07:07:26.363284 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.363303 rt_sigreturn({mask=[]}) = 1
714 07:07:26.367927 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.367951 rt_sigreturn({mask=[]}) = 1
714 07:07:26.373287 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.373310 rt_sigreturn({mask=[]}) = 0
714 07:07:26.378282 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.378302 rt_sigreturn({mask=[]}) = 1
714 07:07:26.383348 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.383368 rt_sigreturn({mask=[]}) = 1
714 07:07:26.388290 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.388310 rt_sigreturn({mask=[]}) = 1
714 07:07:26.392923 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.392942 rt_sigreturn({mask=[]}) = 0
714 07:07:26.398271 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.398291 rt_sigreturn({mask=[]}) = 1
714 07:07:26.402926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.402949 rt_sigreturn({mask=[]}) = 54088
714 07:07:26.408282 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.408302 rt_sigreturn({mask=[]}) = 0
714 07:07:26.412926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.412945 rt_sigreturn({mask=[]}) = 13522
714 07:07:26.417927 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.417947 rt_sigreturn({mask=[]}) = 13522
714 07:07:26.422925 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.422945 rt_sigreturn({mask=[]}) = 0
714 07:07:26.428274 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.428294 rt_sigreturn({mask=[]}) = 1
714 07:07:26.432926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.432948 rt_sigreturn({mask=[]}) = 0
714 07:07:26.438276 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.438295 rt_sigreturn({mask=[]}) = 1
714 07:07:26.442926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.442945 rt_sigreturn({mask=[]}) = 0
714 07:07:26.448280 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
714 07:07:26.448300 rt_sigreturn({mask=[]}) = 0
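To back up the "every 5000 us or so" figure, the gaps between consecutive SIGALRM deliveries can be computed from the trace. The following is a rough sketch that assumes the line format shown above (PID, wall-clock timestamp with microseconds, then the signal marker); the helper name `sigalrm_intervals_us` is made up for illustration, and it does not handle a trace that crosses midnight.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# 714 07:07:26.348280 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
SIGALRM_RE = re.compile(r"^\s*\d+\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+--- SIGALRM ")

def sigalrm_intervals_us(lines):
    """Return the gaps, in microseconds, between consecutive SIGALRM lines."""
    stamps = []
    for line in lines:
        m = SIGALRM_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    one_us = timedelta(microseconds=1)
    return [(b - a) // one_us for a, b in zip(stamps, stamps[1:])]

# First few lines of the excerpt above.
sample = [
    "714 07:07:26.348280 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---",
    "714 07:07:26.348304 rt_sigreturn({mask=[]}) = 0",
    "714 07:07:26.352926 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---",
    "714 07:07:26.352949 rt_sigreturn({mask=[]}) = 1",
    "714 07:07:26.358292 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---",
]
gaps = sigalrm_intervals_us(sample)
print(gaps)  # [4646, 5366]
```

Run over the whole attached trace, the same function would show whether the interval stays near 5 ms for the entire hang.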
Any help is appreciated.
nvidia-bug-report.log.gz (875 KB)
Xorg-strace.gz (22.9 KB)