I have been experiencing frequent failures of the kwin compositor since I upgraded my video card from a GTX 670 to a GTX 1070. When a failure occurs, the compositor fails to restart, so I have to restart kwin to get it back. The failure is always accompanied by an NVRM Xid 31 error. I can reliably reproduce the problem in Steam by clicking back and forth between Library and Store rapidly. I have a video of the failure along with the logging here: Kwin 5.10.5 Compositing Failure and Xid 31 - YouTube
I have been unable to reproduce the issue when I enable the options “Force Composition Pipeline” and “Force Full Composition Pipeline” in nvidia-settings. I am also unable to reproduce it if I set the compositor to XRender backend. I am able to reproduce it with both the OpenGL 2.0 and OpenGL 3.1 backends.
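For anyone who wants to test the composition pipeline workaround from a terminal instead of the nvidia-settings GUI, here is a minimal Python sketch that only builds the command line and does not run it. It assumes a single display using the `nvidia-auto-select +0+0` MetaMode, which is not taken from the post above, and the helper name `build_metamode_command` is hypothetical. Switching kwin to the XRender backend is not covered.

```python
import shlex


def build_metamode_command(full=False, mode="nvidia-auto-select", offset="+0+0"):
    """Return the nvidia-settings argv that forces a composition pipeline.

    `mode` and `offset` are assumptions for a single-monitor setup;
    adjust them to match the MetaMode actually in use.
    """
    option = "ForceFullCompositionPipeline" if full else "ForceCompositionPipeline"
    metamode = f"{mode} {offset} {{ {option} = On }}"
    return ["nvidia-settings", "--assign", f"CurrentMetaMode={metamode}"]


if __name__ == "__main__":
    # Print both variants, shell-quoted, so they can be reviewed before running.
    for full in (False, True):
        print(shlex.join(build_metamode_command(full=full)))
```

The setting applied this way lasts only for the current X session, so it is easy to revert if it introduces lag.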
I have tried Nvidia driver versions 381.22, 384.59, and 384.69. The behavior is identical.
That solution would be harmful for me. I'm already doing triple buffering with a compositor, so enabling Force Composition Pipeline would hurt visual performance and introduce lag. It would also disable any kind of G-Sync.
I can try to test it, but I don't see that as a final solution, just a nasty workaround (if it works at all).
Of course it’s not a final solution. If the Kwin compositing failures are reducing productivity, a workaround like that may be sufficient temporarily. Switching to the XRender backend may be a better solution for some. It’s also worth testing to verify that the issue goes away with that setting, as I reported. The real solution is for KDE or Nvidia to acknowledge the problem and fix it.
I must say it's something that the driver just screws up.
I was playing a game in Wine when Kwin crashed and went into fallback. I didn't care that much and continued. Then I disabled my second screen in Display settings, and the system hung and logged an Xid 31.
I used the dropdown, which took forever to display, and the system “unfroze” for a while, but the second the game regained screen space when I hid the dropdown, it froze again. I went into another TTY and restarted the login manager (SDDM). Everything was fine until I logged in and Kwin started compositing again, at which point it froze.
I had to reboot to get the system to a working state again…
If you are hitting the same Xid error with different reproduction steps, that means they are all different issues with different root causes. So it is good to start a separate thread for each issue.
I have also been experiencing a similar issue for two driver releases now. I was on 384.69 (Linux 4.12.13) and upgraded today with a big release on Arch to 384.90 (along with Linux 4.13.3).
I thought it was the same issue, but the Ch and engmask are different (same Xid 31).
Feel free to move the post if this needs to be a separate thread - or let me know that I need to move it.
Steps to Reproduce
Use a program that causes the nvidia-uvm module to autoload.
For me, I can reproduce it 100% of the time when I am watching a video that uses HEVC with hardware decoding.
Some games seem to trigger it too. See the log below and the timestamp between the module load and the XID error.
dmesg Output
[Thu Sep 28 14:03:12 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239
[Thu Sep 28 14:03:19 2017] NVRM: GPU at PCI:0000:01:00: GPU-8ab120f9-d0cf-9b43-2c32-de095bba10c7
[Thu Sep 28 14:03:19 2017] NVRM: GPU Board Serial Number:
[Thu Sep 28 14:03:19 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000043, engmask 00008100, intr 10000000
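To make the gap between the module load and the Xid explicit, here is a small Python sketch that reads the four log lines above and computes the delay. It assumes the human-readable timestamp format shown in this log and an English locale for the day and month names; the function names are made up for illustration.

```python
import re
from datetime import datetime

# The dmesg excerpt from this post, copied verbatim.
LOG = """\
[Thu Sep 28 14:03:12 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239
[Thu Sep 28 14:03:19 2017] NVRM: GPU at PCI:0000:01:00: GPU-8ab120f9-d0cf-9b43-2c32-de095bba10c7
[Thu Sep 28 14:03:19 2017] NVRM: GPU Board Serial Number:
[Thu Sep 28 14:03:19 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000043, engmask 00008100, intr 10000000
"""

# "[<timestamp>] <message>"
STAMPED = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")


def first_timestamp(log, needle):
    """Return the timestamp of the first line whose message contains `needle`."""
    for line in log.splitlines():
        m = STAMPED.match(line)
        if m and needle in m.group("msg"):
            # Assumes English day/month abbreviations in the log.
            return datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    return None


def seconds_from_uvm_load_to_xid(log):
    """Seconds between the nvidia-uvm load message and the first Xid line."""
    loaded = first_timestamp(log, "nvidia-uvm: Loaded the UVM driver")
    xid = first_timestamp(log, "NVRM: Xid")
    if loaded is None or xid is None:
        return None
    return (xid - loaded).total_seconds()


if __name__ == "__main__":
    print(seconds_from_uvm_load_to_xid(LOG))
```

For the excerpt above, the Xid follows the nvidia-uvm load by 7 seconds.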
nvidia-smi -q Output
==============NVSMI LOG==============
Timestamp : Thu Sep 28 14:34:53 2017
Driver Version : 384.90
Attached GPUs : 1
GPU 00000000:01:00.0
    Product Name : GeForce GTX 1080 Ti
    Product Brand : GeForce
    Display Mode : Enabled
    Display Active : Enabled
    Persistence Mode : Enabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 1920
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-8ab120f9-d0cf-9b43-2c32-de095bba10c7
    Minor Number : 0
    VBIOS Version : 86.02.39.00.2A
    MultiGPU Board : No
    Board ID : 0x100
    GPU Part Number : N/A
    Inforom Version
        Image Version : G001.0000.01.04
        OEM Object : 1.1
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization mode : None
    PCI
        Bus : 0x01
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x1B0610DE
        Bus Id : 00000000:01:00.0
        Sub System Id : 0x36091462
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays since reset : 0
        Tx Throughput : 1000 KB/s
        Rx Throughput : 0 KB/s
    Fan Speed : 29 %
    Performance State : P5
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
    FB Memory Usage
        Total : 11138 MiB
        Used : 811 MiB
        Free : 10327 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 40 MiB
        Free : 216 MiB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 3 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
        Aggregate
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending : N/A
    Temperature
        GPU Current Temp : 33 C
        GPU Shutdown Temp : 96 C
        GPU Slowdown Temp : 93 C
        GPU Max Operating Temp : N/A
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 18.52 W
        Power Limit : 250.00 W
        Default Power Limit : 250.00 W
        Enforced Power Limit : 250.00 W
        Min Power Limit : 125.00 W
        Max Power Limit : 300.00 W
    Clocks
        Graphics : 696 MHz
        SM : 696 MHz
        Memory : 810 MHz
        Video : 734 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 1936 MHz
        SM : 1936 MHz
        Memory : 5505 MHz
        Video : 1708 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes
        Process ID : 456
            Type : G
            Name : /usr/lib/xorg-server/Xorg
            Used GPU Memory : 41 MiB
        Process ID : 496
            Type : G
            Name : /usr/bin/gnome-shell
            Used GPU Memory : 29 MiB
        Process ID : 885
            Type : G
            Name : /usr/lib/xorg-server/Xorg
            Used GPU Memory : 258 MiB
        Process ID : 920
            Type : G
            Name : /usr/bin/gnome-shell
            Used GPU Memory : 479 MiB
I have the same issue that Hyper_Eye described. I use KDE as my main desktop, and launching some applications, such as krita, opera, chrome, or VS Code, triggers the error.
I now know how to reproduce this error, and I also found another interesting point about this defect.
Steps to reproduce:
Cold start the system (a reboot works too) and enter KDE through sddm. Then launch krita from either krunner or latte-dock, but not from a konsole window. The whole desktop freezes; only the mouse cursor can still be moved.
The interesting finding:
Once the desktop has frozen, log in to the system through ssh and run systemctl restart sddm. If you then repeat the steps above, the error doesn’t appear.
The dmesg output for the two steps above is below. I hope this information helps to find the root cause.
[ 224.056481] NVRM: GPU at PCI:0000:42:00: GPU-f57007b3-0a41-f62d-67dd-8648df008e8a
[ 224.056484] NVRM: GPU Board Serial Number:
[ 224.056486] NVRM: Xid (PCI:0000:42:00): 31, Ch 00000020, engmask 00000101, intr 10000000
[ 245.370736] nvidia-modeset: Freed GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0
[ 246.172482] nvidia-modeset: Allocated GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0
Edit
I just got some time to upgrade my Dell XPS 8910 to the latest driver, 387.34, and the issue happened on this PC too. I don’t recall encountering this issue on the Dell XPS before.
The machine I mentioned above is an AMD Threadripper with a 1080 Ti, while the Dell has a 1070 card.
The issue seriously affects my daily use, because I can't tell when it will happen, even though I know that some applications have a high chance of triggering it.
I hope the Nvidia devs can quickly check their driver and provide an updated one.
Is it only me and a few other users who have this issue? I tried both the 387 and 384 drivers, and both 4.13.x and 4.14.x kernels, and the error still happens. The pattern is very strange, though: only launching krita seems to cause this error, and not at a predictable step. Most of the time it happens when krita is launched.
Just let me know what else I can do to help triage this issue.