Frequent kwin compositing failure with associated Xid 31 error

I have been experiencing frequent failures of the kwin compositor since I upgraded my video card from a GTX 670 to a GTX 1070. When the failure occurs, the compositor fails to restart, so I have to restart kwin to get it back. The failure is always accompanied by an NVRM Xid 31 error. I can reliably reproduce the problem in Steam by rapidly clicking back and forth between Library and Store. I have a video of the failure along with the logging here: https://www.youtube.com/watch?v=mW6AM1CauNw

I found a Manjaro user reporting the exact same errors, using the same version of kwin (5.10.5), and the same GPU here: https://forum.manjaro.org/t/stupid-issue-with-plasmashell/30425

I have been unable to reproduce the issue when I enable the options “Force Composition Pipeline” and “Force Full Composition Pipeline” in nvidia-settings. I am also unable to reproduce it if I set the compositor to XRender backend. I am able to reproduce it with both the OpenGL 2.0 and OpenGL 3.1 backends.

I have tried Nvidia driver versions 381.22, 384.59, and 384.69. The behavior is identical.

Here is the corresponding KDE bug report: https://bugs.kde.org/show_bug.cgi?id=384403

Here is my nvidia-bug-report.log: http://www.huntsvegas.org/files/nvidia-bug-report.log.gz

Not sure if it’s related, but compositing is also dying left and right for me in a similar manner, more frequently over the last few months.

I tried to trigger it the way you did, but for me it happens when opening a new window or a tooltip, when entering or leaving fullscreen applications, or just at random.

Do you get an Xid error in dmesg when compositing dies as well?

I’ve been seeing the same issue the last few driver releases with a GTX 980. I also get the Xid 31 error in dmesg.

I’ll post a bug report when I am home.

Here is my bug report.
nvidia-bug-report.log.gz (329 KB)

I swapped the GTX 670 back in and I was able to reproduce the problem. For that reason I have changed the topic title.

Just in case, for the Nvidia devs: I also sent a report about this bug almost 3 weeks ago under issue 200341695.

I reproduced this issue using an Nvidia GTX 1080 Max-Q, and I sent the same bug information there.

Yup:
[ 7731.124210] NVRM: GPU at PCI:0000:2a:00: GPU-fe359ec7-e7f8-9fe9-e80b-41a3ef593b08
[ 7731.124214] NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000048, engmask 00000101, intr 10000000

It’s getting freaking annoying…
nvidia-bug-report.log.gz (281 KB)
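To answer "do you get an Xid error in dmesg" consistently across machines, the Xid lines can be pulled out of dmesg text and split into their fields. The sketch below assumes the line layout quoted in this thread stays the same across driver versions and that Ch, engmask and intr are hexadecimal; the names `parse_xid`, `find_xid31` and `XidEvent` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern for the NVRM Xid lines quoted in this thread, e.g.
#   NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000048, engmask 00000101, intr 10000000
# Assumption: the layout is the same across driver versions and the
# Ch/engmask/intr fields are hexadecimal.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"Ch (?P<ch>[0-9a-fA-F]+), engmask (?P<engmask>[0-9a-fA-F]+), "
    r"intr (?P<intr>[0-9a-fA-F]+)"
)


class XidEvent(NamedTuple):
    pci: str
    xid: int
    ch: int
    engmask: int
    intr: int


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return the fields of one NVRM Xid line, or None for any other line."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(
        pci=m.group("pci"),
        xid=int(m.group("xid")),
        ch=int(m.group("ch"), 16),
        engmask=int(m.group("engmask"), 16),
        intr=int(m.group("intr"), 16),
    )


def find_xid31(dmesg_text: str) -> List[XidEvent]:
    """Collect every Xid 31 event from a block of dmesg output."""
    events = [parse_xid(line) for line in dmesg_text.splitlines()]
    return [e for e in events if e is not None and e.xid == 31]


# Usage with the two lines quoted above; on a live system the text
# could come from the output of `dmesg` instead.
sample = (
    "[ 7731.124210] NVRM: GPU at PCI:0000:2a:00: GPU-fe359ec7-e7f8-9fe9-e80b-41a3ef593b08\n"
    "[ 7731.124214] NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000048, engmask 00000101, intr 10000000\n"
)
for event in find_xid31(sample):
    print(f"Xid {event.xid} on {event.pci}: Ch {event.ch:#x}, engmask {event.engmask:#x}")
```

Only the Xid line survives the filter; the "GPU at PCI" line that precedes it carries the GPU UUID but no error fields.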

Have you tried enabling the options I mentioned in the OP?

Hm, missed that, will try. Though then G-Sync will stop working -_-’

That solution would be harmful. I’m already doing triple buffering with a compositor, so enabling Force Composition Pipeline would hurt visual performance and introduce lag. It also disables any kind of G-Sync.

I can try to test it, but I don’t see that as a final solution, just a nasty workaround (in case it works).

Of course it’s not a final solution. If the KWin compositing failures are hurting productivity, a workaround like that may be sufficient for now. Switching to the XRender backend may be a better solution for some. It’s also worth testing to verify that the issue is resolved with that setting, as I reported. The real solution is for KDE or Nvidia to acknowledge the problem and fix it.

I must say this looks like something the driver is just screwing up.

I was playing a game in Wine when KWin just crashed and went into fallback. I didn’t care that much and continued. Then I disabled my second screen in Display settings, and the system just hung and I got Xid 31.
I used the dropdown, which took forever to display, and it “unfroze” for a while, but the moment the game got its screen space back as the dropdown hid, it froze again. I went into another TTY and restarted the login manager (SDDM). Everything was fine until I logged in, KWin started compositing again, and it froze.

I had to reboot to get the system to a working state again…

I’m still able to reproduce this bug under 384.90.

You can press Alt+Space and run “kwin_x11 --replace”. It’s a poor workaround: it will mess up your activities, but it works.

If you are hitting the same Xid error with different reproduction steps, that means they are all different issues with different root causes. So it’s good to start a separate thread for each issue.

I have also been experiencing a similar issue for two driver releases now. I was on 384.69 (Linux 4.12.13) and today upgraded to 384.90 (along with Linux 4.13.3) as part of a big Arch update.

I thought it was the same issue, but the Ch and engmask are different (same Xid 31).
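A small sketch of that comparison, using the Xid line posted earlier in this thread and the one from the dmesg output below. It only reports which fields differ; it makes no assumption about what Ch or engmask mean, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical helper: pulls the fields after "Xid (PCI:...):" out of an
# NVRM Xid line in the format quoted in this thread.
FIELDS_RE = re.compile(
    r"Xid \(PCI:[^)]+\): (?P<xid>\d+), Ch (?P<ch>\w+), "
    r"engmask (?P<engmask>\w+), intr (?P<intr>\w+)"
)


def xid_fields(line: str) -> Dict[str, str]:
    """Return the raw field strings of an NVRM Xid line."""
    m = FIELDS_RE.search(line)
    if m is None:
        raise ValueError(f"not an NVRM Xid line: {line!r}")
    return m.groupdict()


def differing_fields(line_a: str, line_b: str) -> List[str]:
    """Names of the Xid fields whose values differ between two lines."""
    a, b = xid_fields(line_a), xid_fields(line_b)
    return [k for k in ("xid", "ch", "engmask", "intr") if a[k] != b[k]]


earlier = "NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000048, engmask 00000101, intr 10000000"
mine = "NVRM: Xid (PCI:0000:01:00): 31, Ch 00000043, engmask 00008100, intr 10000000"
print(differing_fields(earlier, mine))  # ['ch', 'engmask']
```

The Xid code and intr value match while Ch and engmask do not, which is exactly the difference described above.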

Feel free to move the post if this needs to be a separate thread - or let me know that I need to move it.

Steps to Reproduce

  1. Use a program that causes the nvidia-uvm module to autoload.
  2. For me, it reproduces 100% of the time when I watch an HEVC video with hardware decoding.
  3. Some games seem to trigger it too. In the log below, note how little time passes between the module load and the Xid error.

dmesg Output

[Thu Sep 28 14:03:12 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239
[Thu Sep 28 14:03:19 2017] NVRM: GPU at PCI:0000:01:00: GPU-8ab120f9-d0cf-9b43-2c32-de095bba10c7
[Thu Sep 28 14:03:19 2017] NVRM: GPU Board Serial Number: 
[Thu Sep 28 14:03:19 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000043, engmask 00008100, intr 10000000
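The gap between the two events can be computed directly from the bracketed timestamps. This sketch assumes the human-readable prefix shown above (the form `dmesg -T` prints); per the dmesg man page, those timestamps can be inaccurate after a suspend/resume. A short gap shows the events are close together, not that the module load causes the fault.

```python
from datetime import datetime

# Assumption: lines carry the human-readable "[Thu Sep 28 14:03:12 2017]"
# prefix shown above.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(line: str) -> datetime:
    """Parse the bracketed timestamp at the start of a dmesg line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, STAMP_FORMAT)


uvm_load = "[Thu Sep 28 14:03:12 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239"
xid_error = "[Thu Sep 28 14:03:19 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000043, engmask 00008100, intr 10000000"

delay = parse_stamp(xid_error) - parse_stamp(uvm_load)
print(f"Xid 31 followed the nvidia-uvm load by {delay.total_seconds():.0f} s")  # prints "... by 7 s"
```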

nvidia-smi -q Output

==============NVSMI LOG==============

Timestamp                           : Thu Sep 28 14:34:53 2017
Driver Version                      : 384.90

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : GeForce GTX 1080 Ti
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-8ab120f9-d0cf-9b43-2c32-de095bba10c7
    Minor Number                    : 0
    VBIOS Version                   : 86.02.39.00.2A
    MultiGPU Board                  : No
    Board ID                        : 0x100
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1B0610DE
        Bus Id                      : 00000000:01:00.0
        Sub System Id               : 0x36091462
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 2
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : 1000 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 29 %
    Performance State               : P5
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
    FB Memory Usage
        Total                       : 11138 MiB
        Used                        : 811 MiB
        Free                        : 10327 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 40 MiB
        Free                        : 216 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 3 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 33 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 18.52 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 300.00 W
    Clocks
        Graphics                    : 696 MHz
        SM                          : 696 MHz
        Memory                      : 810 MHz
        Video                       : 734 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1936 MHz
        SM                          : 1936 MHz
        Memory                      : 5505 MHz
        Video                       : 1708 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 456
            Type                    : G
            Name                    : /usr/lib/xorg-server/Xorg
            Used GPU Memory         : 41 MiB
        Process ID                  : 496
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 29 MiB
        Process ID                  : 885
            Type                    : G
            Name                    : /usr/lib/xorg-server/Xorg
            Used GPU Memory         : 258 MiB
        Process ID                  : 920
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 479 MiB

inxi output

System:    Host: pc02 Kernel: 4.13.3-1-ARCH x86_64 Distro: Arch Linux
Machine:   Device: desktop Mobo: ASRock model: Z270 Gaming K6
           UEFI: American Megatrends v: P2.10 date: 05/05/2017
CPU:       Quad core Intel Core i7-7700K (-HT-MCP-) cache: 8192 KB
           clock speeds: max: 4600 MHz 1: 4200 MHz 2: 4200 MHz 3: 4200 MHz 4: 4200 MHz 5: 4200 MHz 6: 4200 MHz
           7: 4200 MHz 8: 4200 MHz
Memory:    Array-1 capacity: 64 GB devices: 4 EC: None
           Device-1: ChannelA-DIMM0 size: 8 GB speed: 2133 MT/s type: DDR4
           Device-2: ChannelA-DIMM1 size: 16 GB speed: 2133 MT/s type: DDR4
           Device-3: ChannelB-DIMM0 size: 8 GB speed: 2133 MT/s type: DDR4
           Device-4: ChannelB-DIMM1 size: 16 GB speed: 2133 MT/s type: DDR4
Graphics:  Card: NVIDIA GP102 [GeForce GTX 1080 Ti]
           Display Server: X.Org 1.19.3 driver: nvidia Resolution: 3840x2160@60.00hz
           OpenGL: renderer: GeForce GTX 1080 Ti/PCIe/SSE2 version: 4.5.0 NVIDIA 384.90
Audio:     Card-1 Intel 200 Series PCH HD Audio driver: snd_hda_intel Sound: ALSA v: k4.13.3-1-ARCH
           Card-2 NVIDIA GP102 HDMI Audio Controller driver: snd_hda_intel
Network:   Card-1: Intel Ethernet Connection (2) I219-V driver: e1000e
           IF: enp0s31f6 state: up speed: 1000 Mbps duplex: full
           Card-2: Intel I211 Gigabit Network Connection driver: igb
           IF: enp112s0 state: down
           Card-3: Realtek RTL8812AE 802.11ac PCIe Wireless Network Adapter driver: rtl8821ae
           IF: wlp116s0 state: down
Drives:    HDD Total Size: 1256.3GB (27.9% used)
           ID-1: /dev/nvme0n1 model: THNSN5256GPU7_TOSHIBA size: 256.1GB
           ID-2: /dev/nvme1n1 model: TOSHIBA size: 512.1GB
           ID-3: /dev/sda model: WDC_WDBNCE0010PN size: 1000.2GB
Partition: ID-1: / size: 234G used: 90G (41%) fs: ext4 dev: /dev/nvme0n1p2
           ID-2: /boot size: 500M used: 72M (15%) fs: vfat dev: /dev/nvme0n1p1
Sensors:   System Temperatures: cpu: 28.0C mobo: 31.0C gpu: 43C
           Fan Speeds (in rpm): cpu: N/A fan-1: 1147 fan-2: 1846 fan-3: 0 fan-4: 424 fan-5: 0
Info:      Processes: 231 Uptime: 51 min Memory: 2317.7/48328.7MB Client: Shell (sudo) inxi: 2.3.40

Let me know if you need any further information. Thanks -

Please see:
http://docs.nvidia.com/deploy/xid-errors/index.html#topic_2
http://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_2

I have the same issue Hyper_Eye mentioned. I use KDE as my main desktop, and launching some applications, like Krita, Opera, Chrome, VS Code, etc., triggers the error.

Reposting here what I posted on https://devtalk.nvidia.com/default/topic/1025701/linux/gtx-970-with-kde-kwin-nvrm-xid-pci-0000-01-00-31-ch-00000028-engmask-0000-/

I know how to reproduce this error now, and I also found something interesting about this defect.

Steps to reproduce:
Cold start the system (a reboot works too) and log in to KDE through SDDM. Then launch Krita from either KRunner or Latte Dock, but not from a Konsole window. The whole desktop freezes; only the mouse cursor can move.

The interesting finding:
Once the desktop has frozen, log in to the system through SSH, run systemctl restart sddm, and then repeat the steps above: the error doesn’t appear.

The dmesg output from the two steps above is below; I hope this information helps find the root cause.
[ 224.056481] NVRM: GPU at PCI:0000:42:00: GPU-f57007b3-0a41-f62d-67dd-8648df008e8a
[ 224.056484] NVRM: GPU Board Serial Number:
[ 224.056486] NVRM: Xid (PCI:0000:42:00): 31, Ch 00000020, engmask 00000101, intr 10000000
[ 245.370736] nvidia-modeset: Freed GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0
[ 246.172482] nvidia-modeset: Allocated GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0


Edit


I just got some time to upgrade my Dell XPS 8910 to the latest driver, 387.34, and the issue happened on this PC too. I don’t remember encountering this issue on the Dell XPS before.

The machine I mentioned above is an AMD Threadripper with a 1080 Ti, whereas the Dell has a 1070.

The issue seriously impacts my daily use, because I can’t predict when it will happen, even though I know some applications are very likely to trigger it.

I hope the Nvidia devs can quickly look into their driver and provide an updated one.

Is it just me and a few other users who have this issue? I tried both the 387 and 384 drivers, and both the 4.13.x and 4.14.x kernels, and the error still happens. The pattern is very strange, though: only launching Krita causes this error, and only after some unpredictable sequence of steps. Most of the time it happens when launching Krita.

Just let me know what else I can do to help triage this issue.

Thanks
nvidia-bug-report.log.gz (134 KB)