GTX 970 with KDE/KWin: NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 0000...

Hi,

Sadly, my GTX 970 has been unstable lately in 2D usage. I didn't notice any issues during, e.g., 3D games.

My kernel:

gamebox:~ # uname -a
Linux gamebox 4.13.8-1-default #1 SMP PREEMPT Wed Oct 18 09:53:30 UTC 2017 (569e26e) x86_64 x86_64 x86_64 GNU/Linux

I just had another crash while starting Chrome:

[ 5749.006413] NVRM: GPU at PCI:0000:01:00: GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b
[ 5749.006416] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000101, intr 10000000
[ 5804.205680] python[4261]: segfault at 8 ip 00007fa6625814ac sp 00007ffea8f600a0 error 4 in libQt5XcbQpa.so.5.9.2[7fa662546000+f9000]

The nvidia bug report is attached.

Any ideas?

Many thanks!
Christian
nvidia-bug-report.log.gz (272 KB)

Hi,
I can reproduce the issue simply by starting Chrome:

[ 1588.473869] NVRM: GPU at PCI:0000:01:00: GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b
[ 1588.473872] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000111, intr 10000000

[ 1660.261547] NVRM: GPU at PCI:0000:01:00: GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b
[ 1660.261550] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000101, intr 10000000
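To compare Xid lines across crashes (for example, the two different engmask values above), it helps to pull them out of dmesg in a structured way. This is a minimal Python sketch that assumes the dmesg line format shown in this thread. `parse_xid_line`, `summarize` and `XID_DESCRIPTIONS` are hypothetical helpers, not an NVIDIA tool. The two descriptions are the ones given for Xid 31 and 79 in NVIDIA's Xid documentation, and the Ch/engmask/intr fields are extracted but not interpreted.

```python
import re
from collections import Counter
from typing import Dict, NamedTuple, Optional

# Hypothetical lookup; only the two Xid codes that appear in this thread.
XID_DESCRIPTIONS = {31: "GPU memory page fault", 79: "GPU has fallen off the bus"}

# Matches e.g. "[ 1588.473872] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, ..."
# The "[ timestamp]" prefix is optional because pasted lines sometimes lack it.
XID_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?NVRM: Xid \((?P<bus>PCI:[0-9A-Fa-f:.]+)\): "
    r"(?P<xid>\d+)(?:,\s*(?P<rest>.*))?"
)
# "key 8-hex-digit-value" pairs such as "engmask 00000101".
FIELD_RE = re.compile(r"\b([A-Za-z]+) ([0-9A-Fa-f]{8})\b")


class XidEvent(NamedTuple):
    timestamp: Optional[float]  # seconds since boot, None if not present
    bus: str                    # e.g. "PCI:0000:01:00"
    xid: int
    fields: Dict[str, str]      # e.g. {"Ch": "00000028", "engmask": "00000101"}
    message: str                # raw text after the Xid number


def parse_xid_line(line: str) -> Optional[XidEvent]:
    """Return an XidEvent for an 'NVRM: Xid' line, or None for any other line."""
    m = XID_RE.search(line)
    if m is None:
        return None
    rest = m.group("rest") or ""
    ts = float(m.group("ts")) if m.group("ts") else None
    return XidEvent(ts, m.group("bus"), int(m.group("xid")), dict(FIELD_RE.findall(rest)), rest)


def summarize(log_text: str) -> Counter:
    """Count Xid events per (xid, engmask) pair."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        ev = parse_xid_line(line)
        if ev is not None:
            counts[(ev.xid, ev.fields.get("engmask"))] += 1
    return counts


SAMPLE = """\
[ 1588.473869] NVRM: GPU at PCI:0000:01:00: GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b
[ 1588.473872] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000111, intr 10000000
[ 1660.261550] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000101, intr 10000000
"""

if __name__ == "__main__":
    for (xid, engmask), n in summarize(SAMPLE).items():
        desc = XID_DESCRIPTIONS.get(xid, "unknown")
        print(f"Xid {xid} ({desc}), engmask {engmask}: {n}x")
```

On a real system the same functions can be fed the output of `dmesg` instead of `SAMPLE`.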

Any ideas?

thx
christian

Does downgrading to 375-drivers help?

Hi,

I can't try that out easily; I'm using the openSUSE Tumbleweed packages, and only the newest version is available there.

Since the trouble started recently, I guess it is caused either by changes in the driver or in KDE.

I managed to crash KDE reproducibly by using Steam and trying to start Serious Sam Fusion. The Xid happened every time.

Today I tried Xfce instead of KDE, and it worked fine.

Reading through other posts here, it looks to me like KDE/KWin currently has major issues with the proprietary NVIDIA drivers :(, so it's either bugs in the driver or in the KWin/KDE code…

Hope that gets fixed ASAP.

Cu,
Christian

Yes, looks like NVIDIA and KWin still can't agree on how this should work:
KDE compositing crash on NVIDIA drivers 384.90+ - Linux - NVIDIA Developer Forums

There is an article that summarizes the situation quite accurately:

That's for Wayland, bro… You still have time to buy an AMD Vega 56/64, or to wait for the next AMD release, which will hopefully be in line on price/performance, with open-source drivers and a working Wayland + XWayland stack.

Any news from NVIDIA on that topic? It seems I am not the only one facing issues with KWin/KDE and the NVIDIA driver…

Yeah, nothing… I found out that switching KWin's compositing backend to XRender mitigates the Xid errors and the random OpenGL compositing crashes.

I have been running with

nvidia-settings --assign CurrentMetaMode="DVI-D-0: nvidia-auto-select +0+0, DP-0: nvidia-auto-select +1920+0 { ForceCompositionPipeline = On }"

(for a two-monitor setup), which somewhat mitigates the problem: the Xid 31 errors are gone, but I get a new type of crash instead:
NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.

These are far more serious and hang the GPU completely; a system reboot is needed.
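For other monitor layouts, the MetaMode string in that command can be generated rather than typed by hand. This is a minimal Python sketch assuming the output names (DVI-D-0, DP-0) and the 1920-pixel offset from the post above. `Display`, `build_metamode` and `nvidia_settings_argv` are hypothetical helpers, and the script only prints the command instead of running it.

```python
import shlex
from typing import List, NamedTuple


class Display(NamedTuple):
    output: str                      # X output name, e.g. "DVI-D-0" or "DP-0"
    x: int                           # horizontal offset in the desktop, in pixels
    y: int = 0                       # vertical offset, in pixels
    force_composition: bool = False  # append { ForceCompositionPipeline = On }


def build_metamode(displays: List[Display]) -> str:
    """Build a MetaMode string like the one used with nvidia-settings above."""
    parts = []
    for d in displays:
        # ":+d" always prints a sign, giving "+0+0" or "+1920+0".
        entry = f"{d.output}: nvidia-auto-select {d.x:+d}{d.y:+d}"
        if d.force_composition:
            entry += " { ForceCompositionPipeline = On }"
        parts.append(entry)
    return ", ".join(parts)


def nvidia_settings_argv(displays: List[Display]) -> List[str]:
    """Argument list for subprocess; no shell quoting is needed in this form."""
    return ["nvidia-settings", "--assign", "CurrentMetaMode=" + build_metamode(displays)]


if __name__ == "__main__":
    layout = [Display("DVI-D-0", 0), Display("DP-0", 1920, force_composition=True)]
    # Print a shell-quoted version instead of executing it.
    print(shlex.join(nvidia_settings_argv(layout)))
```

Passing the list from `nvidia_settings_argv` to `subprocess.run` would apply it, but that is left out deliberately since it changes the live display configuration.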

I have the same issue with my 1080 Ti. Chrome, Opera and VS Code can also trigger the problem when they launch. Krita sometimes has the same issue during simple drawing work.

I don't care about the relationship between NVIDIA and KDE; if NVIDIA doesn't want to support Wayland the way the community needs, that's OK. But NVIDIA should focus on their drivers to make sure the OpenGL stuff works without issues.

Many users use NVIDIA cards on Linux. NVIDIA should put resources into this area to meet their customers' needs.

There is a list of Xid error codes somewhere in the manual; a bunch of these are caused by hardware issues (PCIe slot problems, BIOS-related issues, RAM issues, unstable overclocks, etc.). Go and look them up.

http://docs.nvidia.com/deploy/xid-errors/index.html#topic_2

I think by now people are very much aware of the XID codes.
I personally am starting to develop a gut feeling that this could be related to a VRAM regression since the 378 drivers.

In any case, I happen to have a laptop with an "identical" GPU (a 1060), and the problem is not there, although the setup is basically the same. I upgraded to KDE 5.11 and ran into a well-known black-screen issue:

which is a similar (same?) beast to the one we're dealing with here, but on 5.11.

I know how to reproduce this error now, and I also found another interesting detail about this defect.

Steps to reproduce:
Cold-start the system (a reboot works too) and log in to KDE through SDDM. Then launch Krita from either KRunner or Latte Dock, but not from a Konsole window. The whole desktop freezes; only the mouse cursor can still move.

The interesting finding:
Once the desktop has frozen, log in to the system over SSH and run systemctl restart sddm. If you then repeat the steps above, the error doesn't appear.

The dmesg output from the two steps above is below; I hope this information helps find the root cause.
[ 224.056481] NVRM: GPU at PCI:0000:42:00: GPU-f57007b3-0a41-f62d-67dd-8648df008e8a
[ 224.056484] NVRM: GPU Board Serial Number:
[ 224.056486] NVRM: Xid (PCI:0000:42:00): 31, Ch 00000020, engmask 00000101, intr 10000000
[ 245.370736] nvidia-modeset: Freed GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0
[ 246.172482] nvidia-modeset: Allocated GPU:0 (GPU-f57007b3-0a41-f62d-67dd-8648df008e8a) @ PCI:0000:42:00.0


Edit:


I just got some time to upgrade my Dell XPS 8910 to the latest driver, 387.34, and the issue happens on this PC too. I don't remember encountering this issue on the Dell XPS before.

The system I mentioned above is an AMD Threadripper with a 1080 Ti, while the Dell has a 1070.

The issue seriously affects my daily use, because I can't tell when it will happen, even though I know some applications have a high chance of triggering this error.

I hope the NVIDIA devs can quickly look into their driver and provide an updated one.

For me, Krita 3.3.2 does not trigger it. I tried on both of my systems. Which version are you using?

I don't think they are; every time somebody posts an issue containing the string "XID",
other people with completely different Xid codes and behavior post "me too".

@christian_frank:

A bunch of people with Threadripper/Ryzen systems seem to be running into these lately, and many of them report their problems disappearing after replacing memory or changing the configuration.
There was a similar post not long ago with Xid 31s, and the poster said the problem went away this way; just search the forum.

I remember seeing a lot of Xid 31 errors in my syslog when I was experimenting with something a few months back.

I suspect it may have something to do with this one:

https://devtalk.nvidia.com/default/topic/1026874/linux/huge-performance-losses-with-newer-nvidia-drivers

And you are completely wrong. The only reported behavior that bug causes is increased memory usage, which is a problem with some demanding games. It has nothing to do with Xid errors.
This is exactly the kind of "me too" response I've been bitching about in the post above.

I tried on both my systems.

  1. AMD Threadripper 1950X + Gigabyte Designare EX motherboard and Gigabyte Aorus Xtreme 1080 Ti card.
  2. Dell XPS 8910 with a 6th-gen Intel CPU (6700) and a 1070 video card.

Both of my systems run Gentoo; the Dell uses the 13.0 profile and the AMD TR4 system uses the 17.0 profile, but both systems are built with GCC 7.2.

And which version of Krita?