Frequent compositor crashes with 384+

Since 384 came out, compositor crashes have been an almost daily occurrence (30+% rate) when starting GLX applications; in my case, when starting Steam and/or a game. I sometimes even get a crash with a plain $ glxinfo.

My distro: Kubuntu 17.10
kwin: 5.10.5

I had 387.XX with Kubuntu 17.04, which had far fewer crashes (<10% rate?) but was not stable either.

Since the upgrade, I have moved back to the “stable” 384.90 (the default for 17.10). I am reporting the problem here because the crashes are unbearable: I have to log out and back in to preserve my applications' state, and a kwin restart forgets things, so it is not an option.

Three logged cases:
https://drive.google.com/open?id=0BzvClrlNf417QXBidUpWRFBZQXc
https://drive.google.com/open?id=0BzvClrlNf417WEMzd1JXZWR0TmM
https://drive.google.com/open?id=0BzvClrlNf417d1hiUnRQLXBnd28

Let me know if something else is needed.

This could be distantly related to:
https://devtalk.nvidia.com/default/topic/1025477/linux/x-org-crashes-on-ubuntu-17-10-with-driver-nvidia-384-after-upgrade/
https://devtalk.nvidia.com/default/topic/1023193/linux/384-69-broke-kde-screen-locker-possibly-other-qt-based-software-on-linux/?offset=7

All your logs exhibit an Xid 31 GPU error, which means ‘GPU memory page fault’. Looks more like a hardware error to me.
Unrelated, but some USB device is not working properly and is flooding the logs.

Yeah, the USB-C controller is not behaving well on my motherboard (I suspect a custom MSI chip is misbehaving), so please ignore the USB flood.

Is there a way to verify the hardware suspicion?

One more logged case with a plain glxinfo causing the crash:

I would start by using a plain X session to see if this depends on kwin. Then maybe do some system memory and video memory tests.
Downgrading to the 375 driver could be possible, too.
The USB message flood just hampers debugging a bit: it pushes the early system/driver init messages out of the log, and it means a lot of scrolling.
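
If the flood makes the logs hard to read, one option is to filter a saved copy before looking for the driver messages. A minimal Python sketch, assuming the noisy lines can be recognised by a substring such as "usb" (that marker, the function name drop_noise and the sample lines are illustrative, not taken from the attached logs):

```python
def drop_noise(lines, markers=("usb",)):
    """Return the lines that contain none of the (case-insensitive) markers."""
    lowered = tuple(m.lower() for m in markers)
    return [ln for ln in lines if not any(m in ln.lower() for m in lowered)]


# Placeholder lines for illustration only; real kernel messages will differ.
sample = [
    "usb 1-1: <placeholder flood message>",
    "NVRM: <placeholder driver message>",
    "USB hub: <another placeholder flood message>",
]

for line in drop_noise(sample):
    print(line)
```

This only tidies up what is still in the log. Messages that the flood has already pushed out of the kernel ring buffer cannot be recovered this way.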

Is there a maintained VRAM test utility out there?
Found one: https://github.com/ihaque/memtestG80
but it seems to be poorly maintained and fails to compile.

In the meantime, I have moved to 387, as 384 is unusable: it now crashes even with a chat app…

Video memory tests:
https://sourceforge.net/projects/cudagpumemtest/
http://mikelab.kiev.ua/index_en.php?page=PROGRAMS/vmt_en

In the end, I ran memtestG80 (from https://github.com/ihaque/memtestG80 ) after fixing linking problems.

No errors were observed.

Tried:
5632 MB, 150 times (tty session)
4608 MB, 50 times
4096 MB, 100 times

So far the 387 driver runs fine*.

* My past experience was that 387 was much more stable.

OK, it crashes like hell with 387 and Steam. It looks like there is already an older thread on this:
https://devtalk.nvidia.com/default/topic/1023621/linux/frequent-kwin-compositing-failure-with-associated-xid-31-error/1

I will try the only workaround available for the moment: Force Composition Pipeline.

Hi,

I am also facing the same issue: applications crash KDE/kwin, with the following entries in dmesg:

[ 5749.006413] NVRM: GPU at PCI:0000:01:00: GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b
[ 5749.006416] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, engmask 00000101, intr 10000000
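
To compare reports like these across several logs, the Xid lines can be pulled out mechanically. A rough Python sketch, assuming the line layout shown above; the regular expression and the function name parse_xid_lines are my own and not part of any NVIDIA tool:

```python
import re

# Matches e.g. "[ 5749.006416] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000028, ..."
# The leading "[ seconds ]" timestamp is optional, since some log formats omit it.
XID_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?"
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+)(?:, (?P<detail>.*))?"
)


def parse_xid_lines(text):
    """Return one dict per NVRM Xid line found in the given log text."""
    events = []
    for line in text.splitlines():
        m = XID_RE.search(line)
        if m is None:
            continue
        ts = m.group("ts")
        events.append({
            "timestamp": float(ts) if ts is not None else None,
            "pci": m.group("pci"),
            "xid": int(m.group("xid")),
            "detail": m.group("detail") or "",
        })
    return events


# The two dmesg lines quoted above.
sample = (
    "[ 5749.006413] NVRM: GPU at PCI:0000:01:00: "
    "GPU-ff4d3e99-dd28-f0db-e329-6d56d3f2d05b\n"
    "[ 5749.006416] NVRM: Xid (PCI:0000:01:00): 31, "
    "Ch 00000028, engmask 00000101, intr 10000000\n"
)

for event in parse_xid_lines(sample):
    print(event["timestamp"], event["pci"], event["xid"], event["detail"])
```

Only the Xid line is reported; the "GPU at" line is skipped because it carries no Xid number.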

I also get the message “the compositor crashed and needs to be restarted”, but it usually doesn’t recover.

I opened a thread for it:

https://devtalk.nvidia.com/default/topic/1025701/linux/gtx-970-on-opensuse-tumbleweed-nvrm-xid-pci-0000-01-00-31-ch-00000028-engmask-0000-/

The only thing I have noticed is that most of us having the issue are using KDE.
The KDE devs blame the NVIDIA driver for causing the kwin issues.

Can any NVIDIA dev jump in here, because many of us are having issues right now?

Many thanks!
Christian