Hey,
I have been trying to solve this by myself, but it’s been over 2 months now and I’m out of ideas.
I’ve tried so many things that I’ve completely lost count, but for the sake of providing logs and further detail, I’m more than happy to test anything again.
My problem is that after a couple of minutes of playing certain games, my PC freezes. There is no way to switch to a terminal or TTY; a hard reset is required to recover.
The most notable game is Warframe (Proton), where the crash happens after only 5 minutes or so.
In EVE Online (Proton), my monitor sometimes turns grey: no GUI, nothing, just grey.
I booted into my old Windows install and tested benchmarks and gaming there, and everything worked fine.
I then connected to my machine over ssh and started playing while journalctl -f was running.
At the time of the freeze, I received:
Nov 02 02:58:50 Ceetemus kernel: NVRM: GPU at PCI:0000:01:00: GPU-27f23ee2-fdae-0271-e491-038e6975f972
Nov 02 02:58:50 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000063
and…
Nov 02 03:00:22 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Nov 02 03:00:22 Ceetemus kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Nov 02 03:00:22 Ceetemus kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
I ran nvidia-bug-report.sh at that time and have attached the result.
I then searched my logs for similar reports, because I wanted to know whether this was the cause of my frequent crashes or a one-time thing:
Sep 13 21:26:54 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 001b, Class 0000b197, Offset 000007e4, Data a0040eaa, ErrorCode 0000000c
Sep 13 21:32:55 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 001b, Class 0000b197, Offset 000007a4, Data 2004c004, ErrorCode 0000000c
Sep 13 21:42:01 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0033, Class 0000b197, Offset 000007e4, Data a0040eaa, ErrorCode 0000000c
Sep 14 00:39:24 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 003b, Class 0000b197, Offset 000007e4, Data a0040eaa, ErrorCode 0000000c
Sep 15 17:04:11 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Sep 21 02:44:46 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000043
Sep 27 23:12:27 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Class 0x3d8 Subchannel 0x0 Mismatch
Sep 27 23:12:27 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x4041b0=0x3d8
Sep 27 23:12:27 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x404000=0x80000002
Sep 27 23:12:27 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 003b, Class 0000b197, Offset 00001a2c, Data 00000000
Sep 27 23:12:27 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000003b intr 02000000
Sep 27 23:18:15 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 41, CCMDs 0000003b 0000b0b5
Sep 27 23:18:56 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000003b intr 00800000
Sep 27 23:18:56 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 32, Channel ID 0000003b intr 00800000
Sep 30 22:44:32 Ceetemus kernel: NVRM: Xid (PCI:0000:02:00): 31, Ch 00000044, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_4 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
Okt 01 21:28:57 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00063192
Okt 01 21:29:05 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00063193
Okt 01 21:29:13 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00063194
Okt 01 21:29:21 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00063195
Okt 20 00:34:54 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 0000001b
Okt 21 00:14:26 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000053
Okt 31 19:18:10 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Okt 31 20:44:33 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 000010ea
Okt 31 22:19:01 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=353, Channel 00000053
Okt 31 22:40:51 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 0000004b
Okt 31 23:19:24 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Nov 01 00:29:36 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 31, Ch 00000053, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_5 faulted @ 0xff_8836a000. Fault is of type FAULT_INFO_TYPE_UNSUPPORTED_KIND ACCESS_TYPE_READ
Nov 01 00:57:35 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000053
Nov 02 02:58:50 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000063
Nov 02 03:00:22 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
So we are seeing Xid 8, 13, 16, 31, 32, 41, 69 and 79.
According to https://docs.nvidia.com/deploy/pdf/XID_Errors.pdf, all of these errors have “Driver issue” in common.
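In case anyone wants to reproduce that list from their own journal, here is a minimal sketch of how the Xid codes could be tallied. It is only an illustration: the function name `tally_xids` and the regular expression are my own assumptions, based on the log line format shown above, and are not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   "... kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus."
# Group 1 is the PCI address, group 2 is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count how often each Xid code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the logs in this post.
    sample = [
        "Nov 02 02:58:50 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000063",
        "Nov 02 03:00:22 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
        "Nov 02 03:00:22 Ceetemus kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.",
    ]
    counts = tally_xids(sample)
    for code in sorted(counts):
        print(f"Xid {code}: {counts[code]} occurrence(s)")
```

The same function should also accept an open file or `sys.stdin`, so saved journal output could be passed to it instead of the hard-coded sample.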
Here are a few lines before and after today’s crash:
Nov 02 02:58:22 Ceetemus org_kde_powerdevil[959]: powerdevil: Can't contact ck
Nov 02 02:58:47 Ceetemus org_kde_powerdevil[959]: powerdevil: Releasing inhibition with cookie 2007
Nov 02 02:58:47 Ceetemus org_kde_powerdevil[959]: powerdevil: Restoring DPMS features after inhibition release
Nov 02 02:58:47 Ceetemus org_kde_powerdevil[959]: powerdevil: Scheduling inhibition from ":1.15" "My SDL application" with cookie 2008 and reason "Playing a game"
Nov 02 02:58:47 Ceetemus org_kde_powerdevil[959]: powerdevil: Can't contact ck
Nov 02 02:58:50 Ceetemus kernel: NVRM: GPU at PCI:0000:01:00: GPU-27f23ee2-fdae-0271-e491-038e6975f972
Nov 02 02:58:50 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 8, Channel 00000063
Nov 02 02:58:52 Ceetemus org_kde_powerdevil[959]: powerdevil: Enforcing inhibition from ":1.15" "My SDL application" with cookie 2008 and reason "Playing a game"
Nov 02 02:58:52 Ceetemus org_kde_powerdevil[959]: powerdevil: Added change screen settings
Nov 02 02:58:52 Ceetemus org_kde_powerdevil[959]: powerdevil: Added interrupt session
Nov 02 02:58:52 Ceetemus org_kde_powerdevil[959]: powerdevil: Disabling DPMS due to inhibition
Nov 02 02:58:52 Ceetemus org_kde_powerdevil[959]: powerdevil: Can't contact ck
Nov 02 03:00:01 Ceetemus CROND[31312]: (root) CMD (timeshift --check --scripted)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT ((process:31312): GLib-GIO-CRITICAL **: 03:00:01.172: g_file_get_path: assertion 'G_IS_FILE (file)' failed)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT ()
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (** (process:31312): CRITICAL **: 03:00:01.172: tee_jee_file_system_path_combine: assertion 'path1 != NULL' failed)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT ()
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (** (process:31312): CRITICAL **: 03:00:01.172: tee_jee_file_system_dir_exists: assertion 'dir_path != NULL' failed)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (Daily snapshots are enabled)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (Last daily snapshot is 6 hours old)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (Monthly snapshot are enabled)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (Last monthly snapshot is 28 days old)
Nov 02 03:00:01 Ceetemus CROND[31311]: (root) CMDOUT (------------------------------------------------------------------------------)
Nov 02 03:00:01 Ceetemus crontab[31344]: (root) LIST (root)
Nov 02 03:00:01 Ceetemus crontab[31345]: (root) LIST (root)
Nov 02 03:00:22 Ceetemus kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
Nov 02 03:00:22 Ceetemus kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Nov 02 03:00:22 Ceetemus kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
Nov 02 03:00:22 Ceetemus org_kde_powerdevil[959]: powerdevil: Releasing inhibition with cookie 2008
Nov 02 03:00:22 Ceetemus org_kde_powerdevil[959]: powerdevil: Restoring DPMS features after inhibition release
Nov 02 03:00:22 Ceetemus org_kde_powerdevil[959]: powerdevil: Can't contact ck
Nov 02 03:00:35 Ceetemus kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
Nov 02 03:00:35 Ceetemus kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
Nov 02 03:00:35 Ceetemus kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
I have already tried other drivers, including beta and older ones. As I said, this has been going on for 2 months now.
I’ve also tried other distros, Pop!_OS and Manjaro XFCE, with the same issues.
The hardware is fine; everything runs great on Windows.
What do I do?
My system is up to date.
Thank you for your time.
-CT
nvidia-bug-report.log.gz (61.3 KB)