My laptop's 4070 has been consistently crashing in dGPU mode. The timing is pretty much random, but the crashes themselves always happen in dGPU mode. The usual message I get is:
May 11 19:03:42 AORUS kernel: NVRM: GPU at PCI:0000:01:00: GPU-7ed6528c-c5f0-aac3-74a2-e39d9386b2e6
May 11 19:03:42 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
May 11 19:03:42 AORUS kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
May 11 19:03:42 AORUS kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
May 11 19:03:42 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
May 11 19:03:46 AORUS kernel: NVRM: Error in service of callback
May 11 19:03:47 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:03:52 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:03:57 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:04:02 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:04:07 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:04:12 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:04:17 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
But recently I got this one a couple of times:
May 11 19:34:24 AORUS kernel: NVRM: GPU at PCI:0000:01:00: GPU-7ed6528c-c5f0-aac3-74a2-e39d9386b2e6
May 11 19:34:24 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
May 11 19:34:24 AORUS kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
May 11 19:34:24 AORUS kernel: NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000.
May 11 19:34:24 AORUS kernel: NVRM: GPU0 RPC history (CPU -> GSP):
May 11 19:34:24 AORUS kernel: NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
May 11 19:34:24 AORUS kernel: NVRM: 0 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c43d59c 0x0000000000000000 y
May 11 19:34:24 AORUS kernel: NVRM: -1 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c30bb21 0x000634df9c30bf5f 1086us
May 11 19:34:24 AORUS kernel: NVRM: -2 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c1d9f96 0x000634df9c1da378 994us
May 11 19:34:24 AORUS kernel: NVRM: -3 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000634df9c0d473b 0x000634df9c0d48c0 389us
May 11 19:34:24 AORUS kernel: NVRM: -4 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c0a8641 0x000634df9c0a8836 501us
May 11 19:34:24 AORUS kernel: NVRM: -5 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9bf76dc9 0x000634df9bf76efc 307us
May 11 19:34:24 AORUS kernel: NVRM: -6 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9be4505c 0x000634df9be45652 1526us
May 11 19:34:24 AORUS kernel: NVRM: -7 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9bd1353e 0x000634df9bd1391d 991us
May 11 19:34:24 AORUS kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
May 11 19:34:24 AORUS kernel: NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
May 11 19:34:24 AORUS kernel: NVRM: 0 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3b6068 0x000634df9c3b6068
May 11 19:34:24 AORUS kernel: NVRM: -1 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3b6067 0x000634df9c3b6068 1us
May 11 19:34:24 AORUS kernel: NVRM: -2 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3b3c1e 0x000634df9c3b3c1f 1us
May 11 19:34:24 AORUS kernel: NVRM: -3 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3b3c1d 0x000634df9c3b3c1e 1us
May 11 19:34:24 AORUS kernel: NVRM: -4 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3ae58a 0x000634df9c3ae58a
May 11 19:34:24 AORUS kernel: NVRM: -5 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3ae588 0x000634df9c3ae58a 2us
May 11 19:34:24 AORUS kernel: NVRM: -6 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3aa9e1 0x000634df9c3aa9e2 1us
May 11 19:34:24 AORUS kernel: NVRM: -7 4099 POST_EVENT 0x0000000000000000 0x0000000000000000 0x000634df9c3aa9e0 0x000634df9c3aa9e1 1us
May 11 19:34:24 AORUS kernel: CPU: 16 UID: 1000 PID: 8394 Comm: [vkps] Update Tainted: P O 6.14.6-1-liquorix-amd64 #1 liquorix 6.14-8ubuntu1~noble
May 11 19:34:24 AORUS kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
May 11 19:34:24 AORUS kernel: Hardware name: GIGABYTE AORUS 16X ASG/AORUS 16X ASG, BIOS F9 06/25/2024
May 11 19:34:24 AORUS kernel: Call Trace:
May 11 19:34:24 AORUS kernel: <TASK>
May 11 19:34:24 AORUS kernel: dump_stack_lvl+0x60/0x80
May 11 19:34:24 AORUS kernel: _nv013207rm+0x2c5/0x5b0 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv013118rm+0x77/0x330 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv051942rm+0x49f/0x7f0 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv000753rm+0x173/0x320 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv000724rm+0x1a0/0x1a0 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv013401rm+0x3d/0xa0 [nvidia]
May 11 19:34:24 AORUS kernel: ? _nv000778rm+0x8d2/0xe00 [nvidia]
May 11 19:34:24 AORUS kernel: ? rm_ioctl+0x7f/0x400 [nvidia]
May 11 19:34:24 AORUS kernel: ? nvidia_unlocked_ioctl+0x836/0xae0 [nvidia]
May 11 19:34:24 AORUS kernel: ? __x64_sys_ioctl+0x90/0xc0
May 11 19:34:24 AORUS kernel: ? do_syscall_64+0x4b/0x140
May 11 19:34:24 AORUS kernel: ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 11 19:34:24 AORUS kernel: </TASK>
May 11 19:34:24 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
May 11 19:34:25 AORUS kernel: NVRM: Error in service of callback
May 11 19:34:30 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
May 11 19:34:35 AORUS kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:4048:4040
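As a side note on reading the RPC history tables above: the printed durations look like plain `ts_end - ts_start` in microseconds, and a `ts_end` of zero marks the call that never returned. Below is a minimal Python sketch that checks this against a few rows copied from the log. The regex, the `rpc_durations` helper and the microsecond interpretation are my own assumptions, not anything documented by NVIDIA.

```python
import re

# Assumed row layout: entry, function id, function name, data0, data1,
# ts_start, ts_end (all hex), then an optional "<n>us" duration.
ROW = re.compile(
    r"NVRM:\s+(-?\d+)\s+(\d+)\s+(\w+)"
    r"\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)"
    r"\s+(0x[0-9a-f]+)\s+(0x[0-9a-f]+)"
)

# Three rows copied from the "RPC history (CPU -> GSP)" table in the log.
SAMPLE = [
    "May 11 19:34:24 AORUS kernel: NVRM: 0 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c43d59c 0x0000000000000000 y",
    "May 11 19:34:24 AORUS kernel: NVRM: -1 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c30bb21 0x000634df9c30bf5f 1086us",
    "May 11 19:34:24 AORUS kernel: NVRM: -2 76 GSP_RM_CONTROL 0x000000002080a0d1 0x00000000000007e8 0x000634df9c1d9f96 0x000634df9c1da378 994us",
]


def rpc_durations(lines):
    """Return (entry, function name, duration) for each table row.

    The duration is ts_end - ts_start, or None when ts_end is zero,
    i.e. the RPC was still pending when the dump was written.
    """
    rows = []
    for line in lines:
        match = ROW.search(line)
        if match is None:
            continue  # not a table row (header, Xid line, call trace, ...)
        entry = int(match.group(1))
        name = match.group(3)
        ts_start = int(match.group(6), 16)
        ts_end = int(match.group(7), 16)
        duration = ts_end - ts_start if ts_end else None
        rows.append((entry, name, duration))
    return rows


for entry, name, duration in rpc_durations(SAMPLE):
    print(entry, name, "pending" if duration is None else f"{duration}us")
```

The computed values for entries -1 and -2 match the 1086us and 994us printed by the driver, and entry 0 is the `GSP_RM_CONTROL` call that was still being polled when the GPU dropped off the bus.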
Now, Xid 79 could point to a hardware issue, but here’s why I don’t think that’s the case:
- I’ve tried multiple driver versions, including 550, 555, 560, 565 and 570. All of them crash except for 550.
- The GPU works fine in Windows, and as far as I can tell it’s far more performant there. That’s not just game performance (which is much better in Windows): on Linux I also get visual stutters at times, and the screen starts to lag tremendously while streaming or recording.
It could still point to a BIOS issue, but I wouldn’t know what exactly causes this. So here’s my system info and the bug report:
https://termbin.com/gchv
nvidia-bug-report.log.gz (1.9 MB)
To sum up, I’m running a Gigabyte AORUS 16X ASG (2024) on Linux Mint 22.1 with the Liquorix kernel.
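In case it helps anyone compare logs across driver versions, here is a rough Python sketch that counts Xid events per PCI device in kernel log lines. The `count_xids` name and the regex are my own. It assumes the `NVRM: Xid (PCI:...): <code>,` format seen in the logs above and is only tested against the two sample lines.

```python
import re
from collections import Counter

# Assumed format, taken from the log lines in this post:
#   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")

# Two lines copied from the first crash log.
SAMPLE = [
    "May 11 19:03:42 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
    "May 11 19:03:42 AORUS kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)",
]


def count_xids(lines):
    """Count occurrences of each (PCI address, Xid code) pair."""
    counts = Counter()
    for line in lines:
        match = XID.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


for (device, code), n in sorted(count_xids(SAMPLE).items()):
    print(f"{device} Xid {code}: {n}")
```

The lines could come from something like `journalctl -k -b -1` (kernel messages from the previous boot) saved to a file and read in with `open(...).read().splitlines()`.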