Frequent display freezes and Xid 56 errors on Arch Linux (EndeavourOS) with driver 575.64 and RTX 4070 Ti

Hello!

I have a multi-monitor setup, and for a while now I’ve been getting random freezes during regular use. One of the displays freezes and does not recover, so I am forced to restart my PC before I can use it again. The other display remains functional, but the windows on the frozen monitor cannot be moved onto it, and eventually the whole session freezes.

journalctl shows the following errors at the time of the freeze:

Jun 26 12:18:18 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00002888 000100af 00000007 00000000
Jun 26 12:18:18 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 0000288c 000100af 00000007 00000000
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00002488 000100af 00000007 00000000
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 0000248c 000100af 00000007 00000000
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00002488 000100ad 00000007 00000000
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000200 00000001 00000005 0010001a
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000200 00000001 00000005 0010001a
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000200 00000001 00000005 0010001a
Jun 26 12:18:27 rapidcore kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000200 00000001 00000005 0010001a
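
To see whether a given freeze produced only Xid 56 or a mix of Xid codes, the kernel log of the boot that froze can be tallied per GPU and per Xid number. A minimal Python sketch, assuming a persistent journal so that journalctl -k -b -1 still returns the previous boot's kernel messages; count_xids and previous_boot_kernel_log are hypothetical helper names, not part of any NVIDIA tool, and the regular expression is based only on the Xid lines quoted in this thread:

```python
import re
import subprocess
from collections import Counter

# Matches NVRM Xid lines as they appear in the kernel log, e.g.
#   NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00002888 000100af ...
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def count_xids(lines):
    """Count how often each (PCI address, Xid number) pair occurs."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def previous_boot_kernel_log():
    """Kernel messages from the previous boot, i.e. the one that froze.

    Assumes a persistent journal; otherwise there is no earlier boot to
    read and the call will most likely fail (check=True raises
    subprocess.CalledProcessError on a non-zero exit status).
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

After rebooting from a freeze, count_xids(previous_boot_kernel_log()) gives the totals. For the nine lines quoted above it would return a single entry, ('PCI:0000:01:00', 56), with a count of 9.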

I can’t say exactly when the problem started, but I can confirm it has been happening across multiple driver versions. I had been using the nvidia-open driver, most recently 575.64-2, but to troubleshoot the issue I have since switched to nvidia-dkms 575.64-1. Unfortunately, the issue did not go away.
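
When switching between nvidia-open and nvidia-dkms, it may be worth confirming which kernel module flavour is actually loaded before comparing results. A rough Python sketch that reads the first line of /proc/driver/nvidia/version; it assumes, as far as I know from recent drivers, that the open modules identify themselves there with "Open Kernel Module" and the proprietary ones do not. describe_nvrm and loaded_nvidia_module are hypothetical helpers:

```python
import re
from pathlib import Path

# Assumption: the open kernel modules print "Open Kernel Module" on the
# first line of this file, e.g.
#   NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  575.64  ...
VERSION_FILE = Path("/proc/driver/nvidia/version")
# First dotted number on the line, e.g. 575.64 or 575.64.05
VERSION_RE = re.compile(r"\b(\d+\.\d+(?:\.\d+)?)\b")


def describe_nvrm(first_line):
    """Return (flavour, version) parsed from the 'NVRM version:' line."""
    flavour = "open" if "Open Kernel Module" in first_line else "proprietary"
    match = VERSION_RE.search(first_line)
    return flavour, (match.group(1) if match else None)


def loaded_nvidia_module():
    """Describe the running module (the file exists only while nvidia is loaded)."""
    first_line = VERSION_FILE.read_text().splitlines()[0]
    return describe_nvrm(first_line)
```

With nvidia-dkms installed and the machine rebooted, loaded_nvidia_module() should report "proprietary" together with the expected version number.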

System information:

nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64                 Driver Version: 575.64         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   50C    P3             30W /  285W |    1829MiB /  12282MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

inxi -G output:

Graphics:
  Device-1: NVIDIA AD104 [GeForce RTX 4070 Ti] driver: nvidia v: 575.64
  Device-2: Advanced Micro Devices [AMD/ATI] Raphael driver: amdgpu
    v: kernel
  Device-3: Logitech HD Pro Webcam C920 driver: snd-usb-audio,uvcvideo
    type: USB
  Display: wayland server: X.Org v: 24.1.8 with: Xwayland v: 24.1.8
    compositor: gnome-shell v: 48.2 driver: X: loaded: amdgpu,nvidia
    unloaded: modesetting,radeon dri: radeonsi gpu: nvidia,nvidia-nvswitch
    resolution: 1: 2560x1440~165Hz 2: 2560x1440~240Hz
  API: EGL v: 1.5 drivers: nvidia,radeonsi,swrast
    platforms: gbm,wayland,x11,surfaceless,device
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 575.64
    renderer: NVIDIA GeForce RTX 4070 Ti/PCIe/SSE2
  API: Vulkan v: 1.4.313 drivers: nvidia surfaces: N/A
  Info: Tools: api: eglinfo, glxinfo, vulkaninfo gpu: nvidia-smi
    x11: xdpyinfo, xprop, xrandr

OS: EndeavourOS (build 2024.09.22)
Kernel: 6.15.3-arch1-1
Desktop environment: GNOME 48 on Wayland

I am attaching the two most recent nvidia-bug-report logs, which I generated after a hang occurred.

Please help me troubleshoot and debug this issue; I would like to know whether this is a driver issue or my hardware giving up. Let me know if you need any additional information about my system.

I don’t seem to encounter any hangs on Windows, which leads me to believe it might be driver-related.

nvidia-bug-report.log.gz (1.6 MB)
nvidia-bug-report.log.old.gz (1.8 MB)

I’m having the same issue, and our journal logs seem to match exactly. It’s very frustrating. I think it’s been an issue for the last few major driver versions or GNOME releases. I’m using a 4080 I’ve had for a while. I don’t think your hardware is the issue.
Using Arch with GNOME 48.3 on Wayland
Driver: 575.64.03
Kernel: 6.15.4-273
Will see if I can figure anything out

Glad to see I’m not the only one experiencing this issue! A few weeks ago it was driving me crazy, happening every few hours. Now I haven’t experienced it for a while; I think it happened only once this week.

I also want to try using KDE for a bit to narrow it down, in case it’s got something to do with GNOME.

Out of curiosity, are you using a custom font scaling factor?

Hi there,
I think I have the same issue.
I can use the system without any problem, and then suddenly it freezes for 3 seconds, works again for 4 seconds, then freezes again for 3 seconds, and so on.
If I disconnect the HDMI cable, the system stabilizes and seems back to normal. It can crash completely if I reconnect the HDMI cable.
It can also freeze completely during a game (I can press Alt+F4 to close the game and restart it).

I have a laptop with an RTX 5070 Ti running openSUSE Tumbleweed.
Driver: 570.169
Kernel: 6.15.5-1-default

Here are some dmesg logs captured during the freeze:

[   T1412] NVRM: GPU at PCI:0000:02:00: GPU-c87a248f-fbb4-3575-0006-500924b6e41d
[  +0,000005] [   T1412] NVRM: GPU Board Serial Number: 0
[  +0,000001] [   T1412] NVRM: Xid (PCI:0000:02:00): 62, 3226e15b 0000b1a8 00000000 20686aee 20685c94 20685e26 20684294 20684ad0
[  +0,003192] [   T1419] NVRM: Xid (PCI:0000:02:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
[ +14,443900] [   T1320] NVRM: Xid (PCI:0000:02:00): 109, pid=1322, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[  +2,765195] [   T1320] NVRM: _kgspLogXid119: ********************************* GSP Timeout **********************************
[  +0,000003] [   T1320] NVRM: _kgspLogXid119: Note: Please also check logs above.
[  +0,000157] [   T1320] NVRM: Xid (PCI:0000:02:00): 119, pid=1320, name=nvidia-modeset/, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20802801 0x4).
[  +0,000013] [   T1320] NVRM: GPU0 GSP RPC buffer contains function 4100 (RC_TRIGGERED) and data 0x0000000000000001 0x000000000000006d.
[  +0,000003] [   T1320] NVRM: GPU0 RPC history (CPU -> GSP):
[  +0,000001] [   T1320] NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
[  +0,000002] [   T1320] NVRM:      0    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a370da8a08 0x0000000000000000          y
[  +0,000004] [   T1320] NVRM:     -1    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a370c36c7d 0x000639a370c37892   3093us
[  +0,000003] [   T1320] NVRM:     -2    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a370c1e1f2 0x000639a370c1e7d8   1510us
[  +0,000002] [   T1320] NVRM:     -3    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a3708bf8cb 0x000639a3708c01ca   2303us
[  +0,000002] [   T1320] NVRM:     -4    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a370895cb5 0x000639a370895f84    719us
[  +0,000002] [   T1320] NVRM:     -5    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a370819601 0x000639a37081a04b   2634us
[  +0,000001] [   T1320] NVRM:     -6    76   GSP_RM_CONTROL        0x0000000020802801 0x0000000000000004 0x000639a3707c317b 0x000639a3707c37f7   1660us
[  +0,000002] [   T1320] NVRM:     -7    76   GSP_RM_CONTROL        0x0000000020800a56 0x000000000000005c 0x000639a3702f7e33 0x000639a3702f838a   1367us
[  +0,000002] [   T1320] NVRM: GPU0 RPC event history (CPU <- GSP):
[  +0,000001] [   T1320] NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
[  +0,000001] [   T1320] NVRM:      0    4100 RC_TRIGGERED          0x0000000000000001 0x000000000000006d 0x000639a3710c45b7 0x000639a3710c45c9     18us y
[  +0,000003] [   T1320] NVRM:     -1    4102 OS_ERROR_LOG          0x0000000000000000 0x0000000000000000 0x000639a3710be8de 0x000639a3710be919     59us y
[  +0,000003] [   T1320] NVRM:     -2    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370c94919 0x000639a370c94921      8us
[  +0,000002] [   T1320] NVRM:     -3    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370c736b4 0x000639a370c736ba      6us
[  +0,000002] [   T1320] NVRM:     -4    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370c5d1ae 0x000639a370c5d1b1      3us
[  +0,000002] [   T1320] NVRM:     -5    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370c4f963 0x000639a370c4f969      6us
[  +0,000002] [   T1320] NVRM:     -6    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370c37d9c 0x000639a370c37da2      6us
[  +0,000002] [   T1320] NVRM:     -7    4099 POST_EVENT            0x0000000000000000 0x0000000000000000 0x000639a370b86efd 0x000639a370b86f04      7us
[  +0,000004] [   T1320] CPU: 5 UID: 0 PID: 1320 Comm: nvidia-modeset/ Tainted: G           O        6.15.5-1-default #1 PREEMPT(voluntary) openSUSE Tumbleweed  28b123572e8ac77e02fe7d437ecda8584e6863fe
[  +0,000006] [   T1320] Tainted: [O]=OOT_MODULE
[  +0,000001] [   T1320] Hardware name: PCSpecialist Recoil 18/X58xWNx, BIOS 1.07.04TPLV 04/14/2025
[  +0,000002] [   T1320] Call Trace:
[  +0,000003] [   T1320]  <TASK>
[  +0,000003] [   T1320]  dump_stack_lvl+0x5b/0x80
[  +0,000014] [   T1320]  _kgspRpcRecvPoll+0x52f/0x760 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000275] [   T1320]  _issueRpcAndWait+0x6c/0x380 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000071] [   T1320]  ? rpcWriteCommonHeader+0x61/0x250 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000065] [   T1320]  rpcRmApiControl_GSP+0x286/0x9c0 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000060] [   T1320]  ? _tlsThreadEntryGet+0x82/0x90 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000047] [   T1320]  ? osGetCurrentThread+0x26/0x60 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000118] [   T1320]  rmresControl_Prologue_IMPL+0xd2/0x240 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000065] [   T1320]  resControl_IMPL+0xcf/0x1d0 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000050] [   T1320]  serverControl+0x493/0x5c0 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000053] [   T1320]  _rmapiRmControl+0x6da/0x980 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000069] [   T1320]  rmapiControlWithSecInfo+0x79/0x140 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000068] [   T1320]  rmapiControlWithSecInfoTls+0x76/0xe0 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000069] [   T1320]  ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000026] [   T1320]  _nv04ControlWithSecInfo.constprop.0+0x82/0x90 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000066] [   T1320]  ? __schedule+0x3f3/0x13d0
[  +0,000004] [   T1320]  ? __wake_up_common+0x6f/0xa0
[  +0,000004] [   T1320]  Nv04ControlKernel+0x60/0x70 [nvidia bb124d0b5d5e5931859f8c6bb63c8b622d7f9fec]
[  +0,000065] [   T1320]  nvkms_call_rm+0x49/0x80 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000015] [   T1320]  nvRmApiControl+0x5b/0x70 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000026] [   T1320]  IdleTimerProc+0xd5/0xf0 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000022] [   T1320]  nvkms_kthread_q_callback+0xe3/0x170 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000014] [   T1320]  _main_loop+0x90/0x150 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000014] [   T1320]  ? __pfx__main_loop+0x10/0x10 [nvidia_modeset 9e75c09a93c0ea63604b6cafa6ddc94b6c61768d]
[  +0,000013] [   T1320]  kthread+0xf9/0x230
[  +0,000003] [   T1320]  ? __pfx_kthread+0x10/0x10
[  +0,000001] [   T1320]  ret_from_fork+0x31/0x50
[  +0,000003] [   T1320]  ? __pfx_kthread+0x10/0x10
[  +0,000001] [   T1320]  ret_from_fork_asm+0x1a/0x30
[  +0,000003] [   T1320]  </TASK>
[  +0,000006] [   T1320] NVRM: _kgspLogXid119: ********************************************************************************
[  +0,000002] [   T1320] NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 4100!
[  +5,370241] [   T1412] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[  +1,851320] [      C3] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[  +8,195087] [   T1320] NVRM: Xid (PCI:0000:02:00): 109, pid=1322, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[Jul 9 18:06] [   T4294] BUG: unable to handle page fault for address: ffff890e6357b638
[  +0.000007] [   T4294] #PF: supervisor read access in kernel mode
[  +0.000002] [   T4294] #PF: error_code(0x0000) - not-present page
[  +0.000001] [   T4294] PGD 0 P4D 0
[  +0.000003] [   T4294] Oops: Oops: 0000 [#1] SMP NOPTI
[  +0.000003] [   T4294] CPU: 6 UID: 1000 PID: 4294 Comm: code-insiders Tainted: G           O        6.15.4-1-default #1 PREEMPT(voluntary) openSUSE Tumbleweed  b086527c9bdf9d45b8c931cfc02462c1786f868d
[  +0.000003] [   T4294] Tainted: [O]=OOT_MODULE
[  +0.000001] [   T4294] Hardware name: PCSpecialist Recoil 18/X58xWNx, BIOS 1.07.04TPLV 04/14/2025
[  +0.000001] [   T4294] RIP: 0010:copy_process+0x134e/0x2420
[  +0.000005] [   T4294] Code: 04 4c 89 64 24 18 4c 89 74 24 10 48 83 ce ff 48 8d 7c 24 70 e8 03 3c ce 00 49 89 c4 48 85 c0 0f 84 a5 0f 00 00 49 8b 44 24 10 <48> 8b 90 b8 00 00 00 80 e2 01 0f 84 93 10 00 00 8b b0 f8 00 00 00
[  +0.000001] [   T4294] RSP: 0018:ffffcc49e09afb00 EFLAGS: 00010286
[  +0.000001] [   T4294] RAX: ffff890e6357b580 RBX: ffff8b0e67e4bf40 RCX: ffff8b12cabb9f08
[  +0.000001] [   T4294] RDX: ffff8b12cabb9900 RSI: ffff8b12cabb9f00 RDI: ffffcc49e09afb70
[  +0.000001] [   T4294] RBP: ffffcc49e09afc00 R08: 000000000000000e R09: 0000000000000010
[  +0.000001] [   T4294] R10: 000016b416e03000 R11: 000016b416e03000 R12: ffff8b1210ca00c0
[  +0.000001] [   T4294] R13: 0000000000000000 R14: ffff8b127e22dc80 R15: ffff8b0f6cef1b00
[  +0.000001] [   T4294] FS:  00007f951242bfc0(0000) GS:ffff8b21f70f4000(0000) knlGS:0000000000000000
[  +0.000001] [   T4294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] [   T4294] CR2: ffff890e6357b638 CR3: 000000028d5d6001 CR4: 0000000000f72ef0
[  +0.000001] [   T4294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000000] [   T4294] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  +0.000001] [   T4294] PKRU: 55555554
[  +0.000001] [   T4294] Call Trace:
[  +0.000002] [   T4294]  <TASK>
[  +0.000003] [   T4294]  kernel_clone+0x98/0x470
[  +0.000001] [   T4294]  ? alloc_inode+0x9b/0xd0
[  +0.000004] [   T4294]  ? alloc_inode+0x9b/0xd0
[  +0.000001] [   T4294]  ? kmem_cache_alloc_noprof+0x11b/0x450
[  +0.000004] [   T4294]  __do_sys_clone+0x65/0x90
[  +0.000001] [   T4294]  do_syscall_64+0x7b/0x820
[  +0.000005] [   T4294]  ? wp_page_reuse+0x8d/0xa0
[  +0.000002] [   T4294]  ? do_wp_page+0x91d/0xec0
[  +0.000002] [   T4294]  ? __do_pipe_flags.part.0+0x29/0xb0
[  +0.000002] [   T4294]  ? __handle_mm_fault+0xac4/0xfb0
[  +0.000002] [   T4294]  ? __count_memcg_events+0xb0/0x150
[  +0.000002] [   T4294]  ? count_memcg_events.constprop.0+0x1a/0x30
[  +0.000003] [   T4294]  ? handle_mm_fault+0x1d2/0x2d0
[  +0.000001] [   T4294]  ? do_user_addr_fault+0x21a/0x690
[  +0.000002] [   T4294]  ? irqentry_exit_to_user_mode+0x2c/0x1c0
[  +0.000001] [   T4294]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0.000003] [   T4294] RIP: 0033:0x7f9512ee8e53
[  +0.000034] [   T4294] Code: 89 e7 e8 be a6 f5 ff 45 31 c0 31 d2 31 f6 64 48 8b 04 25 10 00 00 00 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d 89 c5 85 c0 75 31 64 48 8b 04 25 10 00 00
[  +0.000001] [   T4294] RSP: 002b:00007ffd06a21640 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  +0.000002] [   T4294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9512ee8e53
[  +0.000000] [   T4294] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  +0.000001] [   T4294] RBP: 00007ffd06a21950 R08: 0000000000000000 R09: 000016b402e40130
[  +0.000001] [   T4294] R10: 00007f951242c290 R11: 0000000000000246 R12: 00007ffd06a21a20
[  +0.000001] [   T4294] R13: 0000000000000001 R14: 0000000000000006 R15: 000016b402e40130
[  +0.000002] [   T4294]  </TASK>
[  +0.000001] [   T4294] Modules linked in: ccm snd_seq_dummy snd_hrtimer rfcomm snd_seq rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc netfs af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw snd_sof_pci_intel_mtl iptable_security snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common ip6table_filter snd_soc_hdac_hda ip6_tables snd_sof_intel_hda_mlink snd_sof_intel_hda soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof cmac qrtr snd_sof_utils algif_hash snd_hda_ext_core nf_tables algif_skcipher snd_soc_acpi_intel_match nfnetlink af_alg snd_soc_acpi_intel_sdca_quirks bnep iptable_filter soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_sdca snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic snd_compress
[  +0.000032] [   T4294]  snd_hda_scodec_component snd_pcm_dmaengine snd_hda_codec_hdmi crc8 binfmt_misc snd_hda_intel nls_iso8859_1 snd_intel_dspcfg intel_uncore_frequency snd_usb_audio processor_thermal_device_pci nls_cp437 intel_uncore_frequency_common snd_intel_sdw_acpi snd_usbmidi_lib vfat processor_thermal_device intel_pmc_core iwlmvm snd_hda_codec snd_ump fat processor_thermal_wt_hint x86_pkg_temp_thermal btusb uvcvideo snd_hda_core snd_rawmidi mac80211 processor_thermal_rfim intel_powerclamp btrtl nvidia_drm(O) videobuf2_vmalloc iTCO_wdt snd_hwdep snd_seq_device libarc4 intel_rapl_msr processor_thermal_rapl coretemp btintel hid_sensor_prox uvc intel_pmc_bxt nvidia_modeset(O) iTCO_vendor_support spd5118 spi_nor snd_pcm intel_rapl_common btbcm hid_sensor_trigger videobuf2_memops kvm_intel mtd processor_thermal_wt_req iwlwifi videobuf2_v4l2 hid_sensor_iio_common i2c_i801 snd_timer btmtk nvidia_uvm(O) mei_gsc_proxy kvm processor_thermal_power_floor spi_intel_pci videobuf2_common industrialio_triggered_buffer i2c_smbus videodev
[  +0.000033] [   T4294]  cfg80211 bluetooth snd mei_me pmt_telemetry int3400_thermal intel_hid irqbypass pcspkr int3403_thermal processor_thermal_mbox nvidia(O) wmi_bmof spi_intel kfifo_buf i2c_mux industrialio mc thunderbolt igc joydev nvidia_wmi_ec_backlight rfkill soundcore mei pmt_class acpi_thermal_rel sparse_keymap int340x_thermal_zone thermal acpi_pad fan ac tiny_power_button bbswitch(O) nvme_fabrics loop fuse efi_pstore dm_mod configfs ip_tables x_tables ext4 crc16 mbcache jbd2 xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm hid_sensor_hub polyval_clmulni usbhid polyval_generic hid_multitouch i915 ghash_clmulni_intel hid_generic sha512_ssse3 i2c_algo_bit sha256_ssse3 sdhci_pci drm_buddy sha1_ssse3 sdhci_uhs2 nvme ucsi_acpi ttm aesni_intel sdhci xhci_pci nvme_core intel_lpss_pci typec_ucsi drm_display_helper crypto_simd cqhci xhci_hcd nvme_keyring intel_lpss roles cec i2c_hid_acpi mxm_wmi video cryptd mmc_core intel_vpu usbcore nvme_auth intel_vsec idma64 typec battery rc_core i2c_hid wmi
[  +0.000042] [   T4294]  pinctrl_meteorpoint pinctrl_meteorlake button serio_raw msr i2c_dev efivarfs dmi_sysfs
[  +0.000006] [   T4294] CR2: ffff890e6357b638
[  +0.000001] [   T4294] ---[ end trace 0000000000000000 ]---
[  +0.000001] [   T4294] RIP: 0010:copy_process+0x134e/0x2420
[  +0.000002] [   T4294] Code: 04 4c 89 64 24 18 4c 89 74 24 10 48 83 ce ff 48 8d 7c 24 70 e8 03 3c ce 00 49 89 c4 48 85 c0 0f 84 a5 0f 00 00 49 8b 44 24 10 <48> 8b 90 b8 00 00 00 80 e2 01 0f 84 93 10 00 00 8b b0 f8 00 00 00
[  +0.000001] [   T4294] RSP: 0018:ffffcc49e09afb00 EFLAGS: 00010286
[  +0.000001] [   T4294] RAX: ffff890e6357b580 RBX: ffff8b0e67e4bf40 RCX: ffff8b12cabb9f08
[  +0.000000] [   T4294] RDX: ffff8b12cabb9900 RSI: ffff8b12cabb9f00 RDI: ffffcc49e09afb70
[  +0.000001] [   T4294] RBP: ffffcc49e09afc00 R08: 000000000000000e R09: 0000000000000010
[  +0.000001] [   T4294] R10: 000016b416e03000 R11: 000016b416e03000 R12: ffff8b1210ca00c0
[  +0.000001] [   T4294] R13: 0000000000000000 R14: ffff8b127e22dc80 R15: ffff8b0f6cef1b00
[  +0.000000] [   T4294] FS:  00007f951242bfc0(0000) GS:ffff8b21f70f4000(0000) knlGS:0000000000000000
[  +0.000001] [   T4294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] [   T4294] CR2: ffff890e6357b638 CR3: 000000028d5d6001 CR4: 0000000000f72ef0
[  +0.000001] [   T4294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000000] [   T4294] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  +0.000001] [   T4294] PKRU: 55555554
[  +0.000001] [   T4294] note: code-insiders[4294] exited with irqs disabled
[ 2594.552077] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2602.961577] [  T13661] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[ 2607.103413] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2615.493616] [   T1305] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[ 2619.670278] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2628.027029] [  T13661] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[ 2632.173668] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2640.559160] [  T13661] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[ 2644.674834] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2653.090546] [  T13661] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
[ 2657.264750] [   T1379] NVRM: _kgspProcessRpcEvent: Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)
[ 2665.622996] [   T2512] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
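
In the GSP timeout dump (Xid 119) above, the ts_start and ts_end columns of the RPC history tables are hexadecimal counters, and subtracting them reproduces the printed durations: 0x000639a370c37892 - 0x000639a370c36c7d = 3093, matching "3093us", so they appear to be microsecond timestamps. Entry 0 of the CPU -> GSP table has ts_end 0, which is consistent with the GSP_RM_CONTROL call (0x20802801 0x4) that the Xid 119 line reports as timed out. A rough Python sketch that parses these rows and flags the incomplete ones; the regular expression is an assumption based only on the rows quoted in this thread, and parse_rpc_rows and incomplete are hypothetical helpers:

```python
import re

# One row of the "RPC history" / "RPC event history" tables printed with
# Xid 119: entry, function number, function name, data0, data1, ts_start,
# ts_end (all four hex), followed by an optional duration and flags.
ROW_RE = re.compile(
    r"NVRM:\s+(-?\d+)\s+(\d+)\s+([A-Z_]+)\s+"
    r"0x([0-9a-f]+)\s+0x([0-9a-f]+)\s+0x([0-9a-f]+)\s+0x([0-9a-f]+)"
)


def parse_rpc_rows(lines):
    """Return one dict per table row; duration_us is None if ts_end is 0."""
    rows = []
    for line in lines:
        match = ROW_RE.search(line)
        if not match:
            continue  # headers, call traces and other NVRM lines
        ts_start = int(match.group(6), 16)
        ts_end = int(match.group(7), 16)
        rows.append({
            "entry": int(match.group(1)),
            "function": int(match.group(2)),
            "name": match.group(3),
            # ts_end == 0: the RPC had not completed when the dump was taken
            "duration_us": ts_end - ts_start if ts_end else None,
        })
    return rows


def incomplete(rows):
    """Rows whose RPC never finished (ts_end was 0 in the dump)."""
    return [row for row in rows if row["duration_us"] is None]
```

Fed the dump above, incomplete() should return only entry 0 of the CPU -> GSP table (function 76, GSP_RM_CONTROL), while every other row gets a duration equal to the one printed in the log.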

This block repeats with each freeze:

[  +1,851320] [      C3] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked!  Notify Timeout Seconds: 7
[  +8,195087] [   T1320] NVRM: Xid (PCI:0000:02:00): 109, pid=1322, name=modprobe, Ch 00000001, errorString CTX SWITCH TIMEOUT, Info 0x4000
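
The lines with absolute timestamps at the end of the dmesg excerpt (from [ 2594.552077] to [ 2665.622996]) suggest the cycle is periodic: consecutive Xid 109 lines are about 12.53 s apart, and each one comes roughly 8.4 s after an "Unexpected RPC event from GPU0: 0x4c (GSP_RM_CONTROL)" line. A small Python sketch to measure that spacing from a plain dmesg capture; it assumes seconds-since-boot timestamps and deliberately skips the relative [  +8,195087] form and the dmesg -T wall-clock form. xid_intervals is a hypothetical helper:

```python
import re

# dmesg line with a seconds-since-boot timestamp, e.g.
#   [ 2602.961577] [  T13661] NVRM: Xid (PCI:0000:02:00): 109, pid=1307, ...
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (\d+),")


def xid_intervals(lines, xid):
    """Seconds between consecutive occurrences of the given Xid number."""
    times = []
    for line in lines:
        ts = TS_RE.match(line)
        code = XID_RE.search(line)
        if ts and code and int(code.group(1)) == xid:
            times.append(float(ts.group(1)))
    # Pair each timestamp with the next one and keep the difference
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
```

Applied to the six Xid 109 lines quoted above, xid_intervals(lines, 109) returns five gaps of about 12.53 s each.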

nvidia.tar.gz (1.5 MB)

Still encountering this issue on 575.64.05

Something I’ve observed: the freeze seems to happen when the cursor leaves one monitor and moves to the other.

I tried KDE Plasma for a little while and did not encounter any freezes, but I don’t feel I’ve used it enough to really rule it out. I’m leaning towards this being a GNOME issue, though.

I also tried a single-monitor setup for about a week and did not encounter any freezes. After switching back to two monitors, I got a freeze the same day.