Black screen freeze with modeset driver lockup

I’m seeing a kernel lockup that causes the screen to go black and become unrecoverable. At the same time, Xorg (and plasmashell, if in the WM) starts spinning at 100% CPU. This is reproducible in several ways:

- Switching to a VT (e.g. Ctrl-Alt-F2)
- Using nvidia-settings or kscreen to move a display
- Allowing the screensaver to activate

Each time, after waiting a few minutes, the following begins dumping from the kernel repeatedly:

Nov 16 19:37:34 acuna kernel: INFO: task nvidia-modeset/:416 blocked for more than 122 seconds.
Nov 16 19:37:34 acuna kernel: Tainted: P OE 5.15.2-arch1-1 #1
Nov 16 19:37:34 acuna kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 16 19:37:34 acuna kernel: task:nvidia-modeset/ state:D stack: 0 pid: 416 ppid: 2 flags:0x00004000
Nov 16 19:37:34 acuna kernel: Call Trace:
Nov 16 19:37:34 acuna kernel: __schedule+0x331/0x1540
Nov 16 19:37:34 acuna kernel: ? _nv001179kms+0x60/0xc0 [nvidia_modeset a9999fc0a2862b6f9afa451cf4e077a22e3a30c2]
Nov 16 19:37:34 acuna kernel: schedule+0x5d/0xd0
Nov 16 19:37:34 acuna kernel: schedule_timeout+0x125/0x160
Nov 16 19:37:34 acuna kernel: __down+0xac/0x110
Nov 16 19:37:34 acuna kernel: down+0x43/0x60
Nov 16 19:37:34 acuna kernel: nvkms_kthread_q_callback+0x7d/0x100 [nvidia_modeset a9999fc0a2862b6f9afa451cf4e077a22e3a30c2]
Nov 16 19:37:34 acuna kernel: _main_loop+0x9e/0x160 [nvidia_modeset a9999fc0a2862b6f9afa451cf4e077a22e3a30c2]
Nov 16 19:37:34 acuna kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset a9999fc0a2862b6f9afa451cf4e077a22e3a30c2]
Nov 16 19:37:34 acuna kernel: kthread+0x132/0x160
Nov 16 19:37:34 acuna kernel: ? set_kthread_struct+0x50/0x50
Nov 16 19:37:34 acuna kernel: ret_from_fork+0x22/0x30
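
For anyone comparing logs, here is a minimal Python sketch that pulls the hung-task reports out of kernel log lines like the ones above. The names `HungTask` and `find_hung_tasks` are made up for illustration, and the regular expression only assumes the "blocked for more than N seconds" wording seen in this trace.

```python
import re
from typing import Iterable, List, NamedTuple


class HungTask(NamedTuple):
    """One hung-task report (hypothetical helper type)."""
    name: str
    pid: int
    seconds: int


# Matches e.g. "INFO: task nvidia-modeset/:416 blocked for more than 122 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return one HungTask per matching log line, in input order."""
    hits = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hits.append(
                HungTask(match["name"], int(match["pid"]), int(match["secs"]))
            )
    return hits
```

Feeding it the lines from `journalctl -k` should show whether nvidia-modeset is the only task being reported or whether others are stuck as well.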

As far as I’ve seen, X doesn’t print any helpful information (or anything at all) when this occurs.

Attached bug report:

nvidia-bug-report.log.gz (303.2 KB)

Does this also happen with just one monitor, possibly trying different connectors/cables?

It did only happen with the single monitor. The monitors were going through a KVM though, and bypassing it (specifically on the Primary/DP1 monitor) seems to alleviate the issue, at least initially. I know at some point in the past I tried that and still had issues, but there may have been some other weirdness involved in that test (I’m not sure I had enabled nvidia-drm.modeset at that point).

It’s not ideal to bypass the KVM, given the workspace is shared between personal and work-from-home use, but I suppose it is at least a workaround, and it probably points to the KVM being the issue.

Although I will add that the KVM does “work” in that there are no issues in other OSes, and both displays come up in full resolution/refresh initially. It only dies eventually in these other cases.
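
On the question of whether nvidia-drm.modeset was enabled during an earlier test, a rough Python sketch for checking the current state is below. It assumes the loaded module exposes the parameter at /sys/module/nvidia_drm/parameters/modeset as a boolean shown as Y or N; the function name `modeset_enabled` is just for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the nvidia-drm "modeset" module parameter.
DEFAULT_PARAM = "/sys/module/nvidia_drm/parameters/modeset"


def modeset_enabled(param_path: str = DEFAULT_PARAM) -> Optional[bool]:
    """Return True/False for Y/N, or None if the parameter file is missing."""
    path = Path(param_path)
    if not path.is_file():
        # Module not loaded, or the parameter is not exposed here.
        return None
    return path.read_text().strip() == "Y"
```

A result of `None` would mean the nvidia_drm module isn't loaded (or the path assumption is wrong on that system), which is worth ruling out before comparing test runs.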

The Linux driver is very touchy with regard to display connection quality. Have you tried using higher-quality cables? Which brand/model is the KVM switch?

I ordered some potentially better cables to try, and I also successfully tried a different, older KVM I had around. I’ll try with the new cables when they show up. The KVM I’m having issues with is an IOGear GCS1942.