System regularly hangs when a game is running, "nv_drm_handle_hotplug_event [nvidia_drm] hogged CPU" in logs

Hello. I’m suffering from an odd bug: whenever I run a game that uses 3D graphics, the whole system begins to stutter (including the game itself). A stutter occurs every few seconds and hangs the system for about a second.

It’s not only Windows games; native Linux games have this problem too.

Games I tried:

  1. God of War: Ragnarok (via Proton)
  2. Hi-Fi Rush (via Proton)
  3. Sonic Unleashed Recompiled (native)
  4. The Talos Principle (native)

Systems I tried:

  1. Kubuntu 24.10
  2. Kubuntu 24.04
  3. Kubuntu 23.10

I tried several versions of the driver (535, 550 and 560); all have the same problem.
I also tried both X Server and Wayland sessions, which did not help.

Every time I run a game and see the problem, I also see the following kernel logs:
nv_drm_handle_hotplug_event [nvidia_drm] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
I see between 3 and 10 lines like that in the logs.

Judging by the function’s name, it handles some device being connected. I thought that one of my monitors might be spamming “plugged” events, but disconnecting either of them had no effect.

System Info:
OS: Kubuntu 23.10
Kernel: Linux 6.11.0-19
CPU: AMD Ryzen 7 3800X
RAM: 32 GB
Video card: NVIDIA GeForce RTX 2070/PCIe/SSE2
Display: Acer SA270
Resolution: 1920x1080

nvidia-bug-report.log.gz (904.6 KB)

UPD:
Just tested the 565 and 570 drivers; the problem still appears.

I was able to capture dmesg output while running a game, and I noticed that when the lag happens, this line is printed about 1600 times:
[ +0,000005] nvidia 0000:09:00.0: [drm:drm_ioctl] comm="kwin_wayland" pid=3759, dev=0xe201, auth=1, DRM_IOCTL_MODE_GETPROPERTY
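
For anyone who wants to reproduce the count on their own log, here is a rough sketch of a script that tallies these drm_ioctl debug lines per process and ioctl name. It assumes the log was saved as text in the same line format as the one quoted above; the function name `count_ioctls` is just something made up for this example, and the regex accepts both straight and curly quotes because forum software tends to replace them.

```python
import re
from collections import Counter

# Matches drm_ioctl debug lines in the format quoted above.
# The quotes around the process name may be straight or curly,
# depending on how the log was copied.
LINE_RE = re.compile(
    r'\[drm:drm_ioctl\]\s+comm=["“”]?(?P<comm>[^"“”,]+)["“”]?\s+'
    r'pid=(?P<pid>\d+),.*?(?P<ioctl>DRM_IOCTL_\w+)\s*$'
)


def count_ioctls(lines):
    """Count drm_ioctl debug lines per (comm, pid, ioctl name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m["comm"], int(m["pid"]), m["ioctl"])] += 1
    return counts


# Small demo on the line from this post plus an unrelated line.
sample = [
    '[ +0,000005] nvidia 0000:09:00.0: [drm:drm_ioctl] comm="kwin_wayland" '
    'pid=3759, dev=0xe201, auth=1, DRM_IOCTL_MODE_GETPROPERTY',
    'some unrelated kernel message',
]
for (comm, pid, ioctl), n in count_ioctls(sample).most_common():
    print(n, comm, pid, ioctl)
```

On a real capture, passing the lines of the saved log file instead of `sample` should show which process and which ioctl dominate each burst.
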
I rebuilt the kernel to see the call stack. Here it is:

[  +0,000005] CPU: 4 UID: 1000 PID: 3742 Comm: kwin_wayland Tainted: P           OE      6.11.0-19-generic #19
[  +0,000003] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  +0,000001] Hardware name: System manufacturer System Product Name/PRIME B450-PLUS, BIOS 3604 02/25/2022
[  +0,000002] Call Trace:
[  +0,000001]  <TASK>
[  +0,000002]  show_stack+0x49/0x60
[  +0,000003]  dump_stack_lvl+0x5f/0x90
[  +0,000003]  drm_ioctl+0x2bb/0x5a0
[  +0,000004]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0,000008]  nv_drm_ioctl+0x42/0x450 [nvidia_drm]
[  +0,000005]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? __check_object_size.part.0+0x3a/0xe0
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000003]  ? _copy_to_user+0x41/0x60
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? drm_ioctl+0x419/0x5a0
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000004]  __x64_sys_ioctl+0xa3/0xf0
[  +0,000004]  x64_sys_call+0x121b/0x22b0
[  +0,000002]  do_syscall_64+0x7e/0x170
[  +0,000005]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? nv_drm_ioctl+0x42/0x450 [nvidia_drm]
[  +0,000005]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode+0x4e/0x250
[  +0,000004]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? do_syscall_64+0x8a/0x170
[  +0,000004]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? __audit_syscall_exit+0xbb/0x100
[  +0,000004]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_work+0x116/0x140
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode_prepare+0x38/0x80
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode+0x4e/0x250
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000003]  ? do_syscall_64+0x8a/0x170
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000003]  ? __audit_syscall_exit+0xbb/0x100
[  +0,000004]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_work+0x116/0x140
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode_prepare+0x38/0x80
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode+0x4e/0x250
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000003]  ? do_syscall_64+0x8a/0x170
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_work+0x116/0x140
[  +0,000003]  ? srso_return_thunk+0x5/0x5f
[  +0,000003]  ? syscall_exit_to_user_mode_prepare+0x38/0x80
[  +0,000002]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? syscall_exit_to_user_mode+0x4e/0x250
[  +0,000004]  ? srso_return_thunk+0x5/0x5f
[  +0,000002]  ? do_syscall_64+0x8a/0x170
[  +0,000003]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[  +0,000003]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  +0,000003] RIP: 0033:0x7eaf8192eb1d
[  +0,000002] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 0>
[  +0,000002] RSP: 002b:00007ffc56c96fe0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0,000003] RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00007eaf8192eb1d
[  +0,000002] RDX: 00007ffc56c97070 RSI: 00000000c04064aa RDI: 0000000000000017
[  +0,000001] RBP: 00007ffc56c97030 R08: 0000000000000007 R09: 0000565592aa3010
[  +0,000002] R10: 00007eaf81a11ac0 R11: 0000000000000246 R12: 00007ffc56c97070
[  +0,000002] R13: 00000000c04064aa R14: 0000000000000017 R15: 000056559483f6f0
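
As a sanity check, the trace can be matched to the logged ioctl by decoding the request number. ORIG_RAX is 0x10, which is the ioctl syscall on x86-64, so RSI (0xc04064aa) is the request argument. Below is a minimal sketch of the decoding, assuming the generic Linux _IOC bit layout (8 bits command number, 8 bits type, 14 bits size, 2 bits direction); `decode_ioctl` is a helper name invented for this example.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC fields."""
    nr = request & 0xFF                # command number within the driver
    type_ = (request >> 8) & 0xFF      # driver "magic" byte
    size = (request >> 16) & 0x3FFF    # size of the argument struct in bytes
    direction = (request >> 30) & 0x3  # 1 = write, 2 = read, 3 = read/write
    return {"dir": direction, "type": chr(type_), "nr": nr, "size": size}


# RSI / R13 from the register dump above.
fields = decode_ioctl(0xC04064AA)
print(fields)
```

This gives type 'd' (the DRM ioctl base), command number 0xAA, a 64-byte argument and read/write direction, which is consistent with DRM_IOCTL_MODE_GETPROPERTY from the log line.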

I think calling drm_ioctl 1600 times in a row is the cause of the lag spike. Why is it called so many times?