I’m reporting this on the 575.51.02 BETA closed driver, but I’ve had the same problem with the stable and open drivers as well.
The issue manifests as the GPU becoming completely unresponsive to applications (e.g. vkcube hangs for a few seconds before finally falling back to the iGPU) and staying stuck in a powered-on state until reboot. It appears to be linked specifically to power management: it doesn’t happen if the GPU is simply kept powered on for a long time, e.g. during gaming.
Bug report captured while in the failure state: nvidia-bug-report.log.gz (1.5 MB)
For the record, here’s the primary error in dmesg:
[11283.704484] NVRM: GPU at PCI:0000:01:00: GPU-557b610e-2bc1-f6f2-15c8-b57ca6fbce38
[11283.704488] NVRM: Xid (PCI:0000:01:00): 119, pid=226, name=kworker/15:1, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).
[11283.704508] NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x000000002080205b 0x0000000000000004.
[11283.704510] NVRM: GPU0 RPC history (CPU -> GSP):
[11283.704512] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
[11283.704513] NVRM: 0 76 GSP_RM_CONTROL 0x000000002080205b 0x0000000000000004 0x0006345db66ef4a4 0x0000000000000000 y
[11283.704516] NVRM: -1 47 UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x0006345d71097548 0x0006345d710c794d 197637us
[11283.704519] NVRM: -2 10 FREE 0x00000000c1e00309 0x0000000000000000 0x0006345d71097320 0x0006345d7109750a 490us
[11283.704521] NVRM: -3 10 FREE 0x000000000000000b 0x0000000000000000 0x0006345d710970d1 0x0006345d7109731e 589us
[11283.704523] NVRM: -4 10 FREE 0x000000000000000c 0x0000000000000000 0x0006345d71096edc 0x0006345d7109703b 351us
[11283.704525] NVRM: -5 10 FREE 0x0000000000000006 0x0000000000000000 0x0006345d71096d19 0x0006345d71096ed3 442us
[11283.704527] NVRM: -6 10 FREE 0x000000000000000a 0x0000000000000000 0x0006345d71096815 0x0006345d71096d11 1276us
[11283.704529] NVRM: -7 10 FREE 0x0000000000000002 0x0000000000000000 0x0006345d71095995 0x0006345d71096677 3298us
[11283.704530] NVRM: GPU0 RPC event history (CPU <- GSP):
[11283.704532] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
[11283.704533] NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x0006345d7109f3e3 0x0006345d7109f3e4 1us
[11283.704536] NVRM: -1 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x0006345d7109bf34 0x0006345d7109bf36 2us
[11283.704538] NVRM: -2 4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x0006345d7109bdd6 0x0006345d7109bdd6
[11283.704540] NVRM: -3 4099 POST_EVENT 0x0000000000000021 0x0000000000000100 0x0006345d707668f7 0x0006345d7076690b 20us
[11283.704542] NVRM: -4 4099 POST_EVENT 0x0000000000000021 0x0000000000000020 0x0006345d706e796b 0x0006345d706e7982 23us
[11283.704544] NVRM: -5 4099 POST_EVENT 0x0000000000000021 0x0000000000000001 0x0006345d70108aee 0x0006345d70108afc 14us
[11283.704546] NVRM: -6 4099 POST_EVENT 0x0000000000000021 0x0000000000000008 0x0006345d70023cd2 0x0006345d70023cf4 34us
[11283.704548] NVRM: -7 4099 POST_EVENT 0x0000000000000021 0x0000000000000001 0x0006345d6fc89aa1 0x0006345d6fc89aad 12us
[11283.704551] CPU: 15 UID: 0 PID: 226 Comm: kworker/15:1 Tainted: P OE 6.14.5-2-cachyos #1 5b3816ec247e07a05355dcf4d86a93cbe78a5deb
[11283.704555] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[11283.704556] Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA507NV_FA507NV/FA507NV, BIOS FA507NV.316 11/04/2024
[11283.704557] Workqueue: kacpi_notify acpi_os_execute_deferred
[11283.704561] Sched_ext: lavd (enabled+all), task: runnable_at=-1ms
[11283.704562] Call Trace:
[11283.704564] <TASK>
[11283.704567] dump_stack_lvl+0x71/0x90
[11283.704570] _nv013767rm+0x5dd/0x720 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.704794] _nv013677rm+0xe2/0x880 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.704969] _nv053503rm+0x594/0x770 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.705142] _nv057098rm+0x9e/0x150 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.705338] _nv052720rm+0x1a9/0x1b0 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.705512] _nv054901rm+0x3f5/0x500 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.705693] _nv015682rm+0x469/0x680 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.705878] _nv052860rm+0x29/0x30 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.706069] ? _nv054904rm+0x60/0x60 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.706251] _nv000809rm+0x58/0x70 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.706434] _nv000808rm+0x21b/0x220 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.706637] _nv000760rm+0x1c0/0x320 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.706837] rm_transition_dynamic_power+0xd7/0x13f [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707031] nv_pmops_runtime_resume+0x76/0xf0 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707213] ? __pfx_pci_pm_runtime_resume.llvm.5339016576846103269+0x10/0x10
[11283.707216] __rpm_callback+0x93/0x350
[11283.707220] ? __pfx_pci_pm_runtime_resume.llvm.5339016576846103269+0x10/0x10
[11283.707222] rpm_resume+0x4e4/0x860
[11283.707225] __pm_runtime_resume+0x5c/0x80
[11283.707227] pci_device_shutdown.llvm.5339016576846103269+0x23/0x70
[11283.707230] nv_indicate_not_idle+0x2f/0x40 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707412] _nv048601rm+0xf4/0x240 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707593] rm_power_source_change_event+0xc0/0x184 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707778] nv_acpi_powersource_hotplug_event+0x63/0x90 [nvidia 10bb795edfb6216e2cc9508fb909d6a143ed626b]
[11283.707959] acpi_ev_notify_dispatch+0x56/0x70
[11283.707962] acpi_os_execute_deferred+0x1c/0x30
[11283.707965] process_scheduled_works+0x250/0x590
[11283.707968] worker_thread+0xf8/0x2c0
[11283.707970] ? __pfx_worker_thread+0x10/0x10
[11283.707972] kthread+0x26d/0x290
[11283.707975] ? __pfx_kthread+0x10/0x10
[11283.707977] ret_from_fork.cold+0xc/0x19
[11283.707979] ? __pfx_kthread+0x10/0x10
[11283.707980] ret_from_fork_asm+0x1a/0x30
[11283.707985] </TASK>
[11289.708467] NVRM: Xid (PCI:0000:01:00): 119, pid=226, name=kworker/15:1, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a81 0x4).
[11295.709473] NVRM: Xid (PCI:0000:01:00): 119, pid=226, name=kworker/15:1, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205b 0x4).
[11301.713432] NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30). The GPU likely needs to be reset.
[11337.723272] NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)