560.35.03 Processes crashing with NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed:

I’m currently running Fedora 40 on kernel 6.10.6.
The machine is a laptop with an i9-13900HX and a GeForce RTX 4080 Max-Q / Mobile, and I use only a single DisplayPort screen that is wired to the dGPU.

While everything worked fine up to and including driver 555.58.02, my system has been completely broken since 560.
After a few seconds, some application simply freezes or crashes.
Afterwards I can't open most of my applications (all of them hardware accelerated).

Aug 28 10:18:40 crashtux kernel: NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00e77; hParent=0xbeef00f0; hObject=0xb>
Aug 28 10:18:40 crashtux kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x000000>
Aug 28 10:18:40 crashtux kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x000000>
Aug 28 10:18:40 crashtux kernel: NVRM: rpcRmApiAlloc_GSP: GspRmAlloc failed: hClient=0xc1d00e77; hParent=0xbeef00f0; hObject=0xb>
Aug 28 10:18:40 crashtux kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x000000>
Aug 28 10:18:40 crashtux kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x000000>
Aug 28 10:18:40 crashtux audit[32456]: ANOM_ABEND auid=1000 uid=1000 gid=1001 ses=2 subj=unconfined pid=32456 comm="kitty" exe=">
Aug 28 10:18:40 crashtux kernel: kitty[32456]: segfault at 20 ip 00007fb87157c2b7 sp 00007ffde45913e8 error 6 in libc.so.6[16d2b>
Aug 28 10:18:40 crashtux kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe>
Aug 28 10:18:40 crashtux audit: BPF prog-id=103 op=LOAD
Aug 28 10:18:40 crashtux audit: BPF prog-id=104 op=LOAD
Aug 28 10:18:40 crashtux audit: BPF prog-id=105 op=LOAD
Aug 28 10:18:40 crashtux audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-co>
Aug 28 10:18:40 crashtux systemd[1]: Started systemd-coredump@5-32465-0.service - Process Core Dump (PID 32465/UID 0).
Aug 28 10:18:40 crashtux audit: BPF prog-id=106 op=LOAD
Aug 28 10:18:40 crashtux audit: BPF prog-id=107 op=LOAD
Aug 28 10:18:40 crashtux audit: BPF prog-id=108 op=LOAD
Aug 28 10:18:40 crashtux systemd[1]: Started drkonqi-coredump-processor@5-32465-0.service - Pass systemd-coredump journal entrie>
Aug 28 10:18:40 crashtux audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=drkonqi-co>
Aug 28 10:18:40 crashtux systemd[2821]: Started kitty-32456-0.scope - kitty child process: 32464 launched by: 32456.
Aug 28 10:18:40 crashtux systemd-coredump[32468]: [🡕] Process 32456 (kitty) of user 1000 dumped core.
                                                  
                                                  Module select.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc4>
                                                  Module _posixsubprocess.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.>
                                                  Module libxshmfence.so.1 from rpm libxshmfence-1.3.2-3.fc40.x86_64
                                                  Module libxcb-sync.so.1 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libxcb-shm.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libxcb-xfixes.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libxcb-dri2.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libEGL_mesa.so.0 from rpm mesa-24.2.0-1.fc40.x86_64
                                                  Module libxcb-present.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libX11-xcb.so.1 from rpm libX11-1.8.10-1.fc40.x86_64
                                                  Module libX11.so.6 from rpm libX11-1.8.10-1.fc40.x86_64
                                                  Module libpciaccess.so.0 from rpm libpciaccess-0.16-12.fc40.x86_64
                                                  Module libtinfo.so.6 from rpm ncurses-6.4-12.20240127.fc40.x86_64
                                                  Module libedit.so.0 from rpm libedit-3.1-53.20240808cvs.fc40.x86_64
                                                  Module libXau.so.6 from rpm libXau-1.0.11-6.fc40.x86_64
                                                  Module libxcb.so.1 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libdrm_intel.so.1 from rpm libdrm-2.4.122-1.fc40.x86_64
                                                  Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.122-1.fc40.x86_64
                                                  Module libelf.so.1 from rpm elfutils-0.191-4.fc40.x86_64
                                                  Module libdrm_radeon.so.1 from rpm libdrm-2.4.122-1.fc40.x86_64
                                                  Module libsensors.so.4 from rpm lm_sensors-3.6.0-18.fc40.x86_64
                                                  Module libxcb-dri3.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libglapi.so.0 from rpm mesa-24.2.0-1.fc40.x86_64
                                                  Module libxcb-randr.so.0 from rpm libxcb-1.17.0-1.fc40.x86_64
                                                  Module libexpat.so.1 from rpm expat-2.6.2-1.fc40.x86_64
                                                  Module libgallium-24.2.0.so from rpm mesa-24.2.0-1.fc40.x86_64
                                                  Module libgbm.so.1 from rpm mesa-24.2.0-1.fc40.x86_64
                                                  Module libnvidia-egl-gbm.so.1 from rpm egl-gbm-1.1.2-1.fc40.x86_64
                                                  Module libdrm.so.2 from rpm libdrm-2.4.122-1.fc40.x86_64
                                                  Module libwayland-server.so.0 from rpm wayland-1.23.0-2.fc40.x86_64
                                                  Module libnvidia-egl-wayland.so.1 from rpm egl-wayland-1.1.15-1.fc40.x86_64
                                                  Module libGLdispatch.so.0 from rpm libglvnd-1.7.0-4.fc40.x86_64
                                                  Module libEGL.so.1 from rpm libglvnd-1.7.0-4.fc40.x86_64
                                                  Module libxml2.so.2 from rpm libxml2-2.12.8-1.fc40.x86_64
                                                  Module libfontconfig.so from rpm fontconfig-2.15.0-6.fc40.x86_64
                                                  Module libwayland-egl.so.1 from rpm wayland-1.23.0-2.fc40.x86_64
                                                  Module libwayland-cursor.so.0 from rpm wayland-1.23.0-2.fc40.x86_64
                                                  Module libzstd.so.1 from rpm zstd-1.5.6-1.fc40.x86_64
                                                  Module liblz4.so.1 from rpm lz4-1.9.4-6.fc40.x86_64
                                                  Module libcap.so.2 from rpm libcap-2.69-8.fc40.x86_64
                                                  Module libsystemd.so.0 from rpm systemd-255.10-3.fc40.x86_64
                                                  Module libdbus-1.so.3 from rpm dbus-1.14.10-3.fc40.x86_64
                                                  Module libxkbcommon.so.0 from rpm libxkbcommon-1.6.0-2.fc40.x86_64
                                                  Module libwayland-client.so.0 from rpm wayland-1.23.0-2.fc40.x86_64
                                                  Module glfw-wayland.so from rpm kitty-0.36.1-1.fc40.x86_64
                                                  Module libffi.so.8 from rpm libffi-3.4.4-7.fc40.x86_64
                                                  Module _ctypes.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc>
                                                  Module _sha2.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40>
                                                  Module _random.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc>
                                                  Module _bisect.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc>
                                                  Module array.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40>
                                                  Module _opcode.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc>
                                                  Module _json.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40>
                                                  Module binascii.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.f>
                                                  Module _struct.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc>
                                                  Module math.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40.>
                                                  Module fcntl.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40>
                                                  Module libbrotlicommon.so.1 from rpm brotli-1.1.0-3.fc40.x86_64
                                                  Module libpcre2-8.so.0 from rpm pcre2-10.44-1.fc40.x86_64
                                                  Module libbrotlidec.so.1 from rpm brotli-1.1.0-3.fc40.x86_64
                                                  Module libgraphite2.so.3 from rpm graphite2-1.3.14-15.fc40.x86_64
                                                  Module libglib-2.0.so.0 from rpm glib2-2.80.3-1.fc40.x86_64
                                                  Module libfreetype.so.6 from rpm freetype-2.13.2-5.fc40.x86_64
                                                  Module libcrypto.so.3 from rpm openssl-3.2.2-3.fc40.x86_64
                                                  Module liblcms2.so.2 from rpm lcms2-2.16-3.fc40.x86_64
                                                  Module libpng16.so.16 from rpm libpng-1.6.40-3.fc40.x86_64
                                                  Module libharfbuzz.so.0 from rpm harfbuzz-8.5.0-1.fc40.x86_64
                                                  Module fast_data_types.so from rpm kitty-0.36.1-1.fc40.x86_64
                                                  Module liblzma.so.5 from rpm xz-5.4.6-3.fc40.x86_64
                                                  Module _lzma.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40>
                                                  Module libbz2.so.1 from rpm bzip2-1.0.8-18.fc40.x86_64
                                                  Module _bz2.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40.>
                                                  Module libz.so.1 from rpm zlib-ng-2.1.6-2.git.fc40.x86_64
                                                  Module zlib.cpython-312-x86_64-linux-gnu.so from rpm python3.12-3.12.4-1.fc40.>
                                                  Module libpython3.12.so.1.0 from rpm python3.12-3.12.4-1.fc40.x86_64
                                                  Module kitty from rpm kitty-0.36.1-1.fc40.x86_64
                                                  Stack trace of thread 32456:
                                                  #0  0x00007fb87157c2b7 __memcpy_avx_unaligned_erms (libc.so.6 + 0x16d2b7)
                                                  #1  0x00007fb862e78529 screen_update_cell_data (fast_data_types.so + 0x78529)
                                                  #2  0x00007fb862e823ab cell_prepare_to_render.lto_priv.0 (fast_data_types.so +>
                                                  #3  0x00007fb862e139da process_global_state (fast_data_types.so + 0x139da)
                                                  #4  0x00007fb862024953 dispatchTimers.part.0.constprop.0.isra.0 (glfw-wayland.>
                                                  #5  0x00007fb862008556 glfwRunMainLoop (glfw-wayland.so + 0xa556)
                                                  #6  0x00007fb862e0c9a4 main_loop.lto_priv.0 (fast_data_types.so + 0xc9a4)
                                                  #7  0x00007fb871797936 method_vectorcall_NOARGS (libpython3.12.so.1.0 + 0x1979>
                                                  #8  0x00007fb871788b7c PyObject_Vectorcall (libpython3.12.so.1.0 + 0x188b7c)
                                                  #9  0x00007fb871771ea5 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171e>
                                                  #10 0x00007fb87176bc44 _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x>
                                                  #11 0x00007fb87179af21 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19af21)
                                                  #12 0x00007fb871848415 slot_tp_call (libpython3.12.so.1.0 + 0x248415)
                                                  #13 0x00007fb871769476 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169476)
                                                  #14 0x00007fb871771ea5 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171e>
                                                  #15 0x00007fb8717fdef4 PyEval_EvalCode (libpython3.12.so.1.0 + 0x1fdef4)
                                                  #16 0x00007fb87181b0f7 builtin_exec (libpython3.12.so.1.0 + 0x21b0f7)
                                                  #17 0x00007fb871788fdc cfunction_vectorcall_FASTCALL_KEYWORDS (libpython3.12.s>
                                                  #18 0x00007fb871788b7c PyObject_Vectorcall (libpython3.12.so.1.0 + 0x188b7c)
                                                  #19 0x00007fb871771ea5 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171e>
                                                  #20 0x00007fb87182ec2e pymain_run_module (libpython3.12.so.1.0 + 0x22ec2e)
                                                  #21 0x00007fb8716bb072 Py_RunMain.cold (libpython3.12.so.1.0 + 0xbb072)
                                                  #22 0x0000556dfb4940df main (kitty + 0x30df)
                                                  #23 0x00007fb871439088 __libc_start_call_main (libc.so.6 + 0x2a088)
                                                  #24 0x00007fb87143914b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
                                                  #25 0x0000556dfb494535 _start (kitty + 0x3535)
                                                  
                                                  Stack trace of thread 32463:
                                                  #0  0x00007fb87151c87d __poll (libc.so.6 + 0x10d87d)
                                                  #1  0x00007fb862e0dc45 io_loop (fast_data_types.so + 0xdc45)
                                                  #2  0x00007fb8714a66d7 start_thread (libc.so.6 + 0x976d7)
                                                  #3  0x00007fb87152a60c __clone3 (libc.so.6 + 0x11b60c)
                                                  
                                                  Stack trace of thread 32462:
                                                  #0  0x00007fb8714a2da9 __futex_abstimed_wait_common (libc.so.6 + 0x93da9)
                                                  #1  0x00007fb8714a57f9 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x967f9)
                                                  #2  0x00007fb8610bbf38 n/a (libEGL_nvidia.so.0 + 0xbbf38)
                                                  #3  0x00007fb86108aef1 n/a (libEGL_nvidia.so.0 + 0x8aef1)
                                                  #4  0x00007fb8610c1fce n/a (libEGL_nvidia.so.0 + 0xc1fce)
                                                  #5  0x00007fb8714a66d7 start_thread (libc.so.6 + 0x976d7)
                                                  #6  0x00007fb87152a60c __clone3 (libc.so.6 + 0x11b60c)
                                                  ELF object binary architecture: AMD x86-64
Aug 28 10:18:40 crashtux systemd[1]: systemd-coredump@5-32465-0.service: Deactivated successfully.

This happens with both the open and the closed kernel drivers.
The only module options I enabled are:
options nvidia-drm modeset=1 fbdev=1

However, it looks like these issues only appear when I enable the 120 Hz refresh rate.
I played a video game for a few minutes and no crashes occurred with the refresh rate set to 60 Hz.

wlr-randr
DP-1 "ASUSTek COMPUTER INC PG48UQ  (DP-1)"
  Make: ASUSTek COMPUTER INC
  Model: PG48UQ
  Serial: (null)
  Physical size: 700x390 mm
  Enabled: yes
  Modes:
    3840x2160 px, 59.997002 Hz (preferred)
    3840x2160 px, 119.879997 Hz (current)
    3840x2160 px, 119.999001 Hz
    3840x2160 px, 59.939999 Hz
    3840x2160 px, 23.976000 Hz
    3840x1600 px, 119.982002 Hz
    3440x1440 px, 119.999001 Hz
    3840x1080 px, 120.000000 Hz
    1920x2160 px, 59.987999 Hz
    2560x1440 px, 119.878998 Hz
    2560x1440 px, 59.939999 Hz
    1920x1080 px, 119.878998 Hz
    1920x1080 px, 60.000000 Hz
    1920x1080 px, 59.938999 Hz
    1680x1050 px, 59.953999 Hz
    1280x1024 px, 60.020000 Hz
    1440x900 px, 59.887001 Hz
    1280x720 px, 100.000000 Hz
    1280x720 px, 59.943001 Hz
    1280x720 px, 50.000000 Hz
    1024x768 px, 75.028999 Hz
    1024x768 px, 60.004002 Hz
    800x600 px, 75.000000 Hz
    800x600 px, 60.317001 Hz
    720x480 px, 59.939999 Hz
    640x480 px, 75.000000 Hz
    640x480 px, 59.939999 Hz
  Position: 0,0
  Transform: normal
  Scale: 1.000000
  Adaptive Sync: enabled

How can I fix this issue?

EDIT:
Here's the bug report:
nvidia-bug-report.log.gz (4.2 MB)

The GspRmAlloc error disappears when I disable the GSP firmware with NVreg_EnableGpuFirmware=0, but the processes still crash. With GSP disabled, the log shows this instead:

Aug 28 10:32:16 crashtux kernel: NVRM: GPU at PCI:0000:01:00: GPU-f6cd1b06-0e50-622e-08d9-5b44281bcb65
Aug 28 10:32:16 crashtux kernel: NVRM: Xid (PCI:0000:01:00): 62, pid='<unknown>', name=<unknown>, 2023506c 202b542a 20262e4e 202b4cac 2025f1b8 00000000 00000000 00000000
Aug 28 10:32:16 crashtux kernel: NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000030
Aug 28 10:32:16 crashtux kernel: NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000034
Aug 28 10:32:16 crashtux kernel: NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000035
Aug 28 10:32:16 crashtux kernel: NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000037
Aug 28 10:32:17 crashtux /usr/bin/nvidia-powerd[2526]: Failed to get topology status 55
Aug 28 10:32:17 crashtux /usr/bin/nvidia-powerd[2526]: error setting power limit
Aug 28 10:32:17 crashtux /usr/bin/nvidia-powerd[2526]: Error setting GPU limit: 173297.
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: error setting power limit
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: Error setting GPU limit: 171901.
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: error setting power limit
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: Error setting GPU limit: 170662.
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: error setting power limit
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: Error setting GPU limit: 169477.
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: error setting power limit
Aug 28 10:32:18 crashtux /usr/bin/nvidia-powerd[2526]: Error setting GPU limit: 168439.
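
In case it helps anyone compare the 60 Hz and 120 Hz runs, here is a minimal Python sketch that tallies the Xid events from kernel log text. It only assumes the line format shown in the log above; the function name tally_xids and the script name in the usage note are made up for illustration and are not part of any NVIDIA tool.

```python
import re
import sys
from collections import Counter

# Matches kernel log lines such as:
#   NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000030
# The pattern is an assumption based on the lines pasted above.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count Xid events per (PCI address, Xid code) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Read log text from stdin and print one summary line per (device, code).
    for (pci, code), n in sorted(tally_xids(sys.stdin).items()):
        print(f"{pci} Xid {code}: {n}")
```

Saved as, say, xid_tally.py (a hypothetical file name), it could be fed the kernel messages of the current boot with `journalctl -k -b | python3 xid_tally.py`.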

@amrits
Do you need more information or logs, and were you able to reproduce this on your systems?