RTX 5060 Ti (OCuLink) GSP Firmware Crash on 575.51.02 with Linux 6.8.12

RTX 5060 Ti (OCuLink) GPU Crashes After GSP Init on 575.51.02 – Linux Kernel 6.8 / Open Kernel Module

Hardware Setup:

GPU: NVIDIA GeForce RTX 5060 Ti (Blackwell, via OCuLink dock)

Host: Minisforum MS-01, Intel Raptor Lake CPU, 96GB DDR5

Bus: PCIe x4 via OCuLink

Power Supply: Confirmed stable 300W+ (dedicated dock PSU)

Kernel: Proxmox VE 8.4.1 with Linux 6.8.12-10-pve

Driver Versions Tried:

    575.51.03 BETA (open kernel module)

    570.153.02 (open kernel module)

    575.51.02 BETA (open kernel module)

Symptoms:

After a clean install of each driver using the --kernel-module-type=open flag:

    nvidia-smi shows GPU for 10–30 seconds

    GPU draws power (~25W), appears in PCIe bus and kernel modules

    Then the driver crashes: nvidia-smi reports “No devices were found”

    dmesg shows repeated kgspRpcRecvPoll and gpuStatePreInit_IMPL stack traces

Logs (excerpt):

[ 49.922114] _issueRpcAndWait+0xd2/0x900 [nvidia]
[ 49.922351] rpcRmApiControl_GSP+0x76f/0x940 [nvidia]
[ 49.922738] gpumgrStatePreInitGpu+0x6b/0xa0 [nvidia]
[ 49.922845] RmInitAdapter+0x12b9/0x1da0 [nvidia]
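
To compare traces across driver versions, the frames in an excerpt like the one above can be pulled out mechanically. This is only a minimal sketch: `parse_frames` and `has_gsp_rpc_wait` are hypothetical helper names, and the "GSP RPC wait" check is a rough heuristic based on the symbol names seen in this trace, not an official diagnostic.

```python
import re

# Matches a kernel stack frame such as "_issueRpcAndWait+0xd2/0x900 [nvidia]".
# A leading "?" marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"(\??)\s*([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]"
)

def parse_frames(lines):
    """Return (symbol, module, reliable) for every stack frame found in lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            unreliable, symbol, module = m.groups()
            frames.append((symbol, module, unreliable != "?"))
    return frames

def has_gsp_rpc_wait(frames):
    """Heuristic (an assumption): the trace passes through a GSP RPC wait."""
    symbols = {s for s, _, reliable in frames if reliable}
    gsp_related = any("GSP" in s or "kgsp" in s.lower() for s in symbols)
    return "_issueRpcAndWait" in symbols and gsp_related

excerpt = [
    "[ 49.922114] _issueRpcAndWait+0xd2/0x900 [nvidia]",
    "[ 49.922351] rpcRmApiControl_GSP+0x76f/0x940 [nvidia]",
    "[ 49.922738] gpumgrStatePreInitGpu+0x6b/0xa0 [nvidia]",
    "[ 49.922845] RmInitAdapter+0x12b9/0x1da0 [nvidia]",
]

frames = parse_frames(excerpt)
print([symbol for symbol, _, _ in frames])
# ['_issueRpcAndWait', 'rpcRmApiControl_GSP', 'gpumgrStatePreInitGpu', 'RmInitAdapter']
print(has_gsp_rpc_wait(frames))
# True
```

Feeding it the output of dmesg for each driver version would show whether the same call path fails every time.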

What I’ve tried:

Full power downs (not just reboot)

Both stable and beta drivers

Kernel boot params:

    nvidia.NVreg_EnableGpuFirmware=0

    NVreg_Modeset=0

Verified that device remains on PCIe bus (lspci still shows GPU)

No inference or CUDA load initiated before crash

No success with the proprietary module: the 5060 Ti refuses it
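
Regarding the kernel boot params listed above: on the kernel command line, a module option only reaches the module when it is written as `module.param=value`. A bare `NVreg_Modeset=0`, if it was entered exactly as written, would therefore not be handed to the nvidia module. The sketch below checks a command line for that. `module_params` is a hypothetical helper, and the simple whitespace split assumes no quoted values.

```python
def module_params(cmdline, module="nvidia"):
    """Split a kernel command line into options addressed to `module`
    and bare NVreg_* tokens that lack the "module." prefix."""
    addressed, bare = {}, []
    prefix = module + "."
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key.startswith(prefix):
            addressed[key[len(prefix):]] = value
        elif key.startswith("NVreg_"):
            bare.append(token)
    return addressed, bare

# Command line as recorded in the journal for this boot.
# On a live system the same string can be read from /proc/cmdline.
cmdline = (
    "BOOT_IMAGE=/boot/vmlinuz-6.8.12-10-pve root=/dev/mapper/pve-root ro quiet "
    "intel_iommu=on iommu=pt nvidia.NVreg_EnableGpuFirmware=0"
)

addressed, bare = module_params(cmdline)
print(addressed)  # {'NVreg_EnableGpuFirmware': '0'}
print(bare)       # []

# Same check with an unprefixed option appended, as an illustration.
_, bare_example = module_params(cmdline + " NVreg_Modeset=0")
print(bare_example)  # ['NVreg_Modeset=0']
```

For the boot shown in the journal, only `NVreg_EnableGpuFirmware=0` was addressed to the nvidia module.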

Current Result:

RTX 5060 Ti is briefly initialized, then drops out completely

Likely failing inside GSP firmware init process

Request:

Is this a known firmware issue with Blackwell GPUs on Linux?

Is updated GSP firmware expected for 5060 Ti?

Any known workaround to keep the module alive post-boot?
Here is the journal, grepped for nvidia:
May 21 20:29:46 minis02 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-10-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt nvidia.NVreg_EnableGpuFirmware=0
May 21 20:29:46 minis02 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-10-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt nvidia.NVreg_EnableGpuFirmware=0
May 21 20:29:47 minis02 kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
May 21 20:29:47 minis02 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 510
May 21 20:29:47 minis02 kernel: nvidia 0000:01:00.0: enabling device (0000 -> 0003)
May 21 20:29:47 minis02 kernel: nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
May 21 20:29:47 minis02 kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  575.51.02  Release Build  (dvs-builder@U22-I3-G01-3-2)  Thu Apr 10 15:55:07 UTC 2025
May 21 20:29:47 minis02 kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  575.51.02  Release Build  (dvs-builder@U22-I3-G01-3-2)  Thu Apr 10 15:40:53 UTC 2025
May 21 20:29:47 minis02 kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
May 21 20:29:47 minis02 kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
May 21 20:29:47 minis02 kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input4
May 21 20:29:47 minis02 kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input5
May 21 20:29:47 minis02 kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input6
May 21 20:29:47 minis02 kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input7
May 21 20:29:48 minis02 kernel: audit: type=1400 audit(1747884588.051:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=768 comm="apparmor_parser"
May 21 20:29:48 minis02 kernel: audit: type=1400 audit(1747884588.051:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=768 comm="apparmor_parser"
May 21 20:30:34 minis02 kernel:  os_dump_stack+0xe/0x20 [nvidia]
May 21 20:30:34 minis02 kernel:  _kgspRpcRecvPoll+0x593/0x760 [nvidia]
May 21 20:30:34 minis02 kernel:  _issueRpcAndWait+0xd2/0x900 [nvidia]
May 21 20:30:34 minis02 kernel:  ? osGetCurrentThread+0x26/0x60 [nvidia]
May 21 20:30:34 minis02 kernel:  ? os_mem_set+0x14/0x20 [nvidia]
May 21 20:30:34 minis02 kernel:  rpcRmApiControl_GSP+0x76f/0x940 [nvidia]
May 21 20:30:34 minis02 kernel:  kmemsysStatePreInitLocked_IMPL+0x72/0x100 [nvidia]
May 21 20:30:34 minis02 kernel:  gpuStatePreInit_IMPL+0x5aa/0xac0 [nvidia]
May 21 20:30:34 minis02 kernel:  gpumgrStatePreInitGpu+0x6b/0xa0 [nvidia]
May 21 20:30:34 minis02 kernel:  RmInitAdapter+0x12b9/0x1da0 [nvidia]
May 21 20:30:34 minis02 kernel:  ? _portMemAllocatorAlloc+0x2b/0xe0 [nvidia]
May 21 20:30:34 minis02 kernel:  rm_init_adapter+0xad/0xc0 [nvidia]
May 21 20:30:34 minis02 kernel:  nv_open_device+0x41d/0x9c0 [nvidia]
May 21 20:30:34 minis02 kernel:  nvidia_open_deferred+0x39/0xf0 [nvidia]
May 21 20:30:34 minis02 kernel:  _main_loop+0x7f/0x140 [nvidia]
May 21 20:30:34 minis02 kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia]
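
To put a number on how long the GPU stays up, the gap between the module load line and the first stack trace can be computed from the journal timestamps. This is a rough sketch: `stamp` and `seconds_until_first_trace` are hypothetical helpers, the two marker strings are taken from the journal above, and the result is only accurate to the journal's one-second resolution.

```python
from datetime import datetime

def stamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp (the journal omits the year)."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")

def seconds_until_first_trace(lines,
                              load_marker="NVRM: loading",
                              trace_marker="os_dump_stack"):
    """Seconds from the first module load line to the first stack dump after it,
    or None if either marker is missing."""
    loaded = None
    for line in lines:
        if loaded is None and load_marker in line:
            loaded = stamp(line)
        elif loaded is not None and trace_marker in line:
            return (stamp(line) - loaded).total_seconds()
    return None

journal = [
    "May 21 20:29:47 minis02 kernel: NVRM: loading NVIDIA UNIX Open Kernel Module "
    "for x86_64  575.51.02  Release Build  (dvs-builder@U22-I3-G01-3-2)  "
    "Thu Apr 10 15:55:07 UTC 2025",
    "May 21 20:30:34 minis02 kernel:  os_dump_stack+0xe/0x20 [nvidia]",
]

print(seconds_until_first_trace(journal))  # 47.0
```

For this boot the first trace appears 47 seconds after the module loads.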