Driver 580 GSP firmware crash (Xid 120/154) on RTX 3070 Mobile with HDMI display — 535 works with GSP disabled

System Information

Component         Detail
GPU               NVIDIA GeForce RTX 3070 Mobile / Max-Q (GA104M, rev a1)
PCI               01:00.0 VGA compatible controller
Laptop            Lenovo Legion 5 Pro 16ITH6H (82JD)
BIOS              H1CN35WW
OS                Ubuntu 24.04.4 LTS (noble)
Kernel            6.8.0-106-generic x86_64
PRIME mode        nvidia (dedicated GPU)
HDMI Monitor      LG SMARTGAME+ (3840x2160, 700mm x 390mm, EDID serial ecde0c00)
HDMI Connector    card2-HDMI-A-2 (via NVIDIA GPU)
Internal Display  2560x1600 (via i915 IGP, not active in current PRIME config)

EDID (LG SMARTGAME+)


00ffffffffffff001e6ddc77ecde0c00
0122010380462778eaee55ac5240b024
0e5054210900d1c06140454081c00101
01010101010108e80030f2705a80b058
3a00ba892100001a6fc200a0a0a05550
30203500ba892100001a000000fd0030
901eff86000a202020202020000000fc
004c4720534d41525447414d452b032e
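
For anyone who wants to cross-check the dump against the monitor listed in the table, here is a minimal sketch that decodes two fields from the base block. The function names edid_vendor and edid_name are made up for this post. The byte offsets follow the standard 128-byte EDID layout: vendor ID in bytes 8-9, four 18-byte descriptors starting at byte 54, and tag 0xFC for the monitor name.

```sh
#!/bin/sh
# Decode two fields from a 128-byte EDID base block passed as one hex string.
# Helper names are illustrative, not part of any NVIDIA or DRM tool.

# Print the 3-letter vendor ID packed into bytes 8-9 (5 bits per letter, 1 = 'A').
edid_vendor() (
    word=$(printf '%s' "$1" | cut -c17-20)
    word=$((0x$word))
    for s in 10 5 0; do
        code=$(( ( ($word >> $s) & 31 ) + 64 ))
        # Emit the character through its octal escape, e.g. \107 for 'G'.
        printf "\\$(printf '%03o' "$code")"
    done
    printf '\n'
)

# Print the monitor name from the first descriptor tagged 0xFC.
edid_name() (
    hex=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    # The four 18-byte descriptors start at bytes 54, 72, 90 and 108.
    for off in 54 72 90 108; do
        start=$(($off * 2 + 1))
        desc=$(printf '%s' "$hex" | cut -c"$start"-"$(($start + 35))")
        case $desc in
            000000fc00*)
                # Up to 13 ASCII bytes; a 0x0A byte ends shorter names early.
                printf '%s' "$desc" | cut -c11-36 | fold -w2 |
                while read -r byte; do
                    [ "$byte" = 0a ] && break
                    printf "\\$(printf '%03o' "$((0x$byte))")"
                done
                printf '\n'
                exit 0
                ;;
        esac
    done
    exit 1
)
```

With the eight lines above joined into one string, edid_vendor prints GSM and edid_name prints LG SMARTGAME+. On a live system the same string can probably be produced with od -An -v -tx1 /sys/class/drm/card2-HDMI-A-2/edid | tr -d ' \n', assuming the connector exposes its EDID in sysfs.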

Driver Versions Tested

Driver      Variant                                   Result
580.126.09  Proprietary (nvidia-driver-580)           GPU lockup, Xid 120/154
580.126.09  Open kernel (nvidia-driver-580-open)      GPU lockup, Xid 120/154
535.288.01  Proprietary + NVreg_EnableGpuFirmware=0   Works perfectly

Description

When an HDMI monitor (LG SMARTGAME+, 4K) is connected, driver 580 triggers a GSP firmware page fault (Xid 120) during HDMI display initialization. This escalates to Xid 154 (GPU Reset Required) and enters an infinite error loop that locks the GPU at maximum power draw, causing the system to freeze and the GPU to overheat.

Attachment: nvidia-bug-report.log.gz (1.4 MB)

Both the proprietary kernel module (nvidia-driver-580) and the open-kernel module (nvidia-driver-580-open) exhibit identical behavior.

Driver 535 is the last LTS branch where GSP is optional for Ampere GPUs. With NVreg_EnableGpuFirmware=0, the driver bypasses GSP and talks to hardware directly, and the HDMI display works perfectly with no errors.

Steps to Reproduce

  1. Install nvidia-driver-580 (580.126.09) on Ubuntu 24.04

  2. Set PRIME to on-demand or nvidia

  3. Connect LG SMARTGAME+ monitor via HDMI

  4. Reboot

  5. System freezes during display initialization; GPU overheats

  6. Observed errors (from prior sessions): Xid 120 (GSP page fault), Xid 154 (GPU Reset Required)

Expected behavior: HDMI display initializes normally, as it does on driver 535 without GSP.

Workaround

Downgrade to nvidia-driver-535 and disable GSP firmware:


sudo apt install nvidia-driver-535

echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee /etc/modprobe.d/nvidia-gsp.conf

sudo prime-select nvidia

sudo update-initramfs -u

sudo reboot
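
After the reboot it is worth confirming that the option was actually picked up. The sketch below is one way to check, not an official procedure. It assumes the driver lists its registry keys in /proc/driver/nvidia/params as "Name: value" lines, and that the per-GPU information file carries a "GPU Firmware" field, as in the output quoted later in this thread. The function names are made up.

```sh
#!/bin/sh
# Report the GSP-related settings the loaded nvidia module exposes via /proc.
# Both helpers read text on stdin, so they also work on saved output.

# Print the value of the EnableGpuFirmware registry key
# (input: contents of /proc/driver/nvidia/params).
gsp_param() {
    awk -F': *' '$1 == "EnableGpuFirmware" { print $2 }'
}

# Print the "GPU Firmware" field
# (input: contents of /proc/driver/nvidia/gpus/<bus-id>/information).
gpu_firmware() {
    sed -n 's/^GPU Firmware:[[:space:]]*//p'
}

# Usage on a live system (the path assumes the GPU sits at 0000:01:00.0):
#   gsp_param    < /proc/driver/nvidia/params
#   gpu_firmware < /proc/driver/nvidia/gpus/0000:01:00.0/information
```

If the modprobe option took effect, gsp_param should print 0. The awk match is exact on the key name so that a neighbouring key such as EnableGpuFirmwareLogs is not picked up by mistake.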

Current Working State (Driver 535, collected 2026-03-25)


$ nvidia-smi

Driver Version: 535.288.01 CUDA Version: 12.2

GPU: NVIDIA GeForce RTX 3070 Mobile, 56°C, 17W/115W, 0% utilization

Memory: 2842MiB / 8192MiB

$ xrandr --listmonitors

Monitors: 1

0: +*HDMI-0 3840/700x2160/390+0+0

$ cat /sys/class/drm/card2-HDMI-A-2/status

connected

$ sudo dmesg | grep Xid

(no output — clean, no errors)

$ cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.288.01 Tue Nov 18 18:26:41 UTC 2025

Kernel boot parameters (working config)


BOOT_IMAGE=/boot/vmlinuz-6.8.0-106-generic root=/dev/mapper/vgkubuntu-root ro quiet splash nvidia.NVreg_EnableGpuFirmware=0 nvidia-drm.modeset=1

dmesg NVIDIA excerpt (clean boot on 535)


[ 3.479] nvidia: loading out-of-tree module taints kernel.

[ 3.591] nvidia-nvlink: Nvlink Core is being initialized, major device number 510

[ 3.592] nvidia 0000:01:00.0: enabling device (0006 -> 0007)

[ 3.644] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.288.01

[ 3.657] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.288.01

[ 3.659] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

[ 4.501] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-4

[ 4.511] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 2

[ 4.554] nvidia-uvm: Loaded the UVM driver, major device number 508.

Notes

  • Crash logs from driver 580 were not preserved. The GPU lockup and forced reboot prevented clean log capture. The Xid 120/154 errors were observed live in dmesg before the system became unresponsive.

  • The issue is specific to the HDMI output. The internal display (routed through the Intel iGPU) works fine on all driver versions.

  • This appears to be a regression in the GSP firmware’s HDMI initialization path for Ampere (GA104M).

  • Minor note: nvidia-modeset warns “Unable to read EDID for display device DP-4” even on the working 535 config, but the HDMI display (HDMI-0 / card2-HDMI-A-2) works correctly regardless.

  • The nvidia-bug-report.log.gz attached was generated on this working 535 configuration with the HDMI monitor connected and functioning.
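
Since the 580 crash logs were lost, a possible way to recover them next time is to read the previous boot's kernel log from the journal after the forced reboot. This is only a sketch: it assumes journald keeps logs across boots (that is, /var/log/journal exists) and that the Xid lines made it to disk before the lockup. xid_summary is a made-up helper name.

```sh
#!/bin/sh
# Count NVRM Xid events by code in kernel log text read from stdin.
xid_summary() {
    # Keep only the numeric code from lines like
    #   NVRM: Xid (PCI:0000:01:00): 119, pid=..., name=..., ...
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' |
        sort -n | uniq -c |
        awk '{ printf "Xid %s: %s event(s)\n", $2, $1 }'
}

# Usage after a forced reboot (kernel messages from the previous boot):
#   journalctl -k -b -1 | xid_summary
# Or against a saved dmesg capture:
#   xid_summary < dmesg-580.txt
```

Saving the full journalctl -k -b -1 output to a file as well would give NVIDIA the complete context around the first Xid, not just the counts.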

Request

Please investigate the GSP firmware regression on Ampere (GA104M) when initializing HDMI outputs in the 580 driver branch. Ideally, provide either a fix or a way to opt out of GSP on Ampere in 580+, as was possible in 535.

I’m seeing what looks like the same issue on Fedora 43. I’ve already replied on an existing issue in the open kernel driver’s GitHub repository, and I can reproduce the problem described here too. My system specs:
Dell G15 5515 (Ryzen edition)
GA106M GeForce RTX 3060 Mobile, rev a1
VBIOS 94.06.17.00.35
Dell subsystem ID 0a6e

OS, drivers, and setup:
Fedora 43, kernel 6.19.8-200.fc43.x86_64
akmod-nvidia-open 580.126.18 via RPM Fusion
Tested on both open and proprietary modules, identical behavior
BIOS: 1.30.0
Secure boot disabled

Specific GPU info:

Model:           NVIDIA GeForce RTX 3060 Laptop GPU
IRQ:             106
GPU UUID:        GPU-dac19efe-d70e-fa2c-86c9-07cad67bf11b
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        47 bits
DMA Mask:        0x7fffffffffff
Bus Location:    0000:01:00.0
Device Minor:    0
GPU Firmware:    N/A
GPU Excluded:    No

Note on the unreadable Video BIOS field (??.??.??.??.??): it shows up after the GPU has been suspended and woken again. Suspend itself works, but when the GPU is woken the GSP stops responding.

Output of sudo dmesg | grep -iE "gsp|nvrm|nvidia" | head -40:

[    7.614852] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[    7.618523] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    7.618701] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    7.669485] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  580.126.18  Release Build  (dvs-builder@U22-I3-H04-01-6)  Wed Feb 11 18:33:27 UTC 2026
[    7.702908] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  580.126.18  Release Build  (dvs-builder@U22-I3-H04-01-6)  Wed Feb 11 18:19:14 UTC 2026
[    7.750780] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    8.115805] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input16
[    8.115938] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input17
[    8.116022] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input18
[    8.116097] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input19
[    8.269397] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[    9.881610] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[    9.921989] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
[    9.922503] nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
[   38.671331] NVRM: gpuWaitForGfwBootComplete_TU102: failed to wait for GFW_BOOT: (progress 0x9)
[   38.671338] NVRM: kgspWaitForGfwBootOk_TU102: failed to wait for GFW boot complete: 0x55 VBIOS version 94.06.17.00.35
[   38.671340] NVRM: kgspWaitForGfwBootOk_TU102: (the GPU may be in a bad state and may need to be reset)
[   44.680845] NVRM: _kgspLogXid119: ********************************* GSP Timeout **********************************
[   44.680851] NVRM: _kgspLogXid119: Note: Please also check logs above.
[   44.680861] NVRM: GPU at PCI:0000:01:00: GPU-dac19efe-d70e-fa2c-86c9-07cad67bf11b
[   44.680864] NVRM: Xid (PCI:0000:01:00): 119, pid=3241, name=nvidia-smi, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) sequence 394 (0x2080205b 0x4).
[   44.680886] NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) sequence 394 and data 0x000000002080205b 0x0000000000000004.
[   44.680889] NVRM: GPU0 RPC history (CPU -> GSP):
[   44.680891] NVRM:     entry function                     sequence data0              data1              ts_start           ts_end             duration actively_polling
[   44.680894] NVRM:      0    76   GSP_RM_CONTROL               394 0x000000002080205b 0x0000000000000004 0x00064e07a9a8dad7 0x0000000000000000          y
[   44.680898] NVRM:     -1    47   UNLOADING_GUEST_DRIVE        393 0x0000000000000000 0x0000000000000000 0x00064e07a974921e 0x00064e07a9798461 324163us  
[   44.680902] NVRM:     -2    10   FREE                         392 0x00000000c1e00010 0x0000000000000000 0x00064e07a9748f71 0x00064e07a97491d2    609us  
[   44.680905] NVRM:     -3    10   FREE                         391 0x000000000000000a 0x0000000000000000 0x00064e07a9748a71 0x00064e07a9748f6f   1278us  
[   44.680909] NVRM:     -4    10   FREE                         390 0x000000000000000b 0x0000000000000000 0x00064e07a974864e 0x00064e07a9748800    434us  
[   44.680912] NVRM:     -5    10   FREE                         389 0x0000000000000006 0x0000000000000000 0x00064e07a9748421 0x00064e07a9748643    546us  
[   44.680915] NVRM:     -6    10   FREE                         388 0x0000000000000002 0x0000000000000000 0x00064e07a97475a0 0x00064e07a97483f8   3672us  
[   44.680918] NVRM:     -7    10   FREE                         387 0x0000000000000005 0x0000000000000000 0x00064e07a9746c9d 0x00064e07a9747599   2300us  
[   44.680921] NVRM: GPU0 RPC event history (CPU <- GSP):
[   44.680922] NVRM:     entry function                     sequence data0              data1              ts_start           ts_end             duration during_incomplete_rpc
[   44.680924] NVRM:      0    4108 UCODE_LIBOS_PRINT              0 0x0000000000000000 0x0000000000000000 0x00064e07a9754996 0x00064e07a9754997      1us  
[   44.680928] NVRM:     -1    4111 PERF_BRIDGELESS_INFO_          0 0x0000000000000000 0x0000000000000000 0x00064e07a974e6f3 0x00064e07a974e6f4      1us  
[   44.680931] NVRM:     -2    4108 UCODE_LIBOS_PRINT              0 0x0000000000000000 0x0000000000000000 0x00064e07a7ec3b1b 0x00064e07a7ec3b1b           
[   44.680934] NVRM:     -3    4108 UCODE_LIBOS_PRINT              0 0x0000000000000000 0x0000000000000000 0x00064e07a7ec39eb 0x00064e07a7ec39ed      2us  
[   44.680937] NVRM:     -4    4098 GSP_RUN_CPU_SEQUENCER          0 0x000000000000061c 0x0000000000003fe2 0x00064e07a7eb83ff 0x00064e07a7eb97ce   5071us  
[   44.680944] CPU: 12 UID: 1000 PID: 3241 Comm: nvidia-smi Tainted: G           OE       6.19.8-200.fc43.x86_64 #1 PREEMPT(lazy) 
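
Because both module flavors come up in this thread, it can help to state in a report which one was actually loaded for a given log. Here is a small sketch that reads it off the NVRM load line. The two patterns are taken only from the load lines quoted in this thread (535 proprietary and 580 open), so other releases may word the line differently. nvrm_flavor is a made-up name.

```sh
#!/bin/sh
# Print "<flavor> <version>" for each NVRM module-load line on stdin.
nvrm_flavor() {
    sed -n -E \
        -e 's/.*NVRM: loading NVIDIA UNIX Open Kernel Module for [^ ]+ +([0-9.]+).*/open \1/p' \
        -e 's/.*NVRM: loading NVIDIA UNIX [^ ]+ Kernel Module +([0-9.]+).*/proprietary \1/p'
}

# Usage:
#   sudo dmesg | nvrm_flavor
```

Run against the log above, this should report the open flavor at 580.126.18.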

I’ve also tried the following config with no change in behaviour:

options nvidia NVreg_EnableGpuFirmware=0
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1