Kernel null pointer dereference in nvidia_modeset during Thunderbolt dock disconnect

okok17 · February 1, 2026, 7:47pm

Bug Report: Kernel NULL pointer dereference in nvidia_modeset during Thunderbolt dock disconnect (multiple monitors only)

Summary

Disconnecting a Thunderbolt 4 dock causes a kernel NULL pointer dereference in nvidia_modeset during drm_atomic_commit, crashing the sway compositor and freezing the system.

The crash only occurs when 2+ external monitors are connected. With a single external monitor, hot-unplug works correctly and sway gracefully falls back to the internal display.

This suggests a bug in the atomic commit path when disabling multiple CRTCs/connectors simultaneously during hot-unplug.

Hardware

Laptop: Lenovo ThinkPad X1 Extreme Gen 2 (20QV001GMX)
GPU: NVIDIA GeForce GTX 1650 Mobile / Max-Q [10de:1f91] (Turing)
iGPU: Intel UHD Graphics 630
Dock: Lenovo ThinkPad Thunderbolt 4 Dock (40B0)
Thunderbolt Controller: Intel JHL7540 (Titan Ridge)
External Displays: Samsung S32D850 (2560x1440), Samsung S34CG50 (3440x1440) via dock DP ports

Software

OS: NixOS 25.11
Compositor: sway (wlroots-based Wayland compositor)
Display Configuration: Hybrid GPU setup using WLR_DRM_DEVICES=/dev/dri/igpu:/dev/dri/dgpu

Versions Tested (all exhibit the crash)

NVIDIA Drivers:

580.119.02 (stable/production)
590.48.01 (latest/beta)
Both open and closed kernel modules

Linux Kernels:

6.6.x (LTS)
6.12.64
6.18.4 (latest)

Steps to Reproduce

1. Boot the system with the Thunderbolt dock connected
2. Connect two or more external monitors to the dock (e.g., one HDMI, one DisplayPort)
3. Log in to the sway compositor (external monitors work correctly)
4. Physically disconnect the Thunderbolt dock cable
5. System freezes immediately

Does not crash when:

Only one external monitor is connected to the dock. In this case, hot-unplug works correctly and sway falls back to eDP-1

Behavior

Kernel NULL pointer dereference in nvidia_modeset, killing the sway process and freezing the display. System requires hard reboot.

Kernel Oops (Driver 590.48.01, Kernel 6.12.64)

BUG: kernel NULL pointer dereference, address: 0000000000000409
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 2 UID: 1000 PID: 1592 Comm: sway Tainted: P           O       6.12.64 #1-NixOS
Hardware name: LENOVO 20QV001GMX/20QV001GMX, BIOS N2OET69W (1.56 ) 12/02/2025
RIP: 0010:_nv000778kms+0x4/0x10 [nvidia_modeset]

Call Trace:
 <TASK>
 _nv001339kms+0x94/0x180 [nvidia_modeset]
 _nv001293kms+0x25f/0x520 [nvidia_modeset]
 _nv001254kms+0xb4/0x202 [nvidia_modeset]
 _nv001271kms+0x9d/0x180 [nvidia_modeset]
 _nv002592kms+0x45f/0x7d0 [nvidia_modeset]
 _nv000725kms+0x1a6/0x620 [nvidia_modeset]
 ? _nv003066kms+0x74/0x140 [nvidia_modeset]
 _nv003103kms+0x9a1/0x45e0 [nvidia_modeset]
 nvKmsIoctl+0xf7/0x270 [nvidia_modeset]
 nvkms_ioctl_from_kapi_try_pmlock+0x60/0xa0 [nvidia_modeset]
 _nv000022kms+0x33e/0xbb0 [nvidia_modeset]
 nv_drm_atomic_apply_modeset_config+0x709/0x7b0 [nvidia_drm]
 drm_atomic_check_only+0x5f3/0xa10
 drm_atomic_commit+0x69/0xe0
 drm_mode_atomic_ioctl+0xaff/0xd70
 drm_ioctl_kernel+0xad/0x100
 drm_ioctl+0x2b0/0x520
 __x64_sys_ioctl+0x91/0xd0
 do_syscall_64+0xae/0x200
 </TASK>

CR2: 0000000000000409
note: sway[1592] exited with irqs disabled

Kernel Oops (Driver 580.119.02) - Same crash, different symbol

RIP: 0010:_nv000899kms+0x4/0x10 [nvidia_modeset]

Same call trace through nv_drm_atomic_apply_modeset_config.

Configuration Options Tested (none helped)

hardware.nvidia.open = true/false
hardware.nvidia.powerManagement.enable = true/false
hardware.nvidia.nvidiaPersistenced = true
hardware.nvidia.forceFullCompositionPipeline = true
boot.kernelParams = ["pcie_aspm=off" "pcie_port_pm=off"]
services.hardware.bolt.enable = true
Loading nvidia modules in initrd
dGPU-only mode (no hybrid), with and without discrete graphics mode.
Various NVreg_* module parameters

dmesg Context (events leading to crash)

pcieport 0000:05:04.0: pciehp: Slot(4): Link Down
pcieport 0000:05:04.0: pciehp: Slot(4): Card not present
xhci_hcd 0000:2f:00.0: remove, state 1
usb usb6: USB disconnect, device number 1
[... USB teardown ...]
pci_bus 0000:2e: busn_res: [bus 2e-51] is released
BUG: kernel NULL pointer dereference, address: 0000000000000409

nvidia-bug-report.sh

nvidia-bug-report_before_freeze.log.gz (476.3 KB)

It is not possible to run sudo nvidia-bug-report.sh after it freezes, not even through ssh.
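
Since the report script cannot run after the freeze, one option is to pull the oops out of the previous boot's kernel log after rebooting. The helper below is only a sketch: `extract_oops` is a hypothetical name, and it assumes the oops starts with the "BUG:" line and ends with the "exited with irqs disabled" note, as in the trace above.

```shell
#!/bin/sh
# Print only the oops block from a kernel log supplied on stdin:
# everything from the "BUG:" line through the "exited with irqs disabled" note.
# If the closing note is missing, sed prints through the end of the input.
extract_oops() {
    sed -n '/BUG: kernel NULL pointer dereference/,/exited with irqs disabled/p'
}
```

Usage after the hard reboot, assuming the journal is persistent and the oops reached disk before the freeze: `journalctl -k -b -1 --no-pager | extract_oops`.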

Workaround

None; the system requires a hard reboot.

morgwai666 · February 1, 2026, 8:27pm

From the README:

system stability when an eGPU is unplugged while in use (also known as “hot-unplug”) is not guaranteed.

The fact that it worked when 1 monitor was connected is just a lucky coincidence.
See the Debian wiki on how to safely hot-unplug.

Of course it would be great if NV could make unplugging less dangerous, but given how criminally understaffed the Desktop Linux team is, there is little chance of that :(

okok17 · February 3, 2026, 9:20pm

Thanks for the tip. Disabling the outputs before unplugging is a nice workaround for me.
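
A minimal sketch of that workaround, for anyone who wants to script it. It assumes the internal panel is named eDP-1 (as in the report above) and that jq is installed; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: disable every external sway output before unplugging the dock.
# Assumption: the internal panel is eDP-1; override INTERNAL_OUTPUT if not.
INTERNAL_OUTPUT="${INTERNAL_OUTPUT:-eDP-1}"

# Read output names on stdin (one per line) and print all of them
# except the internal panel.
drop_internal() {
    grep -vxF -- "$INTERNAL_OUTPUT"
}

# Ask sway for its active outputs and disable each external one.
# Assumption: jq is available to parse the JSON from `swaymsg -r`.
disable_external_outputs() {
    swaymsg -t get_outputs -r \
        | jq -r '.[] | select(.active) | .name' \
        | drop_internal \
        | while read -r name; do
            swaymsg output "$name" disable
        done
}
```

Calling `disable_external_outputs` right before pulling the cable leaves only the internal panel active. After reconnecting, each output can be brought back with `swaymsg output <name> enable`.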

I don't know if it matters, but this is a dGPU, not an eGPU.

I poked around some more and found that running:

echo "0000:01:00.0" | sudo tee /sys/bus/pci/drivers/nvidia/unbind

also crashes the system. I don't think it is the same crash, but I'm not sure. No idea if this reproduces on other machines.

morgwai666 · February 3, 2026, 9:34pm

LOLz, from your 1st post I got an impression that “Lenovo ThinkPad Thunderbolt 4 Dock (40B0)” is an eGPU dock :)))) My bad, I’m biased towards eGPUs ;-)

So in the case where only displays are connected via the dock (and the GPU stays connected to your mobo), you SHOULD indeed be able to disconnect it without any preparation. Sorry again for the confusion.

As explained in the wiki, before unbinding, you must ensure no processes are running on the given GPU (i.e. nvidia-smi must say “No running processes found”). Otherwise it is bound to crash at least those processes (and currently, unfortunately, also the kernel module).
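
As a complement to the nvidia-smi check, here is a rough sketch that lists processes still holding GPU device nodes open by walking /proc. `pids_holding` is a hypothetical helper, and the device paths in the usage note are assumptions that vary per machine. It only sees open file descriptors, so treat an empty result as a hint rather than a guarantee.

```shell
#!/bin/sh
# Print the PIDs of processes that hold any of the given paths open,
# found by resolving the symlinks in /proc/<pid>/fd.
# Run as root to see file descriptors of other users' processes.
pids_holding() {
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanish or are unreadable.
        target=$(readlink "$fd" 2>/dev/null) || continue
        for path in "$@"; do
            if [ "$target" = "$path" ]; then
                pid=${fd#/proc/}      # "1234/fd/5"
                echo "${pid%%/*}"     # "1234"
            fi
        done
    done | sort -un
}
```

For example, `pids_holding /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset` should print nothing before an unbind is attempted (the exact node names on a given system are an assumption here).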

morgwai666 · February 3, 2026, 9:52pm

@okok17 marking it as solved was probably not the right move, as now no NV engineer will likely ever look at this, and there is a bug here that should be fixed (disconnecting a dock with several displays should work fine).

I should also note that unbinding a dGPU rarely makes sense. The only scenario I can think of is passing it through to a guest virtual machine using VFIO.

abchauhan · February 5, 2026, 12:54am

Hi all,

Thank you for reporting the issue. We are tracking this internally as Bug #5871511. We will try to reproduce the issue for investigation. Please let us know if a newer driver resolves this issue.

Did this start to fail with a particular NVIDIA driver version?

rrameshbabu · February 9, 2026, 8:20pm

Hi @okok17,

On top of the ask from @abchauhan, could we reproduce the issue with the following patch built into the open-gpu-kernel-modules for 590.48.01?

With these patches you will need to set the nvidia_modeset.debug=1 parameter for the debug logs to be generated. After reproducing the issue, we will need to capture the logs with the related prints using sudo nvidia-bug-report.sh after the crash.
