Sharing a working configuration for an RTX 5080 in a Thunderbolt 5 eGPU enclosure on Linux, since this hardware class is still rough as of driver 590.48.01. Note that the host port is Thunderbolt 4, so the link runs at USB4 40 Gb/s. This is a workaround recipe that reliably gets the driver loaded and the GPU idling stably on my setup.
Hardware
- Laptop: Dell Latitude 5540 (Raptor Lake-P)
- Thunderbolt host: Intel Raptor Lake-P Thunderbolt 4 NHI
- eGPU enclosure: Razer Core X V2 (USB4 / TB5)
- GPU: NVIDIA GeForce RTX 5080 (Gigabyte Aorus, PCI ID 10de:2c02)
- Link: USB4 at 40 Gb/s, 2 lanes × 20 Gb/s (per boltctl list)
Software
- Ubuntu 24.04 LTS
- Kernel 6.17.0-20-generic
- nvidia-driver-590-open (590.48.01)
- Secure Boot enabled
Symptoms without the configuration below
Every probe attempt failed with:
NVRM: The NVIDIA GPU 0000:03:00.0
NVRM: (PCI ID: 10de:2c02) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
nvidia 0000:03:00.0: probe with driver nvidia failed with error -1
The driver retried the probe roughly every 180 ms. lspci showed the GPU as healthy (rev a1, BARs assigned, link at 16 GT/s x4, which is expected for a TB tunnel), so the GPU was enumerated but inaccessible to the driver at probe time.
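To check whether another system is stuck in the same loop, one option is to count the probe failures in the kernel log. This is a minimal sketch, assuming the log lines match the format quoted above; `count_probe_failures` is a hypothetical helper name and the sample log is made up for illustration.

```shell
# Hypothetical helper: count nvidia probe failures in kernel log text on stdin.
count_probe_failures() {
    # grep -c prints the number of matching lines. It exits non-zero when the
    # count is 0, so "|| true" keeps that case from looking like an error.
    grep -c 'probe with driver nvidia failed' || true
}

# Made-up sample in the same format as the log excerpt above.
sample_log='nvidia 0000:03:00.0: probe with driver nvidia failed with error -1
NVRM: fallen off the bus and is not responding to commands.
nvidia 0000:03:00.0: probe with driver nvidia failed with error -1'

printf '%s\n' "$sample_log" | count_probe_failures   # prints 2
```

On a live system the same function can be fed from the kernel log, for example `sudo dmesg | count_probe_failures`. A count that keeps growing between two runs suggests the retry loop is still active.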
Working configuration
1. BIOS (Dell, F12 → BIOS Setup → Security tab):
Enable both:
- Thunderbolt Boot Support
- Thunderbolt (and PCIe behind TBT) pre-boot modules
The second is critical — it executes PCIe Option ROMs during pre-boot so the BIOS enumerates the GPU at POST and allocates PCIe resources cleanly. Without this, no kernel configuration resolves the problem reliably.
2. Open kernel modules (required for Blackwell):
sudo apt install nvidia-driver-590-open
The closed variant installs but fails with modprobe: ERROR: could not insert 'nvidia': No such device — it doesn’t recognize 10de:2c02.
Enroll the DKMS key for Secure Boot:
sudo mokutil --import /var/lib/shim-signed/mok/MOK.der
sudo reboot
# enroll at blue MOK Manager screen
3. Kernel parameters:
Noting explicitly: issue #979 suggests pci=assign-busses,realloc. On this hardware, that configuration left the driver in the probe-retry loop. The HPE advisory a00151736en_us documents a different Ubuntu + NVIDIA interaction where pci=realloc causes the kernel to remove BIOS-assigned BAR ranges without reassigning them, preventing driver communication. pci=realloc=off resolved probe on this hardware.
/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off pcie_aspm=off pcie_ports=native pcie_port_pm=off thunderbolt.clx=0 thunderbolt.host_reset=0 iommu=pt"
Then sudo update-grub.
Parameters:
- pci=realloc=off: preserves BIOS BAR allocation (HPE advisory)
- pcie_aspm=off: no ASPM L0s/L1 on TB-tunneled link
- pcie_ports=native: kernel-native PCIe port management for D-state handling
- pcie_port_pm=off: no D3cold entry on upstream bridges
- thunderbolt.clx=0: no TB CLx link power states
- thunderbolt.host_reset=0: no TB host controller reset cascade
- iommu=pt: IOMMU passthrough
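After rebooting, it is worth confirming that every parameter actually reached the running kernel, since a typo in /etc/default/grub fails silently. The sketch below compares the parameter list above against /proc/cmdline; `missing_kernel_params` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print each expected parameter that is missing from a
# kernel command line string. Prints nothing when all are present.
missing_kernel_params() {
    cmdline=" $1 "   # pad with spaces so every token is space-delimited
    shift
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                # exact token present
            *) printf '%s\n' "$param" ;;    # report missing token
        esac
    done
}

expected="pci=realloc=off pcie_aspm=off pcie_ports=native pcie_port_pm=off thunderbolt.clx=0 thunderbolt.host_reset=0 iommu=pt"

# /proc/cmdline holds the command line of the running kernel.
# Word splitting of $expected is intentional here.
missing_kernel_params "$(cat /proc/cmdline 2>/dev/null)" $expected
```

Empty output means all seven parameters are active. The match is on whole tokens, so a mistyped value such as pci=realloc=of is reported as missing rather than accepted as a prefix.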
4. Module parameters:
/etc/modprobe.d/nvidia-egpu.conf:
options nvidia NVreg_DynamicPowerManagement=0x00
options nvidia NVreg_PreserveVideoMemoryAllocations=0
NVreg_DynamicPowerManagement=0x00 is particularly important — D3cold transition failures over Thunderbolt manifest as “fallen off the bus.” Disabling runtime PM prevents the GPU from attempting to enter D3cold.
Then sudo update-initramfs -u.
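Once the driver loads, the values it actually picked up can be read back. The sketch below assumes the driver exposes them in /proc/driver/nvidia/params as "Name: value" lines without the NVreg_ prefix, which is how I have seen it reported; treat that layout as an assumption and check it on your system. `nvreg_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one key from a file of
# "Name: value" lines (the layout assumed for /proc/driver/nvidia/params).
nvreg_value() {
    key=$1
    file=$2
    # Split each line on ": " and print the value of the first exact match.
    awk -F': ' -v k="$key" '$1 == k { print $2; exit }' "$file"
}

params=/proc/driver/nvidia/params
if [ -r "$params" ]; then
    echo "DynamicPowerManagement=$(nvreg_value DynamicPowerManagement "$params")"
    echo "PreserveVideoMemoryAllocations=$(nvreg_value PreserveVideoMemoryAllocations "$params")"
else
    echo "nvidia driver not loaded: $params not readable"
fi
```

With the modprobe.d file above in effect, both values should read 0. A different value usually means the initramfs was not regenerated or another file in /etc/modprobe.d overrides the option.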
5. Boot procedure:
Cold boot only. Fully shut down, ensure eGPU is powered on and connected, then power on the laptop. With pre-boot TB (step 1), the BIOS enumerates the GPU at POST.
Hot-plug via egpu-init style scripts remains unreliable in my testing — the rapid remove/rescan/modprobe sequence runs before the Thunderbolt tunnel fully stabilizes.
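After a cold boot, a quick way to confirm that the GPU was enumerated, before looking at the driver at all, is to read its IDs from sysfs. This sketch assumes the bus address 0000:03:00.0 from the log above, which may differ on other machines; `pci_id` is a hypothetical helper name.

```shell
# Hypothetical helper: print "vendor:device" (hex, no 0x prefix) for a PCI
# device directory in sysfs, or return 1 if the device is not present.
pci_id() {
    dev_dir=$1
    [ -r "$dev_dir/vendor" ] && [ -r "$dev_dir/device" ] || return 1
    vendor=$(cat "$dev_dir/vendor")
    device=$(cat "$dev_dir/device")
    # sysfs reports IDs as 0x10de / 0x2c02; strip the prefix to match lspci.
    printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
}

# Bus address taken from the log above; adjust for your system.
gpu=/sys/bus/pci/devices/0000:03:00.0
if id=$(pci_id "$gpu"); then
    echo "GPU enumerated as $id"
else
    echo "no PCI device at $gpu"
fi
```

On this setup the expected ID is 10de:2c02. If the device is missing after a cold boot, the BIOS pre-boot settings from step 1 are the first thing to recheck.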
6. Persistence:
sudo systemctl enable --now nvidia-persistenced
The Ubuntu-packaged unit file lacks an [Install] section; if enable fails, create a drop-in with WantedBy=multi-user.target.
Result
NVIDIA-SMI 590.48.01 Driver Version: 590.48.01 CUDA Version: 13.1
RTX 5080 | 16303 MiB | P8 | 29°C | 10W / 360W
Reproducible across cold boots.
Credits
- hvico/Razer-Core-v2-Linux-Fix: baseline configuration (nouveau blacklist, egpu-init script, systemd units)
- HPE advisory: the pci=realloc=off finding
Stability testing
Tested with PyTorch 2.11.0 + cu130 build:
Test 1 (tiny allocation) — pass
Test 2 (100×100 matmul) — pass
Test 3 (1000×1000 matmul) — pass
Test 4 (10000×10000 matmul, ~800MB) — pass
Sustained load test: 4275 iterations of 8000×8000 float32 matmul over 120 seconds (~36 TFLOP/s sustained, ~64% of RTX 5080 peak FP32). No hard lock or thermal throttling observed; iteration rate stable throughout.
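For anyone comparing numbers, the sustained throughput figure follows from the iteration count. The sketch below uses the usual estimate of 2·N³ floating-point operations per N×N matmul; `sustained_tflops` is a hypothetical helper name.

```shell
# An N x N matmul costs about 2 * N^3 floating-point operations
# (one multiply and one add per inner-product term).
sustained_tflops() {
    # $1 = iterations, $2 = matrix dimension N, $3 = elapsed seconds
    awk -v iters="$1" -v n="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", iters * 2 * n^3 / secs / 1e12 }'
}

sustained_tflops 4275 8000 120   # prints 36.48
```

That is 4275 × 2 × 8000³ = 4.3776 × 10¹⁵ operations in 120 seconds, or about 36.5 × 10¹² operations per second.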