590 release feedback & discussion

I sent the following to NVIDIA and Sonnet support, but I'm posting it here as well in case anyone else is running into similar problems with the RTX 5080 on Linux using the official drivers from the CUDA RHEL 10 repository (both 580 and 590).

Summary

An RTX 5080 connected through a Thunderbolt 5 eGPU enclosure works at idle (nvidia-smi is functional), but any CUDA operation causes an immediate system hard-lock that requires a power cycle. This appears related to open-gpu-kernel-modules issue #900 on GitHub (Blackwell GPU over external PCIe).

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/900

Hardware

Component               Details
GPU                     NVIDIA GeForce RTX 5080 (GB203)
eGPU Enclosure          Sonnet Breakaway Box 850T5 (Thunderbolt 5)
Host                    Lenovo ThinkPad X1 Carbon Gen 11
CPU                     Intel Core i7-1355U
BIOS                    N3XET62W (1.37)
Thunderbolt Controller  Intel Raptor Lake-P Thunderbolt 4
OS                      Rocky Linux 10.1 Workstation (clean install)
Kernel                  6.12.0-124.13.1.el10_1.x86_64 (PREEMPT_DYNAMIC)

Driver

  • Version: 590.44.01

  • Source: Official CUDA RHEL10 repository

  • Type: Open kernel modules (kmod-nvidia-open-dkms)
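
To double-check that the DKMS-built open module is the one actually loaded (and that nouveau stayed blacklisted), a quick sketch; dkms and modinfo are assumed to be available, and the expected version string is simply the one listed above:

# Rough check that the open kernel module build is active
modinfo -F version nvidia         # expect 590.44.01
dkms status                       # the nvidia/590.44.01 module should show as installed
lsmod | grep -E 'nvidia|nouveau'  # nvidia modules loaded, nouveau absent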

PCIe Link Status

LnkCap: Port #0, Speed 32GT/s, Width x16
LnkSta: Speed 16GT/s (downgraded), Width x4 (downgraded)

Thunderbolt link: 40 Gb/s (2 lanes × 20 Gb/s)
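
For reference, the link readings above can be re-checked like this; lspci is filtered by NVIDIA's vendor ID (10de), and boltctl is only available if the bolt package is installed:

# Negotiated PCIe link for the GPU (LnkCap = capability, LnkSta = current state)
sudo lspci -vv -d 10de: | grep -E 'LnkCap:|LnkSta:'

# Thunderbolt tunnel state and authorization of the enclosure
boltctl list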

Symptoms

  1. GPU detected on PCIe bus at boot

  2. nvidia-smi reports GPU correctly and shows idle state (2W, 30°C)

  3. Any CUDA operation causes immediate system hard-lock

Minimal Reproducer

# Works - GPU visible and responsive at idle
nvidia-smi

# Hard lock - system freezes immediately, requires power cycle
python3 -c "import torch; x = torch.zeros(1, device='cuda'); print(x)"

The system freezes completely: no kernel panic, no Xid error logged, no SysRq response. A power cycle is required to recover.
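
To rule out PyTorch itself, the raw CUDA driver API can be poked directly to see which step hangs (driver init vs. device enumeration vs. the first real device work). This is only a bisection sketch; it assumes libcuda.so.1 is on the default library path:

# Step 1: driver init only - no context, no device memory traffic
python3 -c "import ctypes; cuda = ctypes.CDLL('libcuda.so.1'); print('cuInit:', cuda.cuInit(0))"

# Step 2: device enumeration - still no allocation
python3 -c "import ctypes; cuda = ctypes.CDLL('libcuda.so.1'); cuda.cuInit(0); n = ctypes.c_int(0); cuda.cuDeviceGetCount(ctypes.byref(n)); print('devices:', n.value)"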

Required Configuration

Kernel Parameters

pcie_aspm=off
pcie_ports=native
pcie_port_pm=off
intel_iommu=off
pci=assign-busses,realloc,hpbussize=0x33,hpmmiosize=768M,hpmmioprefsize=16G
rd.driver.blacklist=nouveau
rd.driver.blacklist=nova-core
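
On Rocky/RHEL 10, one way to apply these is with grubby rather than editing the GRUB config by hand; this is a sketch, and the comma-separated rd.driver.blacklist form is dracut's equivalent of the two separate entries above:

# Apply the parameters to all installed kernels
sudo grubby --update-kernel=ALL --args="pcie_aspm=off pcie_ports=native pcie_port_pm=off intel_iommu=off pci=assign-busses,realloc,hpbussize=0x33,hpmmiosize=768M,hpmmioprefsize=16G rd.driver.blacklist=nouveau,nova-core"

# After a reboot, confirm they are active
cat /proc/cmdline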

BIOS Settings

  • Kernel DMA Protection: Disabled (required; with it enabled, BARs fail to allocate)

  • Thunderbolt PCIe Tunneling: Enabled

  • Secure Boot: Disabled
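
Whether these firmware settings actually took effect can be checked from the running system; a sketch, where domain0 is assumed to be the first Thunderbolt domain:

mokutil --sb-state                                             # Secure Boot should report disabled
cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection  # expect 0 with the settings above
cat /sys/bus/thunderbolt/devices/domain0/security              # Thunderbolt security level of the tunnel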

Modprobe Configuration

/etc/modprobe.d/nvidia-pm.conf:

options nvidia NVreg_DynamicPowerManagement=0x00
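
Since the nvidia module can end up in the initramfs on RHEL-family installs, the initramfs may need rebuilding so the option applies at early boot; the active value can then be read back from the driver's procfs node (a sketch):

sudo dracut -f                                            # rebuild the initramfs for the running kernel
grep DynamicPowerManagement /proc/driver/nvidia/params    # should report 0 after reboot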

Udev Rules

/etc/udev/rules.d/99-nvidia-no-d3cold.rules:

ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/control}="on", ATTR{d3cold_allowed}="0"

Issues Encountered During Debugging

  • Without pcie_ports=native: the GPU enters D3cold and the driver fails with "Unable to change power state from D3cold to D0"

  • With Kernel DMA Protection enabled: the PCIe tunnel is limited to 2.5 GT/s x4 and BAR allocation fails

  • BAR allocation: requires the hotplug resource reservation parameters (the pci= options above)

  • Driver probe: the GPU periodically shows "fallen off the bus" during probe attempts
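
When re-testing, these failure signatures can be pulled from the logs in one pass; a rough filter (the journalctl line only works if persistent journaling is enabled):

sudo dmesg -T | grep -Ei 'NVRM|Xid|fallen off the bus|D3cold|BAR'
sudo journalctl -k -b -1 | grep -Ei 'NVRM|Xid|fallen off the bus'   # kernel log from the previous (crashed) boot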

dmesg at Boot (Successful Driver Load)

nvidia: loading out-of-tree module taints kernel.
nvidia-nvlink: Nvlink Core is being initialized, major device number 511
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 590.44.01

Relation to Issue #900

Issue #900 documents identical symptoms with RTX 5090 over OCuLink (external PCIe):

  • nvidia-smi works at idle

  • Computational load causes GPU to disconnect/system to crash

  • GSP firmware bootstrap errors noted during driver loading

Both cases involve Blackwell GPUs over external PCIe interfaces (Thunderbolt here, OCuLink in #900), so the common factor appears to be the Blackwell architecture running over a non-native PCIe connection.

Attachment

nvidia-bug-report.log.gz (1.4 MB)
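
For anyone collecting the same data to compare, the driver ships a script that produces an equivalent bundle:

sudo nvidia-bug-report.sh   # writes nvidia-bug-report.log.gz to the current directory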
