I have that as part of the grub command line boot options so it wasn’t an issue for me. I imagine nvidia-all tries to do it via that file customization method instead. I’m not familiar with how it works exactly since those types of settings need to be set extremely early to work AFAIK.
Arch applies a patch to the kernel driver. nvidia-all appears to apply that same patch to all versions 570.xx and newer. It doesn’t appear to be working.
I chose instead to patch Arch’s packages. I only had to drop two patches from the process, as both are obsolete in the 590.xx version. I also had to drop the packaging of the nvidia-dkms component, as the package no longer supports it.
It looks like the error you may be encountering is because it isn’t regenerating the package script correctly. I have not attempted to use 590.44.01 with nvidia-all using the custom option, but instead through modifying the PKGBUILD to add it as its own default category, with the correct MD5 hash for the driver download.
Unfortunately, the regression for Monster Hunter Wilds with RTX 50xx (bug 5547446) already reported for the 580 branch is still present. The game freezes after compiling the shaders, right before displaying the main menu. Going back to driver 575 is the only way to make it work.
Here are the logs, just in case they provide new information:
nvidia-bug-report.log.gz (962.5 KB)
Bug 5507242 (first reported here: 580 release feedback & discussion - #248 by airlinese ) is still an issue on 590.44.01.
The AUR also has nvidia-open-beta-dkms and nvidia-beta-dkms packages, along with nvidia-utils-beta. This is what I use when I don’t recompile from the nvidia-utils PKGBUILD myself, and it works fine.
With 590.44.01 I still have the issue of waking up an LG 34GS95QE monitor at 240 Hz with HDR and VRR on (using KDE/Wayland on Archlinux). At 144 Hz it wakes up without any problem.
nvidia-bug-report.log.gz (2.0 MB)
Halo Infinite is still having really bad performance issues.
Used Proton Experimental.
I am getting only 65-70 fps maxed out on a 5070 Ti where it used to be 200 fps maxed out with the same settings.
I sent the following to NVIDIA and Sonnet support, but posting here too in case anyone is running into similar problems with the RTX 5080 on Linux using the official drivers from the CUDA rhel10 repo, both 580 and 590.
Summary
RTX 5080 connected via Thunderbolt 5 eGPU enclosure works at idle (nvidia-smi functional) but any CUDA operation causes immediate system hard-lock requiring power cycle. This appears related to GitHub open-gpu-kernel-modules issue #900 (Blackwell GPU over external PCIe).
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/900
Hardware
| Component | Details |
|---|---|
| GPU | NVIDIA GeForce RTX 5080 (GB203) |
| eGPU Enclosure | Sonnet Breakaway Box 850T5 (Thunderbolt 5) |
| Host | Lenovo ThinkPad X1 Carbon Gen 11 |
| CPU | Intel Core i7-1355U |
| BIOS | N3XET62W (1.37) |
| Thunderbolt Controller | Intel Raptor Lake-P Thunderbolt 4 |
| OS | Rocky Linux 10.1 Workstation (clean install) |
| Kernel | 6.12.0-124.13.1.el10_1.x86_64 (PREEMPT_DYNAMIC) |
Driver
- Version: 590.44.01
- Source: Official CUDA RHEL10 repository
- Type: Open kernel modules (kmod-nvidia-open-dkms)
PCIe Link Status
LnkCap: Port #0, Speed 32GT/s, Width x16
LnkSta: Speed 16GT/s (downgraded), Width x4 (downgraded)
Thunderbolt link: 40 Gb/s (2 lanes × 20 Gb/s)
Symptoms
- GPU detected on PCIe bus at boot
- nvidia-smi reports GPU correctly and shows idle state (2W, 30°C)
- Any CUDA operation causes immediate system hard-lock
Minimal Reproducer
# Works - GPU visible and responsive at idle
nvidia-smi
# Hard lock - system freezes immediately, requires power cycle
python3 -c "import torch; x = torch.zeros(1, device='cuda'); print(x)"
System freezes completely - no kernel panic, no Xid error logged, no SysRq response. Requires power cycle to recover.
Required Configuration
Kernel Parameters
pcie_aspm=off
pcie_ports=native
pcie_port_pm=off
intel_iommu=off
pci=assign-busses,realloc,hpbussize=0x33,hpmmiosize=768M,hpmmioprefsize=16G
rd.driver.blacklist=nouveau
rd.driver.blacklist=nova-core
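For reference, here is a sketch of how parameters like these can be made persistent on a RHEL/Rocky-style install via grubby (this assumes a BLS-based boot setup, as on Rocky 10; treat it as a config sketch, not something to run blindly):

```shell
# Sketch: persist the kernel parameters above on Rocky/RHEL via grubby
# (assumption: BLS-based boot configuration).
sudo grubby --update-kernel=ALL \
  --args="pcie_aspm=off pcie_ports=native pcie_port_pm=off intel_iommu=off"
sudo grubby --update-kernel=ALL \
  --args="pci=assign-busses,realloc,hpbussize=0x33,hpmmiosize=768M,hpmmioprefsize=16G rd.driver.blacklist=nouveau"
# Verify what the default entry will boot with:
sudo grubby --info=DEFAULT | grep '^args'
```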
BIOS Settings
- Kernel DMA Protection: Disabled (required - with it enabled, BARs fail to allocate)
- Thunderbolt PCIe Tunneling: Enabled
- Secure Boot: Disabled
Modprobe Configuration
/etc/modprobe.d/nvidia-pm.conf:
options nvidia NVreg_DynamicPowerManagement=0x00
Udev Rules
/etc/udev/rules.d/99-nvidia-no-d3cold.rules:
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/control}="on", ATTR{d3cold_allowed}="0"
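After dropping in the rule file, it can be applied without a reboot using standard udevadm invocations; the match expressions below mirror the rule above (the GPU address is an assumption, adjust for your system):

```shell
# Sketch: reload udev rules and re-trigger add events for NVIDIA PCI
# devices so the rule above takes effect without rebooting.
sudo udevadm control --reload-rules
sudo udevadm trigger --action=add --subsystem-match=pci --attr-match=vendor=0x10de
# Confirm (assumption: the GPU sits at 0000:01:00.0):
cat /sys/bus/pci/devices/0000:01:00.0/d3cold_allowed    # should print 0
```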
Issues Encountered During Debugging
| Issue | Details |
|---|---|
| Without pcie_ports=native | GPU enters D3cold, driver fails with “Unable to change power state from D3cold to D0” |
| With Kernel DMA Protection enabled | PCIe tunnel limited to 2.5GT/s x4, BAR allocation fails |
| BAR allocation | Requires hotplug resource reservation parameters |
| Driver probe | GPU periodically shows “fallen off the bus” during probe attempts |
dmesg at Boot (Successful Driver Load)
nvidia: loading out-of-tree module taints kernel.
nvidia-nvlink: Nvlink Core is being initialized, major device number 511
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 590.44.01
Relation to Issue #900
Issue #900 documents identical symptoms with an RTX 5090 over OCuLink (external PCIe):
- nvidia-smi works at idle
- Computational load causes the GPU to disconnect / the system to crash
- GSP firmware bootstrap errors noted during driver loading
Both involve Blackwell GPUs over external PCIe interfaces (Thunderbolt in my case, OCuLink in #900). The common factor appears to be Blackwell architecture over non-native PCIe connections.
Attachment
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz (1.4 MB)
Silent Hill f hangs when the Rinko boss fight begins. After the walk with the Fox god cutscene, Rinko starts her first attack, the screen turns red and the game freezes, with music playing in the background. In /var/log/syslog, I can see this message:
kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=70102, name=GameThread, channel 0x00000014, errorString CTX SWITCH TIMEOUT, Info 0x31c030
Kubuntu 24.04, X11 (not Wayland), Proton-GE 10-25, RTX 5080, Nvidia driver 590.44.01.
nvidia-bug-report.log.gz (1.8 MB)
Here’s the game save, gzip-compressed, if you need it:
SaveSlot3.sav.gz (316.2 KB)
Just walk up to the Fox god, watch the cutscene and start the boss fight. The game hangs on the first attack. Reproduces 100% of time for me.
PS: The game worked fine and never crashed or hung on me for the entire playthrough up to this specific moment.
PPS: This didn’t happen with driver 580.105.08.
It broke Black Myth: Wukong benchmark. It successfully builds the shaders, but it locks up the GPU on a black screen when trying to start the benchmark.
nvidia-bug-report.log.gz (1.8 MB)
[ 268.019756] NVRM: Xid (PCI:0000:01:00): 109, pid=6388, name=GameThread, channel 0x0000002d, errorString CTX SWITCH TIMEOUT, Info 0x17c04c
glxinfo is exiting with the following message on fedora kde 43 wayland session:
X Error of failed request: BadWindow (invalid Window parameter)
Major opcode of failed request: 146 ()
Minor opcode of failed request: 5
Resource id in failed request: 0x1000003
Serial number of failed request: 56
Current serial number in output stream: 57
❯ xdpyinfo -queryExt | grep opcode
BIG-REQUESTS (opcode: 133)
Composite (opcode: 142)
DAMAGE (opcode: 143, base event: 91, base error: 152)
DOUBLE-BUFFER (opcode: 144, base error: 153)
DRI3 (opcode: 147)
GLX (opcode: 150, base event: 94, base error: 158)
Generic Event Extension (opcode: 128)
MIT-SHM (opcode: 130, base event: 65, base error: 128)
Present (opcode: 146)
RANDR (opcode: 140, base event: 89, base error: 147)
RECORD (opcode: 145, base error: 154)
RENDER (opcode: 139, base error: 142)
SECURITY (opcode: 137, base event: 86, base error: 138)
SHAPE (opcode: 129, base event: 64)
SYNC (opcode: 134, base event: 83, base error: 134)
X-Resource (opcode: 148)
XC-MISC (opcode: 136)
XFIXES (opcode: 138, base event: 87, base error: 140)
XFree86-VidModeExtension (opcode: 151, base error: 172)
XINERAMA (opcode: 141)
XInputExtension (opcode: 131, base event: 66, base error: 129)
XKEYBOARD (opcode: 135, base event: 85, base error: 137)
XTEST (opcode: 132)
XVideo (opcode: 149, base event: 92, base error: 155)
XWAYLAND (opcode: 152)
xorg-x11-server-Xwayland-24.1.9-1.fc43.x86_64
glx-utils-9.0.0-10.fc43.x86_64
glx-utils-9.0.0-10.fc43.i686
I’m using the driver from the cuda-fedora42 repo
nvidia-bug-report.log.gz (1.3 MB)
Maybe it’s an off-by-one error? dix/protocol.txt · master · xorg / xserver · GitLab
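For what it’s worth, the failed request’s major opcode (146) corresponds to Present in the xdpyinfo dump above, not GLX (150). To map an opcode back to an extension name from that dump, a tiny helper like this can be used (find_ext is a hypothetical name, not part of any tool):

```shell
# Sketch: map a failed-request major opcode back to an extension name
# using "xdpyinfo -queryExt" output on stdin. find_ext is hypothetical.
find_ext() {
  # Lines look like: "Present (opcode: 146)" or
  # "DAMAGE (opcode: 143, base event: 91, base error: 152)"
  grep "opcode: $1[,)]" | sed 's/ *(opcode.*//'
}

# Usage against a live X/Xwayland display:
#   xdpyinfo -queryExt | find_ext 146
```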
Here’s another one. Borderlands 4 on the 590.44.01 driver. GE Proton, ENABLE_HDR_WSI=1 and the Wayland/HDR modes enabled.
Basically, the edge detection shader(s) are glitching on distant geometry.
590 has just arrived in the DC apt repos, including Debian 13 :)))
I’ve just installed it on a test machine and performed some smoke tests:
- vkcube runs ok on my 3090 eGPU
- ollama is able to offload to the 3090 eGPU as well
There seems to be some problem with DXVK-NVAPI however: my Nvidia card is not detected, Wine falls back to the iGPU and I get the following errors in logs:
NVRM: API mismatch: the client 'Agent.exe' (pid 9590)
NVRM: has the version 580.105.08, but this kernel module has
NVRM: the version 590.44.01. Please make sure that this
NVRM: kernel module and all NVIDIA driver components
NVRM: have the same version.
I’ve purged '*nvidia*' '*nvidia*:i386' '*cuda*' '*cuda*:i386' and installed nvidia-open from scratch again, but it has not helped. Has anyone experienced something similar with v590, either with the DC packages or with the .run installer?
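The API-mismatch error in the log just means the loaded kernel module and the userspace libraries disagree on version. A quick sketch for comparing the two (check_versions is a hypothetical helper; the /proc path and nvidia-smi query in the comments are the usual sources):

```shell
# Sketch: detect a kernel-module / userspace driver version mismatch.
# check_versions is a hypothetical helper, not part of any NVIDIA tool.
check_versions() {
  # $1 = kernel module version, $2 = userspace version
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "MISMATCH: kernel=$1 userspace=$2"
  fi
}

# On a real system (assumption: an NVIDIA kernel module is loaded):
#   km=$(awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit } }' /proc/driver/nvidia/version)
#   us=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
#   check_versions "$km" "$us"
check_versions 590.44.01 580.105.08   # prints "MISMATCH: kernel=590.44.01 userspace=580.105.08"
```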
UPDATE: the DXVK-NVAPI thing turned out to be my own misconfiguration ;-]
Have not experienced that on the Fedora DC repo.
Hey, I can confirm this bug. I made a PR that should fix the cause; for me, 580 now runs without the modeset kernel parameter:
KISS
Patch and write-up posted at (gh-979) Thunderbolt 4/5 and USB4 eGPU Support by roger-pmta · Pull Request #981 · NVIDIA/open-gpu-kernel-modules · GitHub. Will also send it to linux-bugs.
D3cold & 590
Debian 13, GeForce RTX 4070 Laptop GPU, gnome 48, Wayland, kernel 6.12.57+deb13-amd64.
With the previous versions (580.105.08, 580.95.05), the GPU enters the D3cold power state as soon as it is not in use. I should mention I use the “Freon” GNOME extension, which monitors temperatures, including the Nvidia GPU’s; when the GPU enters D3cold, Freon shows the temp as “N/A” until I run a program that uses the GPU. In other words, the GPU stays in D3cold even while Freon tries to read the temperature, which is desirable.
With the 590 version, the GPU enters into D3cold power state, but it looks like every time “Freon” reads the temperature, it “wakes up” the GPU.
I tried to find a relevant difference between the parameters/configuration of the 580 and 590 versions, but I couldn’t find anything.
I wonder if there is a parameter or configuration setting that controls how the GPU enters the D3cold power state.
Although I read that NVreg_DynamicPowerManagement=0x03 is the default, I tried setting its value (in /etc/modprobe.d/nvidia.conf) to 0x03 and 0x02, but it didn’t make any difference in behavior between the two versions.
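One way to watch this without the monitoring itself waking the device: the sysfs power_state attribute just reports the recorded state and should not resume the GPU, unlike temperature queries that go through the driver. A small sketch (pm_state is a hypothetical helper; the PCI address is an assumption, adjust for your system):

```shell
# Sketch: report the runtime power state recorded under a PCI device's
# sysfs directory. pm_state is a hypothetical helper name.
pm_state() {
  if [ -r "$1/power_state" ]; then
    cat "$1/power_state"        # e.g. D0 or D3cold
  else
    echo "unknown"
  fi
}

# On a real system (assumption: the dGPU sits at 0000:01:00.0):
#   while sleep 1; do pm_state /sys/bus/pci/devices/0000:01:00.0; done
```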
Information I think is relevant:
580 and 590
/sys/bus/pci/devices/0000:00:01.0/power/
async enabled
autosuspend_delay_ms 100
control auto
/sys/bus/pci/devices/0000:00:01.0/
d3cold_allowed 1
power_state D3cold (on 590 it switches to D0 every time the temp is read)
revision 0x02
/proc/driver/nvidia/gpus/0000:01:00.0/power
580
Runtime D3 status: Enabled (fine-grained)
Video Memory: Off
/proc/driver/nvidia/gpus/0000\:01\:00.0/power
GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Supported
Status: Enabled
Notebook Dynamic Boost: Supported
590
Runtime D3 status: Enabled (fine-grained)
Tegra iGPU Rail-Gating: Disabled
Video Memory: Off
GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Supported
Status: Enabled
Notebook Dynamic Boost: Supported
/proc/driver/nvidia/params
580 & 590
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 1
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
RmNvlinkBandwidthLinkCount: 0
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 1
DmaRemapPeerMmio: 1
ImexChannelCount: 2048
CreateImexChannel0: 0
GrdmaPciTopoCheckOverride: 0
CoherentGPUMemoryMode: “”
RegistryDwords: “”
RegistryDwordsPerDevice: “”
RmMsg: “”
GpuBlacklist: “”
TemporaryFilePath: “/var/tmp”
ExcludedGpus: “”
Only 590:
TegraGpuPgMask: 0
EnableSystemMemoryPools: 529
/proc/driver/nvidia/gpus/0000:01:00.0/information
Model: NVIDIA GeForce RTX 4070 Laptop GPU
IRQ: 235
GPU UUID: GPU-e950e2a3-8ed8-9b59-4db0-ec5857dadf63
Video BIOS: 95.06.15.40.36
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
GPU Firmware: 590.44.01 or 580.105.08
GPU Excluded: No
No extra configuration modified between versions, just upgraded to 590, then “downgraded” to 580.
/etc/modprobe.d/
nvidia.conf
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
nvidia-modeset.conf
options nvidia-drm modeset=1
Services:
nvidia-hibernate.service disabled
nvidia-powerd.service enabled / active (running)
nvidia-suspend.service enabled
nvidia-persistenced.service enabled / active (running)
nvidia-resume.service enabled
nvidia-suspend-then-hibernate.service disabled
Have you had issues getting the 590 open-source modules to work? I think the proprietary driver works, but the open-source version breaks.
I’m assuming the answer is no, but is VRR fixed in this version?
Are you running Flatpak Steam? You’ll need to run a Flatpak update to get the matching userspace packages for your new drivers.