590 release feedback & discussion

I have that as one of the GRUB kernel command-line boot options, so it wasn’t an issue for me. I imagine nvidia-all tries to set it through a configuration file instead. I’m not sure exactly how that works, since AFAIK settings of this kind need to be applied extremely early in boot to take effect.

Arch applies a patch to the kernel driver. nvidia-all appears to apply that same patch to all versions 570.xx and newer. It doesn’t appear to be working.

I chose instead to patch Arch’s packages. I only had to drop two patches, both of which are obsoleted by 590.xx. I also had to drop packaging of the nvidia-dkms component, since the package no longer supports it.

It looks like the error you may be encountering happens because it isn’t regenerating the package script correctly. I haven’t tried 590.44.01 with nvidia-all’s custom option; instead, I modified the PKGBUILD to add it as its own default category, with the correct MD5 hash for the driver download.
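If you go the PKGBUILD route, the md5sums entry has to match the installer you actually downloaded. `makepkg -g` prints fresh checksums for you. The minimal Python sketch below does the same for a single file, hashing it in chunks. `pkgbuild_md5_line` is a hypothetical helper. The file name you pass it (for example NVIDIA-Linux-x86_64-590.44.01.run) is an assumption based on NVIDIA’s usual installer naming.

```python
import hashlib
from typing import Iterable


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Feed an iterable of byte chunks into MD5 and return the hex digest."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so a large .run installer never has to fit in memory."""
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps calling fh.read() until it returns b"" (EOF).
        return md5_hex(iter(lambda: fh.read(chunk_size), b""))


def pkgbuild_md5_line(path: str) -> str:
    """Hypothetical helper: format the digest as a one-entry PKGBUILD md5sums array."""
    return f"md5sums=('{md5_of_file(path)}')"
```

A PKGBUILD with several sources needs one checksum per source, in the same order as the source array. That is why `makepkg -g` is usually the simpler option there.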

Unfortunately, the regression for Monster Hunter Wilds with RTX 50xx (bug 5547446) already reported for the 580 branch is still present. The game freezes after compiling the shaders, right before displaying the main menu. Going back to driver 575 is the only way to make it work.

Here are the logs, in case they provide new information:

nvidia-bug-report.log.gz (962.5 KB)

Bug 5507242 (first reported here: 580 release feedback & discussion - #248 by airlinese) is still an issue on 590.44.01.

In the AUR there are also the nvidia-open-beta-dkms and nvidia-beta-dkms packages, along with nvidia-utils-beta. That’s what I use when I don’t rebuild from the nvidia-utils PKGBUILD myself, and it works fine.

With 590.44.01 I still have trouble waking up an LG 34GS95QE monitor at 240 Hz with HDR and VRR on (KDE/Wayland on Arch Linux). At 144 Hz it wakes up without any problem.

nvidia-bug-report.log.gz (2.0 MB)

Halo Infinite is still having really bad performance issues.

Used Proton Experimental.

I’m getting only 65-70 fps maxed out on a 5070 Ti, where I used to get 200 fps maxed out with the same settings.


I sent the following to NVIDIA and Sonnet support, but I’m posting it here too in case anyone else is running into similar problems with an RTX 5080 on Linux using the official drivers (both 580 and 590) from the CUDA RHEL10 repo.

Summary

An RTX 5080 connected through a Thunderbolt 5 eGPU enclosure works at idle (nvidia-smi is functional), but any CUDA operation causes an immediate system hard lock that requires a power cycle. This appears to be related to GitHub open-gpu-kernel-modules issue #900 (Blackwell GPU over external PCIe).

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/900

Hardware

GPU: NVIDIA GeForce RTX 5080 (GB203)
eGPU enclosure: Sonnet Breakaway Box 850T5 (Thunderbolt 5)
Host: Lenovo ThinkPad X1 Carbon Gen 11
CPU: Intel Core i7-1355U
BIOS: N3XET62W (1.37)
Thunderbolt controller: Intel Raptor Lake-P Thunderbolt 4
OS: Rocky Linux 10.1 Workstation (clean install)
Kernel: 6.12.0-124.13.1.el10_1.x86_64 (PREEMPT_DYNAMIC)

Driver

  • Version: 590.44.01

  • Source: Official CUDA RHEL10 repository

  • Type: Open kernel modules (kmod-nvidia-open-dkms)

PCIe Link Status

LnkCap: Port #0, Speed 32GT/s, Width x16
LnkSta: Speed 16GT/s (downgraded), Width x4 (downgraded)

Thunderbolt link: 40 Gb/s (2 lanes × 20 Gb/s)
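The LnkSta line shows the x16 Gen5 card negotiating down to 16GT/s x4. Below is a minimal Python sketch that parses LnkCap/LnkSta lines as printed by `lspci -vv`. It estimates one-direction bandwidth from the line encoding alone: 8b/10b below 8 GT/s, 128b/130b from PCIe 3.0 on. It ignores packet and protocol overhead, so treat the numbers as upper bounds rather than measurements. `parse_link` and `describe` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

_SPEED_RE = re.compile(r"Speed\s+([\d.]+)GT/s")
_WIDTH_RE = re.compile(r"Width\s+x(\d+)")


@dataclass
class Link:
    speed_gts: float  # transfer rate per lane, in GT/s
    width: int        # number of lanes

    def gbytes_per_s(self) -> float:
        """Approximate one-direction bandwidth in GB/s, counting only line-encoding overhead."""
        # PCIe 1.x/2.x (2.5/5 GT/s) use 8b/10b encoding; 3.0 and later use 128b/130b.
        efficiency = 8 / 10 if self.speed_gts < 8 else 128 / 130
        return self.speed_gts * self.width * efficiency / 8


def parse_link(line: str) -> Link:
    """Parse an lspci -vv 'LnkCap:' or 'LnkSta:' line into speed and width."""
    speed = _SPEED_RE.search(line)
    width = _WIDTH_RE.search(line)
    if not speed or not width:
        raise ValueError(f"not a PCIe link line: {line!r}")
    return Link(float(speed.group(1)), int(width.group(1)))


def describe(cap_line: str, sta_line: str) -> str:
    """Summarize capability versus negotiated link in one line."""
    cap, sta = parse_link(cap_line), parse_link(sta_line)
    return (f"capable {cap.speed_gts:g}GT/s x{cap.width} (~{cap.gbytes_per_s():.1f} GB/s), "
            f"running {sta.speed_gts:g}GT/s x{sta.width} (~{sta.gbytes_per_s():.1f} GB/s)")
```

With the two lines above, `describe` reports roughly 63 GB/s capable versus roughly 7.9 GB/s negotiated. The 40 Gb/s Thunderbolt link (5 GB/s before overhead) is narrower than either.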

Symptoms

  1. GPU detected on PCIe bus at boot

  2. nvidia-smi reports GPU correctly and shows idle state (2W, 30°C)

  3. Any CUDA operation causes immediate system hard-lock

Minimal Reproducer

# Works - GPU visible and responsive at idle
nvidia-smi

# Hard lock - system freezes immediately, requires power cycle
python3 -c "import torch; x = torch.zeros(1, device='cuda'); print(x)"

System freezes completely - no kernel panic, no Xid error logged, no SysRq response. Requires power cycle to recover.

Required Configuration

Kernel Parameters

pcie_aspm=off
pcie_ports=native
pcie_port_pm=off
intel_iommu=off
pci=assign-busses,realloc,hpbussize=0x33,hpmmiosize=768M,hpmmioprefsize=16G
rd.driver.blacklist=nouveau
rd.driver.blacklist=nova-core
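You can confirm these parameters actually reached the running kernel by checking /proc/cmdline. Below is a minimal Python sketch, assuming the exact tokens listed above. The pci= value is split into its comma-separated sub-options so each one can be checked on its own. `missing_params`, `pci_options` and `read_cmdline` are hypothetical helper names.

```python
from __future__ import annotations

import shlex

# Tokens copied from the kernel parameter list above; pci= is checked separately.
REQUIRED = [
    "pcie_aspm=off",
    "pcie_ports=native",
    "pcie_port_pm=off",
    "intel_iommu=off",
    "rd.driver.blacklist=nouveau",
    "rd.driver.blacklist=nova-core",
]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read()


def missing_params(cmdline: str, required: list[str] = REQUIRED) -> list[str]:
    """Return the required tokens that do not appear verbatim on the command line."""
    present = set(shlex.split(cmdline))
    return [p for p in required if p not in present]


def pci_options(cmdline: str) -> set[str]:
    """Collect the comma-separated sub-options of every pci= token."""
    opts: set[str] = set()
    for tok in shlex.split(cmdline):
        if tok.startswith("pci="):
            opts.update(o for o in tok[len("pci="):].split(",") if o)
    return opts
```

For example, `missing_params(read_cmdline())` should return an empty list. `pci_options(read_cmdline())` should contain assign-busses, realloc, hpbussize=0x33, hpmmiosize=768M and hpmmioprefsize=16G.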

BIOS Settings

  • Kernel DMA Protection: Disabled (required - with it enabled, BARs fail to allocate)

  • Thunderbolt PCIe Tunneling: Enabled

  • Secure Boot: Disabled

Modprobe Configuration

/etc/modprobe.d/nvidia-pm.conf:

options nvidia NVreg_DynamicPowerManagement=0x00

Udev Rules

/etc/udev/rules.d/99-nvidia-no-d3cold.rules:

ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", ATTR{power/control}="on", ATTR{d3cold_allowed}="0"
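After boot, you can check that the rule actually applied by reading the same sysfs attributes back. Below is a minimal Python sketch, assuming the standard /sys/bus/pci/devices layout. Like the rule, it only matches class 0x030000. A GPU that enumerates as a 3D controller (0x030200, common on laptops) would be skipped by both. `nvidia_pm_state` is a hypothetical helper name.

```python
from __future__ import annotations

import os

NVIDIA_VENDOR = "0x10de"
VGA_CLASS = "0x030000"  # same class value the udev rule above matches on


def _read(path: str) -> str | None:
    """Read and strip a sysfs attribute, or return None if it is missing/unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def nvidia_pm_state(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, dict[str, str | None]]:
    """For each NVIDIA VGA-class PCI function, report the attributes the udev rule sets."""
    result: dict[str, dict[str, str | None]] = {}
    for dev in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, dev)
        if _read(os.path.join(base, "vendor")) != NVIDIA_VENDOR:
            continue
        if _read(os.path.join(base, "class")) != VGA_CLASS:
            continue
        result[dev] = {
            "power/control": _read(os.path.join(base, "power", "control")),  # rule sets "on"
            "d3cold_allowed": _read(os.path.join(base, "d3cold_allowed")),   # rule sets "0"
            "power_state": _read(os.path.join(base, "power_state")),
        }
    return result
```

If the rule applied, every entry should show power/control as "on" and d3cold_allowed as "0".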

Issues Encountered During Debugging

Without pcie_ports=native: GPU enters D3cold; driver fails with “Unable to change power state from D3cold to D0”
With Kernel DMA Protection enabled: PCIe tunnel limited to 2.5GT/s x4; BAR allocation fails
BAR allocation: requires hotplug resource reservation parameters
Driver probe: GPU periodically shows “fallen off the bus” during probe attempts

dmesg at Boot (Successful Driver Load)

nvidia: loading out-of-tree module taints kernel.
nvidia-nvlink: Nvlink Core is being initialized, major device number 511
NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 590.44.01

Relation to Issue #900

Issue #900 documents identical symptoms with RTX 5090 over OCuLink (external PCIe):

  • nvidia-smi works at idle

  • Computational load causes GPU to disconnect/system to crash

  • GSP firmware bootstrap errors noted during driver loading

Both involve Blackwell GPUs over external PCIe interfaces (Thunderbolt in my case, OCuLink in #900). The common factor appears to be Blackwell architecture over non-native PCIe connections.

Attachment


nvidia-bug-report.log.gz (1.4 MB)


Silent Hill f hangs when the Rinko boss fight begins. After the cutscene where you walk with the Fox god, Rinko starts her first attack, the screen turns red, and the game freezes while the music keeps playing in the background. In /var/log/syslog, I can see this message:

kernel: NVRM: Xid (PCI:0000:01:00): 109, pid=70102, name=GameThread, channel 0x00000014, errorString CTX SWITCH TIMEOUT, Info 0x31c030

Kubuntu 24.04, X11 (not Wayland), Proton-GE 10-25, RTX 5080, Nvidia driver 590.44.01.

nvidia-bug-report.log.gz (1.8 MB)

Here’s the game save, gzip-compressed, if you need it:

SaveSlot3.sav.gz (316.2 KB)

Just walk up to the Fox god, watch the cutscene, and start the boss fight. The game hangs on the first attack. It reproduces 100% of the time for me.

PS: The game worked fine and never crashed or hung on me for the entire playthrough up to this specific moment.

PPS: This didn’t happen with driver 580.105.08.


It broke the Black Myth: Wukong benchmark. It builds the shaders successfully, but the GPU locks up on a black screen when the benchmark tries to start.

nvidia-bug-report.log.gz (1.8 MB)

[  268.019756] NVRM: Xid (PCI:0000:01:00): 109, pid=6388, name=GameThread, channel 0x0000002d, errorString CTX SWITCH TIMEOUT, Info 0x17c04c
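Both hangs above log Xid 109 with errorString CTX SWITCH TIMEOUT. Below is a minimal Python sketch for pulling these lines out of a long syslog or `journalctl -k` dump. The field layout is inferred only from the two sample lines in this thread. Other Xid codes may print different fields, and any field that is missing comes back as None. `parse_xid` and `summarize` are hypothetical helper names.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Core of an Xid line; everything after the Xid number is parsed field by field.
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[^)]+)\): (?P<xid>\d+)(?P<rest>.*)")
# Optional fields; patterns inferred from the Xid 109 samples in this thread.
FIELDS = {
    "pid": re.compile(r"pid=(\d+)"),
    "name": re.compile(r"name=([^,]+)"),
    "channel": re.compile(r"channel (0x[0-9a-fA-F]+)"),
    "error": re.compile(r"errorString ([^,]+)"),
}


def parse_xid(line: str) -> dict | None:
    """Return the fields of an NVRM Xid log line, or None if the line isn't one."""
    m = XID_RE.search(line)
    if not m:
        return None
    info = {"bus": m.group("bus"), "xid": int(m.group("xid"))}
    for key, pattern in FIELDS.items():
        found = pattern.search(m.group("rest"))
        info[key] = found.group(1).strip() if found else None
    if info["pid"] is not None:
        info["pid"] = int(info["pid"])
    return info


def summarize(lines: Iterable[str]) -> dict[int, int]:
    """Count how often each Xid code appears in a log."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_xid(line)
        if parsed:
            counts[parsed["xid"]] += 1
    return dict(counts)
```

For example, `summarize(open("/var/log/syslog"))` gives a quick per-code count before you dig into individual lines.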

glxinfo exits with the following message in a Fedora 43 KDE Wayland session:

X Error of failed request:  BadWindow (invalid Window parameter)
Major opcode of failed request:  146 ()
Minor opcode of failed request:  5
Resource id in failed request:  0x1000003
Serial number of failed request:  56
Current serial number in output stream:  57
❯ xdpyinfo -queryExt | grep opcode

BIG-REQUESTS  (opcode: 133)
Composite  (opcode: 142)
DAMAGE  (opcode: 143, base event: 91, base error: 152)
DOUBLE-BUFFER  (opcode: 144, base error: 153)
DRI3  (opcode: 147)
GLX  (opcode: 150, base event: 94, base error: 158)
Generic Event Extension  (opcode: 128)
MIT-SHM  (opcode: 130, base event: 65, base error: 128)
Present  (opcode: 146)
RANDR  (opcode: 140, base event: 89, base error: 147)
RECORD  (opcode: 145, base error: 154)
RENDER  (opcode: 139, base error: 142)
SECURITY  (opcode: 137, base event: 86, base error: 138)
SHAPE  (opcode: 129, base event: 64)
SYNC  (opcode: 134, base event: 83, base error: 134)
X-Resource  (opcode: 148)
XC-MISC  (opcode: 136)
XFIXES  (opcode: 138, base event: 87, base error: 140)
XFree86-VidModeExtension  (opcode: 151, base error: 172)
XINERAMA  (opcode: 141)
XInputExtension  (opcode: 131, base event: 66, base error: 129)
XKEYBOARD  (opcode: 135, base event: 85, base error: 137)
XTEST  (opcode: 132)
XVideo  (opcode: 149, base event: 92, base error: 155)
XWAYLAND  (opcode: 152)
xorg-x11-server-Xwayland-24.1.9-1.fc43.x86_64
glx-utils-9.0.0-10.fc43.x86_64
glx-utils-9.0.0-10.fc43.i686

I’m using the driver from the cuda-fedora42 repo.
nvidia-bug-report.log.gz (1.3 MB)
Maybe it’s an off-by-one error? dix/protocol.txt · master · xorg / xserver · GitLab
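On the opcode question: the “146 ()” in the glxinfo error is the major opcode of the failed request. In the xdpyinfo listing above, 146 belongs to Present, not GLX (150). Extension opcodes are assigned by the running X server, so the lookup is only valid against xdpyinfo output from the same session. Below is a minimal Python sketch that does the mapping. The function names are hypothetical.

```python
from __future__ import annotations

import re

# "Name  (opcode: N, ...)" lines as printed by `xdpyinfo -queryExt`.
EXT_RE = re.compile(r"^\s*(?P<name>\S.*?)\s+\(opcode: (?P<opcode>\d+)")
# The line Xlib prints for a failed request.
MAJOR_RE = re.compile(r"Major opcode of failed request:\s+(\d+)")


def opcode_table(xdpyinfo_text: str) -> dict[int, str]:
    """Map each extension's major opcode to its name."""
    table: dict[int, str] = {}
    for line in xdpyinfo_text.splitlines():
        m = EXT_RE.match(line)
        if m:
            table[int(m.group("opcode"))] = m.group("name")
    return table


def failed_major_opcode(error_text: str) -> int | None:
    """Extract the major opcode from an X error report, if present."""
    m = MAJOR_RE.search(error_text)
    return int(m.group(1)) if m else None


def failing_extension(error_text: str, xdpyinfo_text: str) -> str | None:
    """Name the extension whose request failed, using the same session's xdpyinfo output."""
    opcode = failed_major_opcode(error_text)
    return opcode_table(xdpyinfo_text).get(opcode) if opcode is not None else None
```

Fed the error and the xdpyinfo output above, `failing_extension` resolves opcode 146 to Present.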


Here’s another one: Borderlands 4 on the 590.44.01 driver, using GE Proton with ENABLE_HDR_WSI=1 and the Wayland/HDR modes enabled.

Basically, the edge detection shader(s) are glitching on distant geometry.

590 has just arrived in the DC apt repos, including Debian 13 :)))
I’ve just installed it on a test machine and performed some smoke tests:

  • vkcube runs ok on my 3090 eGPU
  • ollama is able to offload to the 3090 eGPU as well

There seems to be a problem with DXVK-NVAPI, however: my NVIDIA card is not detected, Wine falls back to the iGPU, and I get the following errors in the logs:

NVRM: API mismatch: the client 'Agent.exe' (pid 9590)
NVRM: has the version 580.105.08, but this kernel module has
NVRM: the version 590.44.01.  Please make sure that this
NVRM: kernel module and all NVIDIA driver components
NVRM: have the same version.

I’ve purged '*nvidia*' '*nvidia*:i386' '*cuda*' '*cuda*:i386' and reinstalled nvidia-open from scratch, but it didn’t help. Has anyone experienced something similar with v590, either with the DC packages or with the .run installer?

UPDATE: the DXVK-NVAPI thing turned out to be my own misconfiguration ;-]
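For reference, the NVRM API mismatch message names the client process and the userspace version it loaded, which can help track down a stale copy of the libraries. Below is a minimal Python sketch that extracts those fields from the multi-line log text. The regex is based only on the message quoted above. `parse_api_mismatch` is a hypothetical name.

```python
from __future__ import annotations

import re

MISMATCH_RE = re.compile(
    r"client '(?P<client>[^']+)' \(pid (?P<pid>\d+)\) "
    r"has the version (?P<client_version>\d+(?:\.\d+)*), "
    r"but this kernel module has the version (?P<kernel_version>\d+(?:\.\d+)*)"
)


def parse_api_mismatch(log_text: str) -> dict | None:
    """Extract client name, pid and both versions from an NVRM API mismatch message."""
    # The message is split across several "NVRM:"-prefixed lines;
    # drop the prefixes and rejoin so the regex sees one sentence.
    joined = " ".join(line.split("NVRM:", 1)[-1].strip() for line in log_text.splitlines())
    m = MISMATCH_RE.search(joined)
    if not m:
        return None
    result = m.groupdict()
    result["pid"] = int(result["pid"])
    return result
```

If client_version and kernel_version differ, the process named in client loaded userspace libraries that don’t match the installed kernel module.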

Haven’t experienced that on the Fedora DC repo.

Hey, I can confirm this bug. I made a PR that should fix the cause; with it, 580 now runs for me without the modeset kernel parameter:

KISS

Patch and write-up posted at (gh-979) Thunderbolt 4/5 and USB4 eGPU Support by roger-pmta · Pull Request #981 · NVIDIA/open-gpu-kernel-modules · GitHub. Will also send it to linux-bugs.

D3cold & 590

Debian 13, GeForce RTX 4070 Laptop GPU, gnome 48, Wayland, kernel 6.12.57+deb13-amd64.

With the previous versions (580.105.08, 580.95.05), the GPU enters the D3cold power state as soon as it is not in use. I should mention that I use the “Freon” GNOME extension, which monitors temperatures, including the NVIDIA GPU’s. When the GPU is in D3cold, “Freon” shows its temperature as “N/A” until I run a program that uses the GPU. In other words, the GPU stays in D3cold even when “Freon” tries to read the temperature, which is the desired behavior.

With 590, the GPU still enters the D3cold power state, but it looks like every time “Freon” reads the temperature, it “wakes up” the GPU.

I tried to find a relevant difference in the parameters/configuration between 580 and 590, but I couldn’t find anything.

I wonder whether there is a parameter or configuration setting that controls how the GPU enters the D3cold power state.

Although I read that NVreg_DynamicPowerManagement=0x03 is the default, I tried setting it (in /etc/modprobe.d/nvidia.conf) to 0x03 and to 0x02, but neither changed the behavior of either version.
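You can correlate the wakeups with Freon’s polling by logging every change in the GPU’s sysfs power_state. Below is a minimal Python sketch. The device address 0000:01:00.0 is taken from the /proc paths in this post. I’m assuming that reading power_state only reports the state the kernel already has cached and does not itself resume the GPU. Confirm this with Freon disabled first, where the log should stay at D3cold. `sample`, `transitions` and `read_state` are hypothetical helper names.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable

# Assumed GPU address, taken from the /proc/driver/nvidia/gpus/0000:01:00.0 paths in this post.
POWER_STATE = "/sys/bus/pci/devices/0000:01:00.0/power_state"


def read_state(path: str = POWER_STATE) -> str:
    """Read the PCI power state string (e.g. D0, D3hot, D3cold) from sysfs."""
    with open(path) as fh:
        return fh.read().strip()


def sample(duration_s: float, interval_s: float = 0.5,
           read: Callable[[], str] = read_state) -> list[tuple[float, str]]:
    """Poll the power state every interval_s seconds for duration_s seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), read()))
        time.sleep(interval_s)
    return samples


def transitions(samples: Iterable[tuple[float, str]]) -> list[tuple[float, str, str]]:
    """Reduce (timestamp, state) samples to (timestamp, old_state, new_state) changes."""
    changes = []
    prev = None
    for ts, state in samples:
        if prev is not None and state != prev:
            changes.append((ts, prev, state))
        prev = state
    return changes
```

For example, run `transitions(sample(60))` once with 580 and once with 590 while Freon is active. If the D3cold to D0 changes line up with Freon’s refresh interval only on 590, that supports the observation above.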

Information I think is relevant:

580 and 590

/sys/bus/pci/devices/0000:00:01.0/power/

async					enabled
autosuspend_delay_ms	100
control					auto

/sys/bus/pci/devices/0000:00:01.0/
d3cold_allowed			1
power_state				D3cold (590 it switches to D0 every time the temp is read)
revision				0x02
/proc/driver/nvidia/gpus/0000:01:00.0/power

580

Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Off

GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off:          Supported

S0ix Power Management:
Platform Support:          Supported
Status:                    Enabled

Notebook Dynamic Boost:     Supported



590

Runtime D3 status:          Enabled (fine-grained)
Tegra iGPU Rail-Gating:     Disabled
Video Memory:               Off

GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off:          Supported

S0ix Power Management:
Platform Support:          Supported
Status:                    Enabled

Notebook Dynamic Boost:     Supported
/proc/driver/nvidia/params

580 & 590

ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 1
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableResizableBar: 0
EnableGpuFirmware: 18
EnableGpuFirmwareLogs: 2
RmNvlinkBandwidthLinkCount: 0
EnableDbgBreakpoint: 0
OpenRmEnableUnsupportedGpus: 1
DmaRemapPeerMmio: 1
ImexChannelCount: 2048
CreateImexChannel0: 0
GrdmaPciTopoCheckOverride: 0
CoherentGPUMemoryMode: ""
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/var/tmp"
ExcludedGpus: ""

Only 590:

TegraGpuPgMask: 0
EnableSystemMemoryPools: 529
/proc/driver/nvidia/gpus/0000:01:00.0/information

Model: 		 NVIDIA GeForce RTX 4070 Laptop GPU
IRQ:   		 235
GPU UUID: 	 GPU-e950e2a3-8ed8-9b59-4db0-ec5857dadf63
Video BIOS: 	 95.06.15.40.36
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:01:00.0
Device Minor: 	 0
GPU Firmware: 	 590.44.01 or 580.105.08
GPU Excluded:	 No

No other configuration was changed between versions; I just upgraded to 590, then “downgraded” back to 580.

/etc/modprobe.d/

nvidia.conf

options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1

nvidia-modeset.conf

options nvidia-drm modeset=1


Services:

nvidia-hibernate.service              disabled
nvidia-powerd.service                 enabled / active (running)
nvidia-suspend.service                enabled
nvidia-persistenced.service           enabled / active (running)
nvidia-resume.service                 enabled
nvidia-suspend-then-hibernate.service disabled
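Comparing the /proc/driver/nvidia/params dumps by hand is error-prone. Below is a minimal Python sketch that diffs two saved copies. To create them, run `cat /proc/driver/nvidia/params > params-590.txt` (and the 580 equivalent) under each driver; the file names are arbitrary. With the dumps above, it should report only TegraGpuPgMask and EnableSystemMemoryPools as added in 590. `parse_params`, `diff_params` and `diff_files` are hypothetical helper names.

```python
from __future__ import annotations


def parse_params(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines from /proc/driver/nvidia/params into a dict."""
    params: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "/var/tmp" stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            params[key.strip()] = value.strip()
    return params


def diff_params(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str], list[str]]:
    """Return (added, removed, changed) parameter names between two dumps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


def diff_files(old_path: str, new_path: str) -> tuple[list[str], list[str], list[str]]:
    """Diff two saved params dumps, e.g. diff_files("params-580.txt", "params-590.txt")."""
    with open(old_path) as a, open(new_path) as b:
        return diff_params(parse_params(a.read()), parse_params(b.read()))
```

A non-empty changed list would point at a default that differs between branches. Those are the parameters worth checking against the power-management behavior.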

Have you had issues getting the 590 open-source modules to work? I think the proprietary ones work, but if you use the open-source version it breaks.

I’m assuming the answer is no, but is VRR fixed in this version?

Are you running Flatpak Steam? You’ll need to Flatpak update to get the matching userspace packages for your new drivers.
