575 BETA release feedback & discussion

I have not been able to reproduce the issue since game patch 1.2, so it may have been resolved. Could it have been an issue with Steam’s shader cache?

  • “Fixed a bug that could cause Marvel Rivals to crash on startup or when loading levels”: solved. I have been playing for around 4 hours and it hasn’t crashed when loading levels.

The MSI RTX 4070 HDMI port issue has not been resolved yet. I tested under X11.

System:
  Host: GrayMalkin Kernel: 6.14.2-1-liquorix-amd64 arch: x86_64 bits: 64
  Desktop: Xfce v: 4.20.0 Distro: MX-23.6_ahs_x64 Libretto January 21 2024 (Debian 12 based)
Graphics:
  Device-1: NVIDIA AD104 [GeForce RTX 4070] driver: nvidia v: 575.51.02
  Display: x11 server: X.Org v: 1.21.1.7 with: Xwayland v: 22.1.9 driver: X:
    loaded: nvidia gpu: nvidia resolution: 1920x1080~144Hz
  API: OpenGL v: 4.6.0 NVIDIA 575.51.02 renderer: NVIDIA GeForce RTX 4070/PCIe/SSE2

Are there any plans to improve VRAM management under Linux? The problem is more serious than it has ever been: it can crash the desktop environment, browsers struggle to run, tabs die with SIGILL, and NVENC fails to start (OBS etc.), all because the Nvidia Linux driver does not handle VRAM as well as the Windows driver does.


Hello Xeeynamo, can you please let us know the exact display and graphics settings you were using during this benchmark run?

I am still getting crashes to desktop when playing Indiana Jones TGC, with FATAL ERROR: vkAcquireNextImageKHR failed with error (VK_TIMEOUT)

It happens 100% of the time within 20 minutes, usually within 90 seconds of gameplay. It does not happen under X11, and it happens regardless of in-game settings. It seems to be avoidable when running a very specific combination: the gamescope DRM backend (embedded session) launched from a TTY, with CachyOS Proton Experimental and PROTON_ENABLE_WAYLAND=1. Nested gamescope, or embedded gamescope with regular Proton (using Xwayland), produces the same error.

I raised an issue with the dxvk-nvapi team, who believe it is a driver/compositor issue.

Link to the issue I raised with dxvk-nvapi:

Original forum post:

steam-2677660.log (3.1 MB)
nvidia-bug-report.log (20.5 MB)
nvapi64.log (29.8 KB)


Hi vrachatte, thanks for your answer. My display is 3840x2160 @ 144 Hz, connected over DisplayPort, with no VRR or HDR.

In-game, the settings are:

  • Display Mode: Borderless
  • Display Resolution: 3840x2160
  • Graphics Preset: Cinematic
  • Super Resolution: 50
  • Full Ray Tracing: Off
  • Super Resolution Sampling: DLSS
  • Frame Generation: Off
  • DX12: On

Please note that the freeze also occurs outside of games. For example, this is what happened during a Zoom meeting:

NVRM: Xid (PCI:0000:01:00): 109, pid=1314, name=kwin_wayland, Ch 00000003, errorString CTX SWITCH TIMEOUT, Info 0x33c003
kwin_scene_opengl: A graphics reset attributable to the current GL context occurred.
kwin_wayland_drm: Checking test buffer failed!
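The Xid line above packs several fields into a single string: the GPU's PCI address, the Xid number, the PID and process name, and the error string. A minimal sketch for pulling those fields out of kernel log lines, assuming every Xid line follows the exact layout of the one quoted here; the regex, `XidEvent`, and `parse_xid` are hypothetical helpers, not an NVIDIA tool.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern modeled on the single Xid line quoted above; other
# driver versions may lay these lines out differently.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+),.*?errorString (?P<error>[^,]+)"
)


class XidEvent(NamedTuple):
    pci: str      # PCI address as printed, e.g. "0000:01:00"
    xid: int      # Xid error number
    pid: int      # PID as printed in the line
    process: str  # process name, e.g. "kwin_wayland"
    error: str    # error string, e.g. "CTX SWITCH TIMEOUT"


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return the parsed Xid event, or None if the line is not an Xid report."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(m["pci"], int(m["xid"]), int(m["pid"]), m["name"], m["error"])
```

The input lines could come from `journalctl -k` output saved to a file; lines that don't match return `None`, so the kwin messages around the Xid report are simply skipped.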

In short: on 575, the error can be triggered in any situation where graphics acceleration is used. Games are the fastest and most reliable way to trigger it.

There’s a graphical bug in The Elder Scrolls IV: Oblivion Remastered (2623190) that is exclusive to the 575 driver series. In some interior areas, white speckled dots appear throughout the environment. It does not seem to be tied to any particular graphics setting. The location below is “Weynon House”, but other stone building interiors appear to have the same issue.

RenderDoc capture from my RTX 5090: oblivion-remaster.tar.xz (Google Drive)
Proton log: https://github.com/user-attachments/files/19858717/steam-2623190.log
nvidia-bug-report.log.gz (643.6 KB)

After downgrading to 570.123.07, the issue does not occur.

System details:

System:
  Host: blackwell Kernel: 6.14.3-2-cachyos arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 6.3.4 Distro: CachyOS
CPU:
  Info: 8-core model: AMD Ryzen 7 9800X3D bits: 64 type: MT MCP cache:
    L2: 8 MiB
  Speed (MHz): avg: 3684 min/max: 603/5272 cores: 1: 3684 2: 3684 3: 3684
    4: 3684 5: 3684 6: 3684 7: 3684 8: 3684 9: 3684 10: 3684 11: 3684 12: 3684
    13: 3684 14: 3684 15: 3684 16: 3684
Graphics:
  Device-1: NVIDIA GB202 [GeForce RTX 5090] driver: nvidia v: 575.51.02
  Display: wayland server: X.org v: 1.21.1.16 with: Xwayland v: 24.1.6
    compositor: kwin_wayland driver: gpu: nvidia,nvidia-nvswitch
    resolution: 5120x2160~165Hz
  API: EGL v: 1.5 drivers: nouveau,nvidia,swrast
    platforms: gbm,wayland,x11,surfaceless,device
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 575.51.02
    renderer: NVIDIA GeForce RTX 5090/PCIe/SSE2
  API: Vulkan v: 1.4.309 drivers: nvidia surfaces: xcb,xlib,wayland
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor gpu: nvidia-settings,nvidia-smi
    wl: wayland-info x11: xdpyinfo, xprop, xrandr

Hi guys (@vrachatte, @abchauhan & @amrits), would it be possible to give us an answer on this major topic? To my knowledge, your company has not yet answered it publicly.


Read the thread first.

See 575 BETA release feedback & discussion - #44 by abchauhan

If you had actually read the thread yourself, you would have seen my reply to the post you quoted.
“VRAM exhaustion” isn’t the issue here. It’s about VRAM management on Linux, period.

I did. You replied to @musabagriyanik, I guess by mistake.

Desperately tagging various forum mods with your MAJOR issue is unlikely to move anything forward.

Where is your 575 Beta bug report and supporting data to help troubleshoot the issue?

If I have a 5090, is it possible to use the closed-source driver, or can I only use the open-source one from now on?

Only open

All Blackwell require the open driver now.

Only if it was a bug specific to 575 drivers, not something reported years ago.

I got this on Ubuntu 25.04 using Xorg (not Wayland), on an ASUS Dark Hero Z790 (with ReBAR enabled) and the 575-open driver:

NVRM: failed to wait for bar firewall to lower

After this, it basically hung there forever. I had to roll back to 570-open.

Using journalctl, I was able to see this (newest entries first):

Apr 27 16:44:31 xtreme systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Apr 27 16:44:31 xtreme (udev-worker)[3275]: nvidia: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
Apr 27 16:44:31 xtreme kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Apr 27 16:44:31 xtreme kernel: NVRM: No NVIDIA devices probed.
Apr 27 16:44:31 xtreme kernel: NVRM: Try unloading the conflicting kernel module (and/or
                               NVRM: reconfigure your kernel without the conflicting
                               NVRM: driver(s)), then try loading the NVIDIA kernel module
                               NVRM: again.
Apr 27 16:44:31 xtreme kernel: NVRM: This can occur when another driver was loaded and 
                               NVRM: obtained ownership of the NVIDIA device(s).
Apr 27 16:44:31 xtreme kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Apr 27 16:44:31 xtreme kernel: nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
Apr 27 16:44:31 xtreme kernel: NVRM: failed to wait for bar firewall to lower
Apr 27 16:44:29 xtreme sudo[4365]: pam_unix(sudo:session): session opened for user root(uid=0) by luis(uid=1000)
Apr 27 16:44:29 xtreme sudo[4365]:     luis : TTY=tty3 ; PWD=/home/luis ; USER=root ; COMMAND=/usr/bin/apt install nvidia-driver-570-open
Apr 27 16:44:27 xtreme kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Apr 27 16:44:27 xtreme (udev-worker)[3275]: nvidia: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
Apr 27 16:44:27 xtreme kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Apr 27 16:44:27 xtreme kernel: NVRM: No NVIDIA devices probed.
Apr 27 16:44:27 xtreme kernel: NVRM: Try unloading the conflicting kernel module (and/or
                               NVRM: reconfigure your kernel without the conflicting
                               NVRM: driver(s)), then try loading the NVIDIA kernel module
                               NVRM: again.
Apr 27 16:44:27 xtreme kernel: NVRM: This can occur when another driver was loaded and 
                               NVRM: obtained ownership of the NVIDIA device(s).
Apr 27 16:44:27 xtreme kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Apr 27 16:44:27 xtreme kernel: nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
Apr 27 16:44:27 xtreme kernel: NVRM: failed to wait for bar firewall to lower
Apr 27 16:44:23 xtreme kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Apr 27 16:44:23 xtreme systemd[1]: Failed to start nvidia-persistenced.service - NVIDIA Persistence Daemon.
Apr 27 16:44:23 xtreme systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Apr 27 16:44:23 xtreme systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Apr 27 16:44:23 xtreme nvidia-persistenced[4319]: Shutdown (4319)
Apr 27 16:44:23 xtreme nvidia-persistenced[4319]: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Apr 27 16:44:23 xtreme nvidia-persistenced[4319]: PID file closed.
Apr 27 16:44:23 xtreme nvidia-persistenced[4314]: nvidia-persistenced failed to initialize. Check syslog for more details.
Apr 27 16:44:23 xtreme nvidia-persistenced[4319]: PID file unlocked.
Apr 27 16:44:23 xtreme nvidia-persistenced[4319]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 106 has read and write permissions for those files.
Apr 27 16:44:23 xtreme (udev-worker)[3275]: nvidia: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
Apr 27 16:44:23 xtreme kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Apr 27 16:44:23 xtreme kernel: NVRM: No NVIDIA devices probed.
Apr 27 16:44:23 xtreme kernel: NVRM: Try unloading the conflicting kernel module (and/or
                               NVRM: reconfigure your kernel without the conflicting
                               NVRM: driver(s)), then try loading the NVIDIA kernel module
                               NVRM: again.
Apr 27 16:44:23 xtreme kernel: NVRM: This can occur when another driver was loaded and 
                               NVRM: obtained ownership of the NVIDIA device(s).
Apr 27 16:44:23 xtreme kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Apr 27 16:44:23 xtreme kernel: nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
Apr 27 16:44:23 xtreme kernel: NVRM: failed to wait for bar firewall to lower
Apr 27 16:44:19 xtreme kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Apr 27 16:44:18 xtreme nvidia-persistenced[4319]: Started (4319)
Apr 27 16:44:18 xtreme nvidia-persistenced[4319]: Now running with user ID 106 and group ID 124
Apr 27 16:44:18 xtreme nvidia-persistenced[4319]: Verbose syslog connection opened
Apr 27 16:44:18 xtreme systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Apr 27 16:44:18 xtreme (udev-worker)[3275]: nvidia: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
Apr 27 16:44:18 xtreme kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Apr 27 16:44:18 xtreme kernel: NVRM: No NVIDIA devices probed.
Apr 27 16:44:18 xtreme kernel: NVRM: Try unloading the conflicting kernel module (and/or
                               NVRM: reconfigure your kernel without the conflicting
                               NVRM: driver(s)), then try loading the NVIDIA kernel module
                               NVRM: again.
Apr 27 16:44:18 xtreme kernel: NVRM: This can occur when another driver was loaded and 
                               NVRM: obtained ownership of the NVIDIA device(s).
Apr 27 16:44:18 xtreme kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
Apr 27 16:44:18 xtreme kernel: nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
Apr 27 16:44:18 xtreme kernel: NVRM: failed to wait for bar firewall to lower
Apr 27 16:44:18 xtreme gdm3[3560]: Gdm: Child process -3604 was already dead.
Apr 27 16:44:18 xtreme gdm3[4288]: Timed out while waiting for udev queue to empty.
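The journal excerpt above repeats the same probe-failure sequence several times, each alongside a failed `modprobe` of an nvidia module. A minimal sketch for condensing such a dump into counts, assuming the journal was saved to a plain-text file (for example with `journalctl -b > boot.log`); the two patterns are copied from the excerpt, and `summarize` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Patterns copied from the journal excerpt above (assumption: other
# distributions or driver versions may word these messages differently).
MODPROBE_RE = re.compile(r"Process '/sbin/modprobe (?P<module>[\w-]+)' failed")
FIREWALL_MSG = "NVRM: failed to wait for bar firewall to lower"


def summarize(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count bar-firewall errors and failed modprobe calls per module."""
    firewall_hits = 0
    failed_modules: Counter = Counter()
    for line in lines:
        if FIREWALL_MSG in line:
            firewall_hits += 1
        m = MODPROBE_RE.search(line)
        if m:
            failed_modules[m["module"]] += 1
    return firewall_hits, failed_modules
```

Passing an open file works directly, e.g. `with open("boot.log") as f: hits, modules = summarize(f)`, since iterating a file yields its lines.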

This is the hardware:

System Details Report


Report details

  • Date generated: 2025-04-27 16:53:14

Hardware Information:

  • Hardware Model: ASUS ROG MAXIMUS Z790 DARK HERO
  • Memory: 128.0 GiB
  • Processor: Intel® Core™ i9-14900K × 32
  • Graphics: NVIDIA GeForce RTX™ 5090
  • Disk Capacity: 14.0 TB

Software Information:

  • Firmware Version: 1801
  • OS Name: Ubuntu 25.04
  • OS Build: (null)
  • OS Type: 64-bit
  • GNOME Version: 48
  • Windowing System: X11
  • Kernel Version: Linux 6.14.0-13-generic

Those were all just beta drivers. This time Nvidia will fix all the reported bugs for real. /s

Every release is the same. Half a dozen new bugs get introduced, and an employee reluctantly replies to a small number of them and maybe hands out tracking numbers. Maybe a few of the reported bugs actually get fixed. Sometimes Nvidia’s employees somehow can’t reproduce bugs that literally every other person in the topic can. People upload bug report zips and it never seems to make a difference. Two to three months go by and the cycle repeats, with more bugs added to the pile.

But sure, the template is going to fix what’s fundamentally a lack of QA process or care.

Fun fact: Nvidia still supports GPUs with 1 GB of VRAM. I’ll leave it to everyone’s imagination what would happen if you tried to run a modern DE on one, especially since DEs now GPU-accelerate everything for no reason.


Issue on GNOME Wayland: I had my PC connected to a docking station and all monitors went black. I found these errors in my log:

Apr 28 12:20:15 fedora kernel: [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to map NvKmsKapiMemory 0x00000000c26aa873
Apr 28 12:20:15 fedora kernel: [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to map NvKmsKapiMemory 0x0000000055f3169f

nvidia-bug-report.log.gz (2.0 MB)

This looks very much like you did not install the driver/modules properly, or did not remove the old ones.

If it helps: I removed the old one and rebooted, which left me on nouveau. Then I installed 575.