560 release feedback & discussion

Before NVIDIA, I had an AMD RX 5700 and there were no problems. I remember that 7 years ago there were big problems on a laptop with Intel + NVIDIA (NVIDIA PRIME). I bought an RTX 3090 to use for developing and running AI models. I was surprised that after 7 years the same problems remain and it is impossible to work normally on NVIDIA. KDE is very laggy with the RTX 3090, even with GSP off, no matter whether on Wayland or X11. GNOME Wayland animations are smoother, but sometimes the screen just goes blank, although there is nothing in the system logs. In GNOME X11, frames are skipped again. I am very disappointed that I can't work properly with NVIDIA and Linux. Everything works fine on Windows. Why can't you provide decent support and at least fix bugs, when even top-end video cards have problems? On my old laptop with an Intel HD 4000, everything works much better and without lag than on the RTX 3090. This is just beyond my comprehension.


@fedev Yes... I have seen similar issues as well using 560.35.03. In my case, I was mainly testing Skyrim, Witcher 3 and Fallout 4 via Steam using Proton 8 and 9. For Skyrim and Witcher 3 the game would freeze very easily, but only if it was started on my external monitor. For Fallout 4, it would always freeze on the external monitor but also sometimes on the internal monitor. I eventually downgraded to 550.90.07 and that "fixed" the freezing issues. If I can find the time, I may try some of the 555/560 releases to see if I can pin down when the problems started. But I'm not sure how much help that will be.

Well, Wayland support has improved a lot. But there are some... quite annoying issues. The Wayland improvements are quite new and so is the use of the open-source driver, so I hope they work it out.

The 550 driver, on the other hand, is actually quite OK in X11, at least when it comes to desktop use.

In 560.35.03, running xrandr --output DVI-D-0 --off --output HDMI-0 --mode 1366x768 --pos 0x0 --rotate normal --output DP-1 --primary --mode 1920x1080 --pos 0x768 --rotate normal works.
Running xrandr --output DVI-D-0 --mode 1920x1080 --pos 0x768 --rotate normal --output HDMI-0 --mode 1366x768 --pos 0x0 --rotate normal --output DP-1 --primary --mode 1920x1080 --pos 0x768 --rotate normal freezes Xorg; DVI is still broken.

In 550.107.02 both commands work; DVI is working.

The issue just got more severe: now all GNOME (GTK4) apps are bricked by default. For an example, see: Gdk-Message: Error 71 (Protocol error) dispatching to Wayland display - Fedora Discussion
For me it's reproducible for all GTK4 apps (non-Flatpak).
The workaround is to set the GSK_RENDERER=gl environment variable.

This is because:

GTK 4.16 defaults to using the Vulkan GSK renderer on Wayland,
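If you only want to test the workaround on one app rather than the whole session, a minimal sketch is to set the variable for a single launch. The helper name `with_gl_renderer` is invented for illustration; only `GSK_RENDERER=gl` comes from the post above.

```shell
# Hypothetical helper: run one command with GTK's GL renderer forced.
# GSK_RENDERER=gl is the workaround from this thread; the function name is made up.
with_gl_renderer() {
    env GSK_RENDERER=gl "$@"
}

# Example (commented out so the snippet has no side effects):
# with_gl_renderer easyeffects
```

Exporting the variable for the whole session should have the same effect, but it then applies to every GTK4 app, including ones that were not affected.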

EDIT: To reproduce, run for example easyeffects or pamac-manager with WAYLAND_DEBUG=1.
Then the error is:

[3358526.822] {Display Queue} wl_display#1.error(wp_linux_drm_syncobj_manager_v1#58, 0, "surface already exists")
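A WAYLAND_DEBUG=1 trace is very noisy, so it may help to filter it down to the fatal protocol error. This is only a sketch: the function name `wayland_errors` is invented, and the pattern is derived from the single error line quoted above (it also accepts the older `wl_display@1` spelling, which is an assumption about other libwayland versions).

```shell
# Hypothetical filter: keep only wl_display error events from a
# WAYLAND_DEBUG=1 trace read on stdin.
wayland_errors() {
    grep -E 'wl_display[#@]1\.error\('
}

# Example (the trace is written to stderr, hence 2>&1):
# WAYLAND_DEBUG=1 easyeffects 2>&1 | wayland_errors
```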

Distribution (run cat /etc/os-release):
NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="Issues · pop-os/pop · GitHub"
PRIVACY_POLICY_URL="System76 - Linux Laptops, Desktops, and Servers"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

Related Application and/or Package Version (run apt policy $PACKAGE NAME):
Lutris, both versions: the Flatpak one and the Pop!_OS one

Issue/Bug Description:
On Tuesday, September 17, 2024, I received a notification to reboot my laptop to apply important updates, nvidia-560 among them. However, my WoW in Flatpak's Lutris (the default Pop!_OS version stopped working altogether, so I gave up on it) now crashes after 10 minutes of gameplay, no matter which Wine runner is used. I was told to roll back because there is an installation error with the current NVIDIA driver; however, running apt install nvidia-driver for 555, 550 or 545 always forcibly downloads and installs 560, so it's impossible to roll back. I was also told to avoid the .run file from NVIDIA's website, as it might cause problems that could force me to reinstall the entire distro. I'm not sure what to do; I'm not very savvy with graphics. The only driver versions apt-get allows me to install are <=535, which return a "3D Accelerator card not supported" error when running a game.

Steps to reproduce (if you know):
WoW was working fine two days ago, but after upgrading to nvidia 560 it crashes after 10 minutes of normal, smooth gameplay, always at the specific moment of casting something with complex visuals, like a spell or any attack.

Other Notes:
My laptop is a HP ENVY m7 Notebook with
Pop!_OS 22.04 LTS x86_64
kernel: 6.9.3-76060903-generic
bash 5.1.16
GNOME 42.9
CPU: Intel i7-7500U (4) @ 3.500GHz
GPU: Intel HD Graphics 620
GPU: NVIDIA GeForce 940MX (v 560.35.03)
Memory: 16GB
Graphics Mode: Hybrid (tested on Nvidia too)

I was told to test the server version and update with results; I will do so soon. Curiously enough, in World of Warcraft, when doing a Delve (for those who don't play, it's new content akin to what old-school dungeons were), I can only play for about 15 minutes before it crashes. However, I tested 560 further, and it seems to be stable when you're anywhere but in combat. I even managed to kill 1 raid boss after 45 minutes of managing my Auction House normally in Dorongal, but then it crashed right after attacking the first small enemies following the boss's death. It seems like the crash happens after loading a chunk of spells, caching them or something; I'm not sure. I hope this can help narrow down the cause.

NVIDIA DKMS doesn't want to launch a Wayland session with Linux kernel 6.11.0.

It's not 100% compatible yet; there were some changes in the kernel code regarding drm_fbdev_generic_setup.

With Vulkan broken in the 560 series drivers, I'm really hoping that any fix to the issue noted in 560 release feedback & discussion - #371 by abchauhan

  • NVBug #4840658 vkcube-wayland, wayland apps fail to launch on iGPU for Optimus notebooks running Plasma Wayland sessions

... will apply generally to all Vulkan setups, discrete or multi-GPU.

It's always struck me as odd that the overlap in the Venn diagram of GNOME/GTK devs using NVIDIA and NVIDIA devs using GNOME/GTK seems minimal to non-existent. D'oh!

Question: I noticed the Windows driver got H265/AV1 Twitch support. Will the next Linux driver be getting the same?

Release 560 seems to be totally broken on Ubuntu 22.04 LTS.

After switching from driver version 555 to 560 even Google Chrome fails to run. In addition, S3 sleep is totally broken and the system wakes up immediately after entering S3 sleep because of broken driver 560.

Here are some additional details:

Nvidia driver 560 errors:

Running two seats ("Switch user" feature), VT7 with XFCE desktop + Picom, VT8 with lightdm and VT9 with MATE desktop

S3 sleep fails:

Related errors from journalctl:

Sep 22 13:01:46 desktop nvidia-persistenced[1080]: ERROR: Failed to find user ID of user 'nvidia-persistenced': No such file or directory
Sep 22 13:01:46 desktop systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Sep 22 13:01:46 desktop systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Sep 22 13:01:46 desktop systemd[1]: Failed to start NVIDIA Persistence Daemon.
Sep 22 13:02:08 desktop kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Sep 22 13:03:18 desktop kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Sep 22 13:03:18 desktop kernel: [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
Sep 22 13:03:55 desktop kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Sep 22 13:03:55 desktop kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Sep 22 13:03:55 desktop kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -5
Sep 22 13:03:55 desktop kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
Sep 22 13:03:55 desktop kernel: PM: Some devices failed to suspend, or early wake event detected
Sep 22 13:04:00 desktop systemd[1]: Failed to start Fix PulseAudio after resume from suspend.
Sep 22 13:04:00 desktop pipewire[2517]: spa.alsa: 'hdmi:1': playback open failed: Device or resource busy
Sep 22 13:04:00 desktop pipewire[3558]: spa.alsa: 'hdmi:1': playback open failed: Device or resource busy

After this, the seat that used picom has only black screen because picom has failed to refresh the screen. The only way to recover the screen without killing apps is to login from VT1 and kill the picom process.

With driver version 555 the kernel command line seemed to be enough to get somewhat reliable S3 sleep:

nvidia.NVreg_PreserveVideoMemoryAllocations=1 nvidia_modeset.opportunistic_display_sync=0 nvidia.NVreg_PreserveVideoMemoryAllocations=1

but with driver version 560 this doesn't seem to be enough anymore.
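To double-check which NVIDIA-related parameters the running kernel actually received (and to spot duplicates like the repeated PreserveVideoMemoryAllocations entry above), something like the following sketch could be used. The function name `nvidia_cmdline_params` is invented; it only reads a kernel command line file, `/proc/cmdline` by default.

```shell
# Hypothetical helper: list the unique nvidia*, nvidia-drm* and nvidia_modeset*
# parameters found in a kernel command line file (default: /proc/cmdline).
nvidia_cmdline_params() {
    tr -s ' ' '\n' < "${1:-/proc/cmdline}" \
        | grep -E '^nvidia([_-](drm|modeset))?\.' \
        | sort -u
}

# Example:
# nvidia_cmdline_params
```

This only shows what was passed at boot. It does not prove that the driver honoured each value.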


After this, Google Chrome fails to work with following error in the stdout/stderr:
[11314:11314:0922/131527.105758:ERROR:gpu_process_host.cc(980)] GPU process exited unexpectedly: exit_code=139
[11314:11314:0922/131527.368673:ERROR:gpu_process_host.cc(980)] GPU process exited unexpectedly: exit_code=139
Created TensorFlow Lite XNNPACK delegate for CPU.
[11314:11314:0922/131527.642832:ERROR:gpu_process_host.cc(980)] GPU process exited unexpectedly: exit_code=139

However, nvidia-smi seems to think everything is just fine:

mira@desktop:~$ nvidia-smi 
Sun Sep 22 13:15:50 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
|  0%   41C    P8             14W /  170W |      97MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1370      G   /usr/lib/xorg/Xorg                             86MiB |
+-----------------------------------------------------------------------------------------+

The multi-seat configuration and S3 sleep worked without errors with driver 535 and S3 sleep worked acceptably well with driver version 555. Driver version 560 seems totally broken.


$ aptitude search '~i nvidia' | cat
i A libnvidia-cfg1-560 - NVIDIA binary OpenGL/GLX configuration library
i A libnvidia-common-560 - Shared files used by the NVIDIA libraries
i A libnvidia-compute-560 - NVIDIA libcompute package
i A libnvidia-decode-560 - NVIDIA Video Decoding runtime libraries
i A libnvidia-egl-wayland1 - Wayland EGL External Platform library -- shared library
i A libnvidia-encode-560 - NVENC Video Encoding runtime library
i A libnvidia-extra-560 - Extra libraries for the NVIDIA driver
i A libnvidia-fbc1-560 - NVIDIA OpenGL-based Framebuffer Capture runtime library
i A libnvidia-gl-560 - NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
i A linux-signatures-nvidia-6.8.0-40-lowlatency - Linux kernel signatures for nvidia modules for version 6.8.0-40-lowlatency
i A nvidia-compute-utils-560 - NVIDIA compute utilities
i A nvidia-dkms-560 - NVIDIA DKMS package
i  nvidia-driver-560 - NVIDIA driver metapackage
i A nvidia-firmware-560-560.35.03 - Firmware files used by the kernel module
i A nvidia-kernel-common-560 - Shared files used with the kernel module
i A nvidia-kernel-source-560 - NVIDIA kernel source package
i A nvidia-prime - Tools to enable NVIDIA's Prime
i A nvidia-settings - Tool for configuring the NVIDIA graphics driver
i A nvidia-utils-560 - NVIDIA driver support binaries
i A xserver-xorg-video-nvidia-560 - NVIDIA binary Xorg driver


$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-6.8.0-45-lowlatency root=UUID=dbd2442e-cbca-4518-8487-bd50d0fcd094 ro quiet swapaccount=1 modprobe.blacklist=nouveau nouveau.modeset=0 nvidia-drm.modeset=1 nvidia-drm.fbdev=0 nvidia.NVreg_PreserveVideoMemoryAllocations=1 nvidia_modeset.opportunistic_display_sync=0 nvidia.NVreg_PreserveVideoMemoryAllocations=1

$ uname -a
Linux desktop 6.8.0-45-lowlatency #45.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep  6 15:26:10 UTC x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -S nvidia-persistenced.service
nvidia-compute-utils-560: /lib/systemd/system/nvidia-persistenced.service

At the very least, it seems that package nvidia-compute-utils-560 is broken because it assumes a user called "nvidia-persistenced" already exists in the system or would be created by a dependent package.

Working around this broken package as follows:

$ sudo groupadd -g 143 nvidia-persistenced
$ sudo useradd -c 'NVIDIA Persistence Daemon' -u 143 -g nvidia-persistenced -d '/' -s /sbin/nologin nvidia-persistenced

and rebooting might make S3 sleep work in some older release, but not here. The above workaround can be reverted with:
$ # sudo systemctl stop nvidia-persistenced
$ sudo userdel nvidia-persistenced
$ # sudo groupdel nvidia-persistenced
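Before applying or reverting the workaround, it may be worth confirming whether the account exists at all. A minimal sketch follows. `account_exists` is an invented helper name, and the account name is the one from the journal error above.

```shell
# Hypothetical helper: succeed if the named user account exists.
account_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Example:
# if account_exists nvidia-persistenced; then
#     echo "user present"
# else
#     echo "user missing (matches the nvidia-persistenced error in the journal)"
# fi
```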


Further testing shows

Sep 22 13:58:43 desktop kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to >
Sep 22 13:58:43 desktop kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Sep 22 13:58:43 desktop kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -5
Sep 22 13:58:43 desktop kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5

even though I do have the proc interface available:

$ ls -l /proc/driver/nvidia
total 0
dr-xr-xr-x 5 root root 0 2024-09-22 13:57 capabilities
dr-xr-xr-x 3 root root 0 2024-09-22 13:56 gpus
-r--r--r-- 1 root root 0 2024-09-22 13:56 params
dr-xr-xr-x 3 root root 0 2024-09-22 14:03 patches
-rw-r--r-- 1 root root 0 2024-09-22 14:03 registry
-rw-r--r-- 1 root root 0 2024-09-22 13:58 suspend
-rw-r--r-- 1 root root 0 2024-09-22 14:03 suspend_depth
-r--r--r-- 1 root root 0 2024-09-22 14:03 version
dr-xr-xr-x 3 root root 0 2024-09-22 14:03 warnings
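The kernel message complains that suspend was attempted "without driver procfs suspend interface", so the file existing is apparently not enough; something has to write to it before the system sleeps. As a rough sketch, the presence check itself can be scripted. The helper name is invented, and the directory argument exists only so it can be pointed elsewhere.

```shell
# Hypothetical helper: succeed if the driver's procfs suspend file is present.
# Default directory is the one listed above.
has_nvidia_suspend_iface() {
    [ -e "${1:-/proc/driver/nvidia}/suspend" ]
}

# Example:
# has_nvidia_suspend_iface && echo "suspend interface present"
```

The README section named in the log describes systemd units (nvidia-suspend.service, nvidia-hibernate.service, nvidia-resume.service) that use this file. Checking them with systemctl is-enabled would be a reasonable next step, though nothing in this thread confirms whether the Ubuntu 560 packages ship or enable them.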

Switching to package nvidia-driver-560-open doesn't have any meaningful effect. Switching to package nvidia-driver-555-open fixes nearly all issues. The only thing that's broken in nvidia-driver-555-open is that if I use user switching, the desktop running with Picom ends up fully black, with only the mouse cursor visible, after switching back to the VT with Picom running.

The last driver which didn't have this specific bug (Picom rendering a fully black screen after S3 sleep + another graphical desktop seat) was nvidia-driver-535.

So in summary:

  • S3 sleep + multiple graphical desktops + picom running on one desktop results in a black screen; this regressed in version 545 (the Ubuntu repository doesn't contain versions between 535 and 545, so I cannot pinpoint this failure more accurately).
  • S3 sleep is totally broken in version 560
  • Google Chrome fails to run with version 560 but works correctly with 555.

If I were to decide, the problems with version 560 are so bad that it would deserve to be removed from public distribution.

I've now reverted to package nvidia-driver-555-open and, except for Picom failing to work in the above-mentioned edge case, everything seems to work fine.


Issue on Razer Blade 14 2021 (3080, latest BIOS). The dGPU enters D3cold on AC power, but on battery power it only does so for a fraction of a second before going back to D0 (about every 20 seconds or so). I'm on Fedora 40 and I've had this issue with all recent drivers. Driver version 470xx (from rpmfusion; maintained for older cards) worked as expected in this regard, but that version is not ideal since it does not support Wayland. Switching to X11 does not help on the new drivers either.

This issue was also present in Windows 11 (dual boot system now but was present before I ever installed linux) but has been fixed with the driver version 560.

See here for the details: NVIDIA GPU Fails to power off (PRIME) Razer Blade 14 2022 - #40 by eitchfive
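For anyone trying to reproduce the D3cold/D0 flapping described above, one way to watch it is to read the kernel's runtime PM state from sysfs. Unlike running nvidia-smi, reading that file should not itself wake the GPU. This is a sketch only: the helper name is invented, and the PCI address 0000:01:00.0 is an assumption that needs to be checked with lspci on the actual machine.

```shell
# Hypothetical helper: print the runtime PM state of a PCI device
# (for example "active" or "suspended").
# Assumption: the dGPU is at 0000:01:00.0; pass another sysfs device dir if not.
gpu_runtime_status() {
    dev_dir="${1:-/sys/bus/pci/devices/0000:01:00.0}"
    cat "$dev_dir/power/runtime_status"
}

# Example: sample once per second for a minute
# i=0; while [ "$i" -lt 60 ]; do gpu_runtime_status; sleep 1; i=$((i + 1)); done
```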

Currently seeing an issue with the 560.35.03 driver and Linux kernel 6.12 on Fedora 40 (KDE spin), using the mainline kernel (6.12 via https://copr.fedorainfracloud.org/coprs/g/kernel-vanilla/mainline/) and the latest NVIDIA driver (560.35.03 via https://rpmfusion.org/Howto/NVIDIA#Latest.2FBeta_driver). The NVIDIA driver fails to build. Here is a snippet from the output of akmods --force:

Checking kmods exist for 6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64 [  OK  ]
Building and installing nvidia-kmod [FAILED]
Building rpms failed; see /var/cache/akmods/nvidia/560.35.03-1-for-6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64.failed.log for details

here is a tail output of /var/cache/akmods/nvidia/560.35.03-1-for-6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64.failed.log

2024/09/22 07:49:13 akmodsbuild: make[2]: *** [/usr/src/kernels/6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64/Makefile:1928: /tmp/akmodsbuild.mxWIGlRR/BUILD/nvidia-kmod-560.35.03/_kmod_build_6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64] Error 2
2024/09/22 07:49:13 akmodsbuild: make[1]: *** [Makefile:226: __sub-make] Error 2
2024/09/22 07:49:13 akmodsbuild: make: *** [Makefile:89: modules] Error 2
2024/09/22 07:49:13 akmodsbuild: error: Bad exit status from /var/tmp/rpm-tmp.rJoapR (%build)
2024/09/22 07:49:13 akmodsbuild:
2024/09/22 07:49:13 akmodsbuild: RPM build errors:
2024/09/22 07:49:13 akmodsbuild:     Bad exit status from /var/tmp/rpm-tmp.rJoapR (%build)
2024/09/22 07:49:13 akmodsbuild:
2024/09/22 07:49:13 akmods: Building rpms failed; see /var/cache/akmods/nvidia/560.35.03-1-for-6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64.failed.log for details
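The tail of the log only shows make giving up, not what actually failed to compile. The first lines containing "error:" earlier in the same log are usually more informative. A minimal sketch for pulling them out (the function name and the default of 5 lines are arbitrary choices, not anything akmods provides):

```shell
# Hypothetical helper: print the first N lines (default 5) containing "error:"
# from a build log, prefixed with their line numbers.
first_build_errors() {
    log="$1"
    grep -n -E 'error:' "$log" | head -n "${2:-5}"
}

# Example, using the log path reported by akmods:
# first_build_errors /var/cache/akmods/nvidia/560.35.03-1-for-6.12.0-0.rc0.20240922gt88264981.308.vanilla.fc40.x86_64.failed.log
```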

Please advise if anyone else has seen this issue and has any advice other than rolling back to an earlier kernel.

-ryan

6.12 is the newest RC kernel, and RCs often have issues with NV modules when they're brand new. Usually after a couple of patches things work.

6.11.0 is the newest mainline kernel.

Thanks!

I'll probably give the 6.12 RC (Fedora rawhide in my case) kernel a couple of days, then I'll likely roll back to 6.11.0 (I should be able to roll back to 6.11.0 from Fedora 41). I usually don't fuss with "out of band" kernel versions, but I've had other (non-NVIDIA) driver issues on the Dell XPS-9640 laptop I'm running, like missing audio drivers, etc.

anyways, thanks again @jNines

Late to the 560 party but have been trying it out and noticing a massive performance regression in Cyberpunk 2077.

Running 6.11 kernel, Proton Experimental and 560.35.03 open driver.

dmesg is continuously filled with:

NVRM: nvAssertFailedNoLog: Assertion failed: pEventNotificationList->pendingEventNotifyCount == 0 @ event_notification.c:289

and my framerates take a 30% dive.
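To put a number on "continuously filled", the assertion lines can be counted before and after a play session. A small sketch, where `count_nvrm_asserts` is an invented name and the match string is taken from the dmesg line quoted above:

```shell
# Hypothetical helper: count NVRM assertion-failure lines in kernel log text
# read on stdin.
count_nvrm_asserts() {
    grep -c 'NVRM: nvAssertFailedNoLog'
}

# Example:
# sudo dmesg | count_nvrm_asserts
```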

I tried a few other games and didn't see the same behaviour, but it's hardly a comprehensive investigation.

With the proprietary driver, the dmesg errors aren't present, and the frame rate is better but not the same as 555.

At this point, a bugfix release should be made, honestly. The VRAM leak issue is really annoying to the point that you have to restart to use a game because your DE with a couple of apps is using 4 GB VRAM, plus, there are a bunch of other issues.
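For anyone wanting to track how much VRAM the desktop is holding over time, nvidia-smi's CSV query output can be turned into a percentage. This is only a sketch. The function name is invented, and it expects "used, total" pairs in MiB on stdin, which is what `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` prints.

```shell
# Hypothetical helper: read "used, total" (MiB) pairs on stdin and print
# the used share as a whole-number percentage, one line per GPU.
vram_used_percent() {
    awk -F', *' '$2 > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Example:
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_used_percent
```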


They acknowledged it at least, and reproduced it too I think.
There's similar behavior in X11 too, but there the VRAM does get freed after a while; it takes a few seconds though, it doesn't happen straight away. So I still think there's some funkiness there too. I don't have a GPU from another brand to compare with, so I don't know how fast it's supposed to release VRAM. However, I don't think they allocate VRAM the way NVIDIA currently does, at least not from the testing I've heard of.

This is an egl-wayland issue: Xwayland VRAM usage is still excessive when resizing X11 apps under wayland. · Issue #126 · NVIDIA/egl-wayland · GitHub