535.171.04, kernel 6.8.7 + Xorg is the most stable combination so far.
I’ve been using nvidia-open for a while now (one kernel only) and I haven’t had a crash while powering off or logging out. However, I have had crashes (2 so far) when I press the “sleep” button in KDE. My laptop screen goes black, but the light on my power button stays on, and afterwards I can’t wake my laptop up anymore.
It’s very annoying, as it doesn’t happen on the first or second sleep either. Like I mentioned in my previous post, it happens at night, after I have suspended my laptop multiple times during the day and kept it on for a while. It happens eventually.
I really hope you are able to fix these issues asap.
Just had that crash happen again, but I was able to use the magic SysRq keys (REISUB), so it’s not a kernel panic like I used to experience with the non “nvidia-open” drivers.
What happened is I left my computer and it went to sleep normally, then I turned it on and put it back to sleep manually and I got that crash I described above.
I tried recreating it by leaving my laptop open (lowering the suspend time to 1 minute) and then waking it up once it goes to sleep, then putting it back to sleep and I just can’t recreate it. I tried 11 times now lol. So it has to do with some time the laptop has to stay on I guess.
Here are my normal power settings. I hope someone is able to test with these suspend timings and see if they experience it during normal usage as well.
Under “Screen locking” in kde settings:
Lock screen automatically: 20 minutes
Allow unlocking without password for: 10 seconds
Under “Energy Saving” in kde settings:
Sleep after: 80 minutes (on AC power)
Dim automatically: 15 minutes
Turn off screen after: 20 minutes
When locked, turn off screen: after 60 seconds
I keep my laptop plugged in and haven’t used it on battery, so it goes to sleep after 80 minutes of inactivity. If anyone else is experiencing these crashes like I am, please let us know if you can accurately replicate it. Thanks!
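For anyone who wants to try reproducing this without waiting by the laptop, here is a minimal sketch of a suspend/resume loop with a configurable awake period between cycles. It assumes util-linux `rtcwake` and `systemctl suspend` are available and that it runs as root. The function name `suspend_cycles` and the `SUSPEND_CMD` override (for dry runs) are made up for illustration. It goes through systemd's suspend path rather than the KDE sleep button, so it may not hit exactly the same code path.

```shell
#!/bin/sh
# Sketch only: repeat suspend/resume cycles with an awake period in between.
# Assumptions: rtcwake (util-linux) and systemctl are installed; run as root.
# SUSPEND_CMD is a hypothetical override used to dry-run the loop.

suspend_cycles() {
    count=$1        # number of suspend/resume cycles
    awake_secs=$2   # seconds to stay awake after each resume
    sleep_secs=$3   # seconds until the RTC alarm wakes the machine
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i: suspending"
        if [ -n "${SUSPEND_CMD:-}" ]; then
            $SUSPEND_CMD || return 1
        else
            # Arm the RTC alarm without suspending, then suspend via systemd.
            # systemctl may return slightly before the machine is asleep,
            # so the "resumed" line below is only approximate.
            rtcwake -m no -s "$sleep_secs" && systemctl suspend || return 1
        fi
        echo "cycle $i: resumed"
        sleep "$awake_secs"
        i=$((i + 1))
    done
}
```

For example, `suspend_cycles 5 4800 120` would run five cycles with 80 minutes awake between them and roughly two minutes asleep each time, which is closer to the timings above than rapid back-to-back suspends.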
Another user experiencing same issues as the OP here.
No crashes so far with the NVIDIA open kernel module (installed using ./nvidia-installer -m=kernel-open), though I haven’t tried sleep/suspend, nor the latest proprietary driver.
System specs:
System:
Host: lenovo-ip5 Kernel: 6.9.2 arch: x86_64 bits: 64
Desktop: Cinnamon v: 5.6.8 Distro: Debian GNU/Linux 12 (bookworm)
Machine:
Type: Laptop System: LENOVO product: 82L5 v: IdeaPad 5 Pro 16ACH6
CPU:
Info: 8-core model: AMD Ryzen 9 5900HX with Radeon Graphics bits: 64
Graphics:
Device-1: NVIDIA GA107M [GeForce RTX 3050 Mobile] driver: nvidia v: 550.78
Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
driver: amdgpu v: kernel
Display: x11 server: X.Org v: 1.21.1.7 with: Xwayland v: 22.1.9 driver: X:
loaded: nvidia unloaded: fbdev,modesetting,nouveau,radeon,vesa dri: radeonsi
gpu: amdgpu resolution: 2560x1600~60Hz
API: EGL v: 1.4,1.5 drivers: kms_swrast,nvidia,radeonsi,swrast
platforms: gbm,x11,surfaceless,device
API: OpenGL v: 4.6 vendor: amd mesa v: 22.3.6 renderer: AMD Radeon
Graphics (renoir LLVM 15.0.6 DRM 3.57 6.9.2)
API: Vulkan v: 1.3.239 drivers: radv,nvidia,llvmpipe surfaces: xcb,xlib
Loaded modules/driver:
nvidia_uvm 4894720 0
nvidia_drm 118784 2
nvidia_modeset 1888256 2 nvidia_drm
nvidia 10076160 29 nvidia_uvm,nvidia_modeset
drm_kms_helper 270336 4 drm_display_helper,amdgpu,nvidia_drm
[ 3.522791] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 550.78 Release Build (dvs-builder@U16-I1-N08-06-4) Sun Apr 14 06:38:24 UTC 2024
[ 4.753191] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 550.78 Release Build (dvs-builder@U16-I1-N08-06-4) Sun Apr 14 06:26:34 UTC 2024
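Since several posts here depend on which kernel module flavor is actually loaded, here is a rough sketch for checking it from the log lines above. It keys on the "Open Kernel Module" wording shown in the NVRM line. Treating any other NVRM "Kernel Module" line as the proprietary module is an assumption, and the function name `nvidia_module_flavor` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: classify the loaded NVIDIA kernel module from log lines on stdin.
# Prints one of: open, proprietary, unknown.
# Assumption: the proprietary module logs an NVRM line containing
# "Kernel Module" but not "Open Kernel Module".

nvidia_module_flavor() {
    flavor=unknown
    while IFS= read -r line; do
        case $line in
            *NVRM:*"Open Kernel Module"*) flavor=open ;;
            *NVRM:*"Kernel Module"*)      flavor=proprietary ;;
        esac
    done
    echo "$flavor"
}
```

Possible usage: `dmesg | grep NVRM | nvidia_module_flavor` (reading dmesg may need root on some systems).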
Unfortunately the same problem with the new 555 driver. Luckily the open kernel module still runs stable and is usable.
I also just tried to suspend, and it seems to work with the open kernel module as well (it did not in the past). Since everything now seems to work well for me with the open kernel module, I am happy for now.
I have had this problem for a long time and unfortunately nothing has helped me to fix it so far. Yesterday I switched to the nvidia-open drivers version 555 and since then the problem seems to have disappeared. Even though they are still beta, my computer is behaving much better than expected. Even better than with the previous stable non-OpenSource driver. I will keep them and see what happens.
Same here - I have had no issues since switching to the nvidia-open drivers in my ArchLinux system with an NVIDIA GeForce RTX 4060 Max-Q / Mobile card.
The open module definitely fixes the sleep issues, but there are still freezes in some instances. For example, running the Moonlight streaming client and connecting to a GameStream server (such as Sunshine) freezes my laptop 100% of the time.
Nvidia cannot fix the critical bug with system damage for more than three months
I think I might have a possible workaround on Arch (for the closed driver):
sudo touch /etc/systemd/do-not-udevadm-trigger-on-update
Will report if it turns out not to work. (It crashes for me during the post-update hooks, so I need to wait for an update.)
Yesterday nvidia 550.90.07 and 555.52.04 drivers were released with “Fixed a bug that could lead to a kernel panic, due to a failure to release a spinlock under some conditions.” Has anyone tried them?
With the nvidia 555.52.04 proprietary driver and the nvidia.NVreg_EnableGpuFirmware=0 kernel option I got a kernel panic with this error:
Jun 06 09:30:16 cosx kernel: BUG: kernel NULL pointer dereference, address: 00000000000000ad
Jun 06 09:30:16 cosx kernel: #PF: supervisor read access in kernel mode
Jun 06 09:30:16 cosx kernel: #PF: error_code(0x0000) - not-present page
…
on the first boot after the driver update.
So the new drivers do not fix the problem.
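When comparing reports like the one above, it can help to pull only the relevant lines out of the previous boot's kernel log. A small sketch follows. The pattern list is a guess at what is useful (NVRM and nvidia-modeset messages plus the oops header lines quoted above), and the function name `nvidia_log_filter` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: keep lines that look like NVIDIA driver messages or
# kernel oops headers. The pattern list is an assumption, not exhaustive.

nvidia_log_filter() {
    grep -E 'NVRM|nvidia-modeset|BUG: kernel|#PF:'
}
```

Possible usage after a crash and reboot: `journalctl -k -b -1 --no-pager | nvidia_log_filter`, which reads the kernel messages from the previous boot if the journal is persistent.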
Does it work with nvidia.NVreg_EnableGpuFirmware=1?
Same issue with nvidia 555.52.04 proprietary driver.
With nvidia.NVreg_EnableGpuFirmware=1 (the default value for the 555 driver) my system started normally, but I am not sure that there will be no errors in the future.
With the nvidia-open 555 or closed nvidia 555 driver without nvidia.NVreg_EnableGpuFirmware=0 I have another problem: “Low fps on external monitor connected to nvidia hdmi port” (Issue #650 in NVIDIA/open-gpu-kernel-modules on GitHub). So for now I am using nvidia-open 550, as it is more stable.
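Because results in this thread differ depending on the nvidia.NVreg_EnableGpuFirmware value, here is a small sketch for checking what was actually passed on the kernel command line. It only parses the command-line text. It will not see the option if it was set through a modprobe.d `options nvidia ...` line instead. The function name `gsp_cmdline_setting` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report the nvidia.NVreg_EnableGpuFirmware value found in a
# kernel command line string, or "unset" if it is not present.
# If the option appears more than once, the last occurrence is reported.

gsp_cmdline_setting() {
    value=unset
    for word in $1; do
        case $word in
            nvidia.NVreg_EnableGpuFirmware=*) value=${word#*=} ;;
        esac
    done
    echo "$value"
}
```

Possible usage: `gsp_cmdline_setting "$(cat /proc/cmdline)"`.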
550.90.07 seemed to have fixed the issue, but I can’t verify yet as it’s almost random when the kernel panic happens
Gentlemen I am thrilled to report, MY ISSUES APPEAR TO BE RESOLVED !!! 🥳🎉🤞
It has been 3 days of heavy use, zero crashes. Let me back up.
My setup: Razer Blade 15, 3080, Arch, KDE, Wayland, closed 550 (current Arch version).
Really, all it took was adding this empty file, sudo touch /etc/systemd/do-not-udevadm-trigger-on-update, to stop triggers during the manager reloading phase.
I have also added nvidia.NVreg_EnableGpuFirmware=0 to the default kernel command line (GRUB_CMDLINE_LINUX_DEFAULT) in /etc/default/grub.
And added the following, OGL_DEDICATED_HW_STATE_PER_CONTEXT=ENABLE_ROBUST, to /etc/environment.
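For reference, the two file changes described above would look roughly like this. The `quiet` value is only a placeholder for whatever options are already on the line, and the GRUB config has to be regenerated afterwards for the kernel option to take effect.

```shell
# /etc/default/grub (sketch; "quiet" stands in for your existing options)
GRUB_CMDLINE_LINUX_DEFAULT="quiet nvidia.NVreg_EnableGpuFirmware=0"

# /etc/environment
OGL_DEDICATED_HW_STATE_PER_CONTEXT=ENABLE_ROBUST

# Afterwards, on Arch with GRUB:
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
```

These are config fragments for two separate files, not a script to run as is.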
It has been rock solid for three days. Used to get at the very least one crash a day on nouveau. Often more.
I should also note that I have disabled sddm (the login manager) and launch the Wayland session from the command line with /usr/lib/plasma-dbus-run-session-if-needed /usr/bin/startplasma-wayland
Before running sudo pacman -Suuyy I log out of the GUI, in an effort to minimize the chances of triggering a crash during the reloading phase. I am not sure if it makes any difference. Probably not, but I am now traumatized from bricking the system thrice.
So, so far so good. No crashes. Smooth frame rate, prime-run works as expected. CUDA works as expected (when used in TensorFlow and DaVinci).
I have not seen any mention of this, but maybe it’s known: I had freeze problems on my Lenovo Yoga Pro 9 with a 4070. I tested 550 and got the freezes, so I upgraded to 555 but still got them. Now I have changed my BIOS setting “System performance mode” to “Extreme Performance”, and the random freeze problem seems to be gone.
I can only say that since I have been using nvidia-open I have had no freezes and I see no reason to go back. But I understand that there are still people who cannot use the package and that is why I do not consider the problem solved.
The problem is that open drivers bring new problems, such as the inability to put the computer to sleep.
So it’s not a universal solution.
On the other hand, leaving such a problem for so long is absolutely scandalous. No more NVIDIA gpu-equipped laptops are coming into my company, that’s for sure.