570.124.04 hangs, unusable

After upgrading from 570.86.16 to 570.124.04, X hangs on startup to the point that I cannot even log in via a text console.

I’m running OpenSUSE 15.6 with the 6.13.5 kernel, an RTX 4090, and four monitors. Every time, X hangs on startup in the kernel, consuming 100% of the CPU. X doesn’t fully start and I can’t even log in with a text console. 570.124.04 is totally unusable. I had to log in remotely via ssh and install 570.86.16 to get a usable system again.
OpenGL vendor string: NVIDIA Corporation
kwin_x11[17498]: OpenGL renderer string: NVIDIA GeForce RTX 4090/PCIe/SSE2
kwin_x11[17498]: OpenGL version string: 3.1.0 NVIDIA 570.124.04
kwin_x11[17498]: OpenGL shading language version string: 1.40 NVIDIA via Cg compiler
kwin_x11[17498]: Driver: NVIDIA
kwin_x11[17498]: Driver version: 570.124.4
kwin_x11[17498]: GPU class: Unknown
kwin_x11[17498]: OpenGL version: 3.1
kwin_x11[17498]: GLSL version: 1.40
kwin_x11[17498]: X server version: 1.21.1
kwin_x11[17498]: Linux kernel version: 6.13.5
kwin_x11[17498]: Requires strict binding: no
kwin_x11[17498]: GLSL shaders: yes
kwin_x11[17498]: Texture NPOT support: yes
kwin_x11[17498]: Virtual Machine: no
kwin_x11[17498]: BlurConfig::instance called after the first use - ignoring
kwin_x11[17498]: ZoomConfig::instance called after the first use - ignoring
kwin_x11[17498]: WindowViewConfig::instance called after the first use - ignoring
kwin_x11[17498]: SlidingPopupsConfig::instance called after the first use - ignoring
kwin_x11[17498]: SlideConfig::instance called after the first use - ignoring
kwin_x11[17498]: OverviewConfig::instance called after the first use - ignoring
kwin_x11[17498]: KscreenConfig::instance called after the first use - ignoring
kwin_x11[17498]: DesktopGridConfig::instance called after the first use - ignoring
ksmserver[22510]: [GFX1-]: Detect DeviceReset DeviceResetReason::FORCED_RESET DeviceResetDetectPlace::WR_SIMULATE in Parent process
ksmserver[22745]: [GFX1-]: Detect DeviceReset DeviceResetReason::FORCED_RESET DeviceResetDetectPlace::WR_SIMULATE in Parent process


kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1230

This is repeated a few times

kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c77d:0 2:0:1500:1488

This is all I get, as X is stuck at 100% CPU. Even kill -9 can’t kill X, and it stays at 100% CPU, so I think it’s stuck in the kernel.
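
One way to check the "stuck in the kernel" guess from an ssh session is to look at the process state and the user/kernel CPU-time split in /proc/<pid>/stat. Below is a minimal Python sketch, assuming the field layout documented in proc(5). The helper names `parse_proc_stat` and `looks_stuck_in_kernel` and the 0.9 threshold are made up for illustration, and the result is only a rough heuristic over the process's lifetime totals.

```python
def parse_proc_stat(stat_line):
    """Return (state, utime_ticks, stime_ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so cut at the LAST ')' instead of splitting the whole line on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    state = rest[0]        # field 3: e.g. R = running, D = uninterruptible sleep
    utime = int(rest[11])  # field 14: clock ticks spent in user mode
    stime = int(rest[12])  # field 15: clock ticks spent in kernel mode
    return state, utime, stime


def looks_stuck_in_kernel(stat_line, kernel_share=0.9):
    """Heuristic (assumption): state R or D and mostly kernel-mode CPU time."""
    state, utime, stime = parse_proc_stat(stat_line)
    total = utime + stime
    return state in ("R", "D") and total > 0 and stime / total >= kernel_share
```

To use it, read `/proc/<pid>/stat` for the X server's PID and pass the line in. Sampling twice a few seconds apart and comparing the growth of the two tick counters gives a clearer picture than a single reading.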

I’m reverting to 570.86.16 until this is resolved.

Did you try to switch off GSP?

options nvidia NVreg_EnableGpuFirmware=0
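
A line like this normally goes into a file under /etc/modprobe.d/, and depending on the distribution the initramfs may need to be regenerated before it takes effect. To confirm after a reboot that the module actually picked it up, the value can be read back. Here is a small Python sketch, assuming the driver lists its settings in /proc/driver/nvidia/params as `Name: value` lines; that path and format are assumptions here, and `read_gpu_firmware_setting` is a hypothetical helper.

```python
def read_gpu_firmware_setting(params_text):
    """Return the EnableGpuFirmware value as an int, or None if it is not listed."""
    for line in params_text.splitlines():
        # Each line is assumed to look like "EnableGpuFirmware: 0".
        name, sep, value = line.partition(":")
        if sep and name.strip() == "EnableGpuFirmware":
            return int(value.strip())
    return None
```

Passing in the contents of /proc/driver/nvidia/params should return 0 when the option above is active. Any other value, or None, suggests the setting was not applied.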

I’m having the same issue.
My PC “hangs” at SDDM every second boot.
By hanging I mean the mouse really slows down and lags behind movement. I can type my password and try to log in, but then the PC just gets stuck. I also can’t open a TTY.
I’m on Arch Linux / KDE with an RTX 3090.

I hadn’t had any issues until today’s update, which also updated the Nvidia drivers.

If I manage to log in, the PC runs for an arbitrary amount of time. However, sometimes a monitor just goes black and everything hangs, like it does at the login manager.
I believe it also crashed reliably when using RawTherapee and when the displays went into standby.

Works much better so far, thanks for the tip!

This option actually made it worse for me.
I use the closed-source Nvidia driver, and with this option my rig refused to display anything after the boot logo (and the prompt for decrypting the hard drive) in 19 out of 20 boots.
After I finally managed to log in to Arch, I reverted the option, and now I get a working SDDM in about 1 out of 10 boots.
I have the impression that it works more often if I cold boot the rig (an observation over a couple of days since the driver update). Is there any possibility that the memory of the GPU gets corrupted somehow and the cold boot fixes it?
I still experience freezes if I manage to log in, though.

Hmm… This seems very strange to me. I thought this option would either help or change nothing, and I can’t understand why it would break the display at early boot. Did you try the previous driver version? Maybe this is a hardware issue?

I downgraded to 570.86.16 and everything works reliably again. The PC reliably shows the login screen, and so far there has been no freezing whatsoever. I will run games later, but at least I can use the PC now without rebooting 20 times.
So it can’t be a hardware issue. It is the newly released driver.
Sad story, I guess I have to wait until Nvidia releases a new driver version.
To be fair, this is the first major issue I’ve had with a driver update in about 3 years.

Beta driver works better than release? Wonderful…

I don’t know if this is the beta driver, but according to my pacman.log this is the driver I was using before.

This is definitely a beta driver:

Thanks for pointing that out. I wasn’t aware of that. I somehow thought that the nvidia package in the Arch Linux repositories always points to a ‘stable’ version. I was obviously wrong about that.
However, my session still hasn’t crashed, so I will stick with this driver for the time being.

Similar experience with Debian 12. I had to revert to 565.57.01-1 and it’s been a really annoying endeavor because ALL packages specify a “greater than” version number even if they’re NOT compatible with newer versions.
I ended up having to run this:

apt install nvidia-driver=565.* nvidia-driver-libs=565.* nvidia-vdpau-driver=565.* nvidia-settings=565.* xserver-xorg-video-nvidia=565.* libnvidia-cfg1=565.* nvidia-egl-icd=565.* firmware-nvidia-gsp=565.* libegl-nvidia0=565.* nvidia-kernel-dkms=565.* libxnvctrl0=565.* nvidia-kernel-support=565.*
apt install nvidia-driver-cuda=565.* libnvcuvid1=565.* libnvidia-fbc1=565.* libnvoptix1=565.* libnvidia-encode1=565.* libnvidia-opticalflow1=565.* libnvidia-sandboxutils=565.* nvidia-opencl-icd=565.* libcuda1=565.* libcudadebugger1=565.* libnvidia-nvvm4=565.* libnvidia-ptxjitcompiler1=565.* libnvidia-pkcs11-openssl3=565.* nvidia-kernel-dkms=565.* nvidia-driver-libs=565.* xserver-xorg-video-nvidia=565.* nvidia-vdpau-driver=565.*
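
The two commands above pin every package by hand, and a few packages appear in both. The same kind of line can be generated from a package list, which makes it easier to retarget another version later. A Python sketch follows; `build_apt_pin_command` is a hypothetical helper, not part of apt or Debian tooling.

```python
import shlex


def build_apt_pin_command(packages, version_glob):
    """Build one 'apt install pkg=version ...' line, dropping duplicate packages."""
    unique = []
    for pkg in packages:
        if pkg not in unique:  # keep first occurrence, preserve order
            unique.append(pkg)
    args = ["apt", "install"] + [f"{pkg}={version_glob}" for pkg in unique]
    # Quote each argument so the shell does not try to expand the '*' itself.
    return " ".join(shlex.quote(arg) for arg in args)
```

For example, `build_apt_pin_command(["nvidia-driver", "nvidia-kernel-dkms", "nvidia-driver"], "565.*")` yields a single `apt install` line with each package listed once and pinned to `565.*`. The package names themselves still have to come from what is installed on the system.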

It looks like 570.133.07 fixed the issue. So far everything has been working with this release.
