555 release feedback & discussion

Please provide feedback specific to the 555 release series here.

RTX 4070 - kernel 6.9.0 - 2 monitors
I have to set NVreg_EnableGpuFirmware=0, otherwise KDE 6 is unusable.
I also noticed that logging into KDE (X11 session) seems to take longer now than with the previous drivers.

UPDATE: Something is still very wonky. Everything seems delayed; for example, changing desktop display settings takes longer than usual. And if the KDE screen lock kicks in, it takes some time for the password dialog to appear.

UPDATE 2:
Wayland seems to work better than X11, and I also see an improvement there with the explicit sync patches.

UPDATE 3:
The X11 issues seem to have been a KDE hiccup: it decided to corrupt some config files. It works better now.
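
For anyone who wants to confirm that NVreg_EnableGpuFirmware=0 actually took effect after a reboot, here is a minimal Python sketch. It assumes the loaded driver exposes its registry keys in /proc/driver/nvidia/params as `Name: value` lines without the `NVreg_` prefix; the function names are mine, not part of any NVIDIA tool.

```python
from pathlib import Path


def parse_nvidia_params(text):
    """Turn 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            params[name.strip()] = value.strip()
    return params


def gpu_firmware_setting(path="/proc/driver/nvidia/params"):
    """Return the EnableGpuFirmware value as a string, or None if unavailable.

    The default path is an assumption about where the driver publishes
    its parameters; None is returned when the file cannot be read.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return parse_nvidia_params(text).get("EnableGpuFirmware")


if __name__ == "__main__":
    print("EnableGpuFirmware =", gpu_firmware_setting())
```

A result of "0" would suggest the option was picked up; None only means the file or key was not found, not that the firmware is in use.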

3060, kernel 6.9.1
kde plasma 6.0.4
kde framework 6.2
qt 6.7.1
mesa 1:24.0.7-3
75 Hz monitor
I don’t see any problems with KDE; maybe they are too subtle for me. I did not change the value of NVreg_EnableGpuFirmware. (Maybe I’ll do it tomorrow and see what changes.) Finally the problems with games under Wayland are gone. I thought they would never get fixed. Thanks.

RTX 4070 Ti, kernel 6.9.1. Animations in KDE have improved (smoother). They are still sometimes choppy or laggy, but compared with previous drivers it’s a great boost, though there is room for improvement. It’s a beta driver, so I’m fine for now.

“Invalid pointer free” (Issue #585, NVIDIA/open-gpu-kernel-modules on GitHub) wasn’t fixed and apparently impacts the closed-source driver too.

XWayland apps (NetBeans IDE) feel like they have some latency when typing.

Portal with RTX doesn’t work. Maybe because it uses two different processes?

nvmlClockOffset_v1_t’s design needs reworked: pstate / type fields should be flipped and the dual in/out design forces you to do a defensive copy on setting a new value. It looks like NVML max clock attributes still don’t respond to overclock offsets.

4080
6.9.1
Still having the issue described in this post

RTX 3080, MATE desktop, X11. Dual monitor setup: 165 Hz and 144 Hz, forcecomp enabled, with __GL_SYNC_DISPLAY_DEVICE used to sync to the highest-refresh monitor.

Huge amounts of frame skips when the firmware is loaded.

Disabling the firmware with the kernel boot parameter
nvidia.NVreg_EnableGpuFirmware=0
fixes the issue.

Btw, The Last of Us Part I still crashes with XID 109. I’m not sure if it’s an Ada only issue or a regression because when I had my RTX 3060 I did not have any crashes.

Not sure if it’s a kernel regression or a driver one, but as of late Runtime D3 (RTD3) power management doesn’t seem to turn off the NVIDIA dGPU anymore. Lenovo Legion 7i with a 4080.

Runtime D3 status:          Enabled (fine-grained)
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Not Supported
 Status:                    Disabled

Nothing runs on the dGPU, yet it never powers off.
All PCIe devices are set to power/control auto, as I have always had them. I even went so far as to set the various environment variables that force everything to run on the Intel onboard GPU. Either the driver or kernel 6.9 changed something.

I also have a udev rule that removes the HDMI audio, xHCI, and USB Type-C devices and sets power/control to auto on driver add/bind/unbind/change, just for testing purposes, to get the GPU to power off, but it doesn’t anymore.
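
To compare notes on the RTD3 problem, it may help to read the kernel's own view of the device next to the driver status shown above. This is a rough Python sketch that reads the standard PCI sysfs attributes power/control and power/runtime_status; the PCI address 0000:01:00.0 is only a placeholder for wherever the dGPU sits on a given laptop, and the function name is made up for illustration.

```python
from pathlib import Path


def read_runtime_pm(device_dir):
    """Read runtime power management attributes for one PCI device.

    device_dir is a sysfs device directory such as
    /sys/bus/pci/devices/0000:01:00.0 (placeholder address).
    Missing or unreadable attributes are reported as None.
    """
    power = Path(device_dir) / "power"
    result = {}
    for name in ("control", "runtime_status"):
        try:
            result[name] = (power / name).read_text().strip()
        except OSError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Placeholder address; find the real one with lspci.
    print(read_runtime_pm("/sys/bus/pci/devices/0000:01:00.0"))
```

If control reads "auto" but runtime_status stays "active" while nothing uses the GPU, that would match the behaviour described in the post. Checking the other functions on the same device (audio, USB) the same way may show which one is holding it awake.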

Today when turning on the computer I got the following error, which is strange because it was not there yesterday:
[drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00002900] Flip event timeout on head 0
Either way, everything continues to work fine.
UPD: I did some additional checking with NVreg_EnableGpuFirmware=0 and the problem does exist; I just can’t see it without a synthetic test. And it looks like this test doesn’t work in Firefox.
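
Since the flip timeout only showed up once, counting occurrences in the kernel log over time is one way to tell a one-off from a pattern. The Python sketch below matches only the "[GPU ID ...] Flip event timeout on head N" part of the message quoted above and tallies it per GPU and head; the function name is hypothetical, and the log text is expected to come from something like `dmesg` or `journalctl -k`.

```python
import re
from collections import Counter

# Matches the tail of the nvidia-drm message quoted in the post.
FLIP_TIMEOUT = re.compile(
    r"\[GPU ID (0x[0-9a-fA-F]+)\] Flip event timeout on head (\d+)"
)


def count_flip_timeouts(log_text):
    """Count flip event timeouts per (gpu_id, head) in kernel log text."""
    counts = Counter()
    for match in FLIP_TIMEOUT.finditer(log_text):
        gpu_id = match.group(1)
        head = int(match.group(2))
        counts[(gpu_id, head)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: journalctl -k | python3 this_script.py
    for (gpu_id, head), n in sorted(count_flip_timeouts(sys.stdin.read()).items()):
        print(f"GPU {gpu_id} head {head}: {n} timeout(s)")
```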

Getting an error with nvidia-smi with a dual NVIDIA RTX A6000 setup on the second GPU.

nvidia-smi
Wed May 22 06:18:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:21:00.0  On |                  Off |
| 30%   32C    P0             77W /  300W |     545MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               Off |   00000000:4B:00.0 N/A |                  N/A |
|ERR!  ERR! ERR!              N/A /  N/A  |       1MiB /  49140MiB |     N/A      Default |
|                                         |                        |                 ERR! |
+-----------------------------------------+------------------------+----------------------+

Reverting back to driver version 550.67 corrects the issue.

Wed May 22 06:27:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:21:00.0  On |                  Off |
| 30%   46C    P0             82W /  300W |     569MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX A6000               Off |   00000000:4B:00.0 Off |                  Off |
| 30%   42C    P8             10W /  300W |       8MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
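
For multi-GPU boxes, a small script can flag which card is reporting ERR! without eyeballing the table after every driver change. This is a text-scraping sketch in Python that assumes the table layout shown in the output above (a GPU row starting with the index, followed by detail rows, then a `+---` separator); it is not an NVIDIA-provided interface, and the function name is mine.

```python
import re

# First row of a GPU entry: "|   0  NVIDIA RTX A6000     Off | ..."
GPU_ROW = re.compile(r"^\|\s+(\d+)\s+(\S.*?)\s+(?:On|Off)\s+\|")


def gpus_with_errors(smi_output):
    """Return [(index, name), ...] for GPUs whose rows contain 'ERR!'."""
    failing = []
    current = None
    for line in smi_output.splitlines():
        if line.startswith("+"):
            # A '+---' separator ends the current GPU entry.
            current = None
            continue
        match = GPU_ROW.match(line)
        if match:
            current = (int(match.group(1)), match.group(2))
        if current is not None and "ERR!" in line and current not in failing:
            failing.append(current)
    return failing


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    ).stdout
    print(gpus_with_errors(output))
```

Scraping the human-readable table is fragile across driver versions; the `--query-gpu=... --format=csv` mode of nvidia-smi is probably a better base for anything long-lived.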

On KDE 6 (KWin 6.0.4.1), in a Wayland session with an AMD Radeon 780M iGPU, an NVIDIA GeForce RTX 4060 Mobile dGPU, and the nvidia-open 555 driver, my external monitor connected to the NVIDIA HDMI port runs at a low frame rate equal to half of the screen refresh rate (similar to bug 452219, “Low fps and high CPU usage on external monitor connected to NVIDIA when default GPU is Intel”).
With nvidia-open 550 I get a normal frame rate on the external monitor, but much higher CPU usage from the kwin_wayland process.

That should have been fixed. Could you try adding modeset=1 and fbdev=1 to the nvidia-drm module options?
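
To check whether those options are really active, the values the kernel loaded can be read back from sysfs. The Python sketch below assumes the suggestion above refers to the nvidia-drm module's modeset and fbdev parameters and that they appear under /sys/module/nvidia_drm/parameters; the function name is only illustrative.

```python
from pathlib import Path


def nvidia_drm_params(base="/sys/module/nvidia_drm/parameters"):
    """Return the current modeset and fbdev values as strings.

    A value of None means the parameter file could not be read, for
    example because the module is not loaded or the driver build does
    not have that parameter.
    """
    values = {}
    for name in ("modeset", "fbdev"):
        try:
            values[name] = Path(base, name).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    print(nvidia_drm_params())
```

Boolean module parameters are normally shown as Y or N, so {"modeset": "Y", "fbdev": "Y"} would mean both options took effect.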

Firstly, I would like to thank the NVIDIA developers for the push and support for explicit sync. Finally we have fully working Wayland/XWayland!
Second, about GSP firmware being enabled by default: this has caused massive lag spikes and frame skips on the desktop and makes everything feel like it hitches when it should be smooth.
It can be worked around by adding the following to the kernel boot parameters:
nvidia.NVreg_EnableGpuFirmware=0
This resolves the problem.
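
A quick way to verify that the boot parameter reached the kernel is to look for it in /proc/cmdline. Here is a small Python sketch along those lines; the helper name is made up, and it only reports what was passed at boot, not whether the driver honoured it.

```python
def kernel_param(cmdline, name):
    """Return the value of name from a kernel command line string.

    Returns None if the parameter is absent, "" if it is present
    without '=value', and the last value if it appears several times.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        cmdline = handle.read()
    print(kernel_param(cmdline, "nvidia.NVreg_EnableGpuFirmware"))
```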

Those values have already been set for a long time. The error happened once and never happened again, but I thought I would mention it.

nvmlClockOffset_v1_t’s design needs reworked: pstate / type fields should be flipped and the dual in/out design forces you to do a defensive copy on setting a new value. It looks like NVML max clock attributes still don’t respond to overclock offsets.

Was this tested before release? It’s pretty broken.

With the NVIDIA proprietary (not open) 555 driver, the nvidia.NVreg_EnableGpuFirmware=0 kernel option, and KWin 6.0.4.1 with the explicit-sync patch, I have a normal frame rate on the external monitor and very low CPU usage for the kwin_wayland process. But now I am afraid of another bug with a kernel panic: “Series 550 freezes laptop” (#161 by 5bondarenko).

The 555 driver fixed flickering in Steam, but now the Steam window sometimes gets corrupted.

Resizing the window fixes this.

Just for info (I haven’t found anything about it yet):
Debian user here, using the CUDA developer repo for the driver.
However, I have an old GTX 1060 video card in use (maybe that is the problem).
After updating to 555, applications using NVENC and CUDA (torch/tensor) no longer worked. The GPU is simply no longer recognized by these applications. nvidia-smi and further debugging showed no abnormalities (suggesting that everything is fine). Video decoding, Vulkan/OpenGL, and Proton/Steam games also work fine.
After downgrading to 550, CUDA and NVENC worked again.
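
When nvidia-smi looks healthy but CUDA applications do not see the GPU, probing the CUDA driver library directly can narrow things down without involving torch. The Python sketch below loads libcuda.so.1 through ctypes and calls the driver API functions cuInit and cuDeviceGetCount; the function name and return convention are my own, and the library name assumes a typical Linux install.

```python
import ctypes


def probe_cuda_driver(library="libcuda.so.1"):
    """Try to initialise the CUDA driver API.

    Returns None if the library cannot be loaded, otherwise a tuple
    (result_code, device_count) where result_code 0 is CUDA_SUCCESS.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None

    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    lib.cuDeviceGetCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.cuDeviceGetCount.restype = ctypes.c_int

    code = lib.cuInit(0)
    if code != 0:
        # Initialisation failed; no device count is available.
        return (code, 0)

    count = ctypes.c_int(0)
    code = lib.cuDeviceGetCount(ctypes.byref(count))
    return (code, count.value)


if __name__ == "__main__":
    print(probe_cuda_driver())
```

A non-zero result code from cuInit on 555 and a clean (0, 1) on 550 would be a compact detail to include in a bug report.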