Dynamic Boost on laptop stopped working

I’ve had nvidia-powerd running for a while without issues, boosting the GPU TDP up to 100W. Recently, however, it stopped doing its job.

systemd output:

May 01 13:40:08 arch-yoga systemd[1]: Started nvidia-powerd service.
May 01 13:40:08 arch-yoga /usr/bin/nvidia-powerd[7570]: nvidia-powerd version:1.0(build 1)
May 01 13:40:08 arch-yoga /usr/bin/nvidia-powerd[7570]: error setting power limit
May 01 13:40:08 arch-yoga /usr/bin/nvidia-powerd[7570]: Error setting GPU limit: 85000.
May 01 13:40:08 arch-yoga /usr/bin/nvidia-powerd[7570]: Failed to get topology status 55
May 01 13:40:08 arch-yoga /usr/bin/nvidia-powerd[7570]: Dbus Connection is established
May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: error setting power limit
May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: Error setting GPU limit: 100000.
May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: Failed to get topology status 55
May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: error setting power limit
May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: Error setting GPU limit: 100000.

It keeps printing those two error messages several times per second. According to nvidia-smi, the TDP is now set to 85W instead of the desired 100W.
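
If you want to measure how often these failures happen, a small Python sketch like the one below can pull the requested limits out of a saved journal. It is only a sketch. It assumes the number after "Error setting GPU limit:" is in milliwatts, which is inferred from the values lining up with the wattages above (85000 vs 85W, 100000 vs 100W), and the function names are my own.

```python
import re
import sys
from collections import Counter

# Matches nvidia-powerd journal lines in journalctl's default short format, e.g.
#   May 01 13:40:10 arch-yoga /usr/bin/nvidia-powerd[7570]: Error setting GPU limit: 100000.
# Assumption: the number is in milliwatts (85000 -> 85 W).
LIMIT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) .*nvidia-powerd\[\d+\]: "
    r"Error setting GPU limit: (?P<mw>\d+)\."
)


def failed_limits(lines):
    """Return (timestamp, requested limit in watts) for every failed attempt."""
    out = []
    for line in lines:
        m = LIMIT_RE.match(line)
        if m:
            out.append((m.group("ts"), int(m.group("mw")) / 1000))
    return out


def failures_per_second(lines):
    """Count failed attempts per journal timestamp (one-second resolution)."""
    return Counter(ts for ts, _ in failed_limits(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Reads a log file saved with: journalctl -u nvidia-powerd --no-pager > powerd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.read().splitlines()
    for ts, count in failures_per_second(log_lines).items():
        print(f"{ts}: {count} failed attempt(s)")
```

Usage: save the log with `journalctl -u nvidia-powerd --no-pager > powerd.log`, then run the script with that file as its only argument (the script name is arbitrary).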

My system:
Intel i9-13905H + RTX 4060

Software:
linux 6.8.8.arch1-1
lib32-nvidia-utils 550.76-1
nvidia-open 550.76-3 (also tried the non-open version)
nvidia-prime 1.0-4
nvidia-settings 550.67-1
nvidia-utils 550.76-3

Driver 550.76 has critical bugs on some notebooks; please try driver 550.78.

Is it possible that this bug is still present in 555.52.04?

My GeForce RTX 4080 Max-Q eventually gets stuck here:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.04              Driver Version: 555.52.04      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   43C    P0            ERR! /  150W |    1633MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         

Sometimes it starts fine during boot and even sets the limit to my maximum of 175W:

at 23:25:08 ❯ sudo systemctl status nvidia-powerd
[sudo] password for crashdummy: 
● nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/etc/systemd/system/nvidia-powerd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Fri 2024-06-07 23:24:47 CEST; 23s ago
   Main PID: 2478 (nvidia-powerd)
      Tasks: 3 (limit: 76636)
     Memory: 532.0K (peak: 1.0M)
        CPU: 26ms
     CGroup: /system.slice/nvidia-powerd.service
             └─2478 /usr/bin/nvidia-powerd
 
Jun 07 23:24:47 crashtux systemd[1]: Started nvidia-powerd.service - nvidia-powerd service.
Jun 07 23:24:47 crashtux /usr/bin/nvidia-powerd[2478]: nvidia-powerd version:1.0(build 1)
Jun 07 23:24:47 crashtux /usr/bin/nvidia-powerd[2478]: Dbus Connection is established

But after a while, it appears to get stuck at its last setting:

| NVIDIA-SMI 555.52.04              Driver Version: 555.52.04      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   54C    P0             80W /  155W |    1656MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
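
To find out when the cap stops changing, one option is to poll nvidia-smi instead of reading the table by hand. This is a rough Python sketch that assumes nvidia-smi is on PATH. power.draw, power.limit and enforced.power.limit are standard `--query-gpu` fields (see `nvidia-smi --help-query-gpu`). I haven't confirmed which of the two limit fields matches the Cap column while Dynamic Boost is active, so the sketch logs both. The helper names are hypothetical.

```python
import subprocess
import time

# Real nvidia-smi query fields; "nounits" strips the " W" suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit,enforced.power.limit",
    "--format=csv,noheader,nounits",
]


def to_watts(field):
    """Convert one CSV field to watts; None if the driver reports a non-number."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_power(csv_text):
    """Parse nvidia-smi output (one line per GPU) into (draw, limit, enforced) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        draw, limit, enforced = (to_watts(f) for f in line.split(","))
        rows.append((draw, limit, enforced))
    return rows


def watch(interval_s=5.0, samples=120):
    """Poll the first GPU and print a line whenever either limit field changes."""
    last = None
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        draw, limit, enforced = parse_power(out)[0]
        if (limit, enforced) != last:
            print(f"{time.strftime('%H:%M:%S')} draw={draw} W "
                  f"power.limit={limit} W enforced={enforced} W")
            last = (limit, enforced)
        time.sleep(interval_s)
```

Calling `watch()` on the affected laptop while nvidia-powerd is running should show whether the limit moves at all once it gets stuck. Non-numeric readings, such as an unreadable draw, come back as None rather than crashing the loop.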

After another reboot:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.04              Driver Version: 555.52.04      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   55C    P0             41W /  175W |    1935MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Afterwards, my logs are flooded with:

Jun 07 18:00:26 crashtux /usr/bin/nvidia-powerd[32650]: error setting power limit
Jun 07 18:00:26 crashtux /usr/bin/nvidia-powerd[32650]: Error setting GPU limit: 175000.
Jun 07 18:00:26 crashtux /usr/bin/nvidia-powerd[32650]: error setting power limit
Jun 07 18:00:26 crashtux /usr/bin/nvidia-powerd[32650]: Error setting GPU limit: 175000.
Jun 07 18:00:26 crashtux /usr/bin/nvidia-powerd[32650]: error setting power limit

dmesg doesn't show the NVIDIA driver doing anything unusual:

$ sudo dmesg | grep -i nvidia
[    8.821580] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input18
[    8.822338] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input19
[    8.822561] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input20
[    8.833622] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input21
[   11.410580] nvidia: module license 'NVIDIA' taints kernel.
[   11.410586] nvidia: module license taints kernel.
[   11.592208] nvidia-nvlink: Nvlink Core is being initialized, major device number 508
[   11.593568] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[   11.593745] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[   11.641046] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  555.52.04  Tue Jun  4 13:54:58 UTC 2024
[   11.705867] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[   11.814803] nvidia-uvm: Loaded the UVM driver, major device number 506.
[   11.849925] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  555.52.04  Tue Jun  4 13:21:08 UTC 2024
[   11.853667] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   13.472530] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
[   13.481596] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
[   13.511857] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[   13.532462] nvidia 0000:01:00.0: [drm] fb1: nvidia-drmdrmfb frame buffer device