High idle power usage on 3070 Max-Q with no processes running on it

A simple but really annoying problem: I am trying to figure out how to properly configure "PCI-Express Runtime D3 (RTD3) Power Management". The NVIDIA dGPU draws 11 W at idle with no process running on it, which brings my laptop's overall power draw to around 22 W, drains the battery very fast, and produces more heat, since the dGPU never powers off. The open source nouveau driver shuts the dGPU down properly, so overall power usage is a lot lower than with the official driver, because the dGPU is actually unpowered when it is not in use.

So I did some research on this problem and only found this: download.nvidia.com/XFree86/Linux-x86_64/460.27.04/README/dynamicpowermanagement.html

I tried setting NVreg_DynamicPowerManagement to 0x01 and 0x02, but it didn’t work: the dGPU stays powered with no processes running on it. (Yes, I regenerated the initramfs to apply the settings in /etc/modprobe.d/nvidia.conf.)
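
One way to double-check that a module option like this actually reached the driver is sketched below. It is only a sketch: the helper name show_nvreg_options is made up for illustration, and it assumes that the loaded nvidia module lists its effective parameters in /proc/driver/nvidia/params, which may vary between driver versions.

```shell
#!/bin/sh
# show_nvreg_options [DIR]: print every line mentioning an NVreg_* option
# in the *.conf files under DIR (hypothetical helper; DIR defaults to
# /etc/modprobe.d, the directory mentioned in the post).
show_nvreg_options() {
    dir="${1:-/etc/modprobe.d}"
    grep -hs "NVreg_" "$dir"/*.conf || echo "no NVreg options set in $dir"
}

show_nvreg_options

# Value the loaded driver is actually using (assumption: this file is
# present only while the nvidia module is loaded).
grep -s "DynamicPowerManagement" /proc/driver/nvidia/params \
    || echo "nvidia module not loaded, or parameter not listed"
```

If the two outputs disagree, the option was probably not picked up, for example because the initramfs still contains an older copy of the configuration.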

This is the current output of nvidia-smi

Fri Mar 17 14:29:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8    11W /  55W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This is the current output of /sys/bus/pci/devices/0000\:01\:00.0/power/control:

cat /sys/bus/pci/devices/0000\:01\:00.0/power/control
on

Any clues? I am on mainline Linux kernel 6.2.6, using the NVIDIA open kernel modules. Thanks in advance.
nvidia-bug-report.log.gz (1.1 MB)

The open kernel modules are alpha quality on GeForce GPUs and lack any power management features. You also won’t be able to suspend/resume (unless this was added in recent versions). Please use the standard driver.

Thanks, but this creates another problem: I can’t use the fully closed source modules (nvidia or the DKMS ones) because boot hangs with traps: Missing ENDBR: _nv012309rm+0x0/0x10 [nvidia]. That’s why I am using the nvidia open ones. (I can’t run nvidia-bug-report.sh, since I can’t even switch TTYs because of that error.)

journalctl.txt (8.3 KB)

Please set kernel parameter
ibt=off
or upgrade to the 530 driver, which supports ibt.
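
How the parameter is added depends on the bootloader, which the thread does not state. After rebooting, a quick check that it took effect could look like the sketch below; has_kernel_param is a hypothetical helper name, and the only assumption is that /proc/cmdline holds the running kernel's command line.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole
# space-separated word in FILE (hypothetical helper; FILE defaults to
# /proc/cmdline).
has_kernel_param() {
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$1"
}

if has_kernel_param "ibt=off"; then
    echo "ibt=off is active"
else
    echo "ibt=off is not on the kernel command line"
fi
```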

Yeah, I was just writing that; it is fixed. Now I am using the fully closed source modules, but the dGPU is still powered on when no process is using it, even in a TTY without any Xorg or Wayland desktop environment running. Any clues?

Attached new bug-report
nvidia-bug-report.log.gz (942.6 KB)

Please unset any module options you added, reboot and create a new nvidia-bug-report.log.

nvidia-bug-report.log.gz (966.3 KB)
Ready, no options set on this one

Please delete /etc/X11/xorg.conf.d/20-intel.conf then reboot and create a new nvidia-bug-report.log

nvidia-bug-report.log.gz (1010.2 KB)

Now it appears to work by default without any options set, and nvidia-smi now has a small delay when executed, giving this:

Fri Mar 17 18:01:37 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P3    N/A /  55W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

N/A / 55W seems good. If I run it again right away, it reports power usage in watts, because the GPU was powered on by the previous command.

Don’t use nvidia-smi to check runtime suspend; it wakes up the GPU (hence the small delay). Instead, running
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
should return “suspended”.
Also, the module options shouldn’t be set; the driver autoconfigures them, and they should only be changed if there are issues.
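
The sysfs check above can be wrapped in a small helper, sketched here under the assumption that the dGPU sits at PCI address 0000:01:00.0 as in this thread; gpu_runtime_status is a made-up name. It reads only sysfs, so it avoids the nvidia-smi wake-up described above.

```shell
#!/bin/sh
# gpu_runtime_status [DEVICE_DIR]: print the runtime PM state of a PCI
# device by reading sysfs only (hypothetical helper; DEVICE_DIR defaults
# to the address used in this thread).
gpu_runtime_status() {
    dev="${1:-/sys/bus/pci/devices/0000:01:00.0}"
    if [ -r "$dev/power/runtime_status" ]; then
        cat "$dev/power/runtime_status"
    else
        echo "unknown"
    fi
}

# "suspended" means the dGPU is powered down; "active" means it is awake.
echo "runtime_status: $(gpu_runtime_status)"
```

On a machine with a different bus layout, pass the device directory explicitly; the address can be read from the Bus-Id column of the nvidia-smi output or from lspci.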

Please embed the nvidia driver into the initrd; it’s loading too late.

How do I do that? I have never messed with the initrd before.

/sys/bus/pci/devices/0000:01:00.0/power/runtime_status now correctly reports suspended. When I run nvidia-smi it switches to active, and after a few seconds it goes back to suspended, as intended. My laptop's power draw is now a lot better (about 10 W). Suspending (S3 and suspend-to-idle, closing the laptop lid) works well too; that was also an issue before.

Since I am using Arch Linux, I added all the nvidia modules to the MODULES array, whose entries are supposed “to be loaded before any boot hooks are run”, so they get loaded at early boot. Am I right?

MODULES=(nvidia nvidia_uvm nvidia_modeset)

Reference: mkinitcpio - ArchWiki

I recently noticed that RTD3, which automatically suspends the GPU, makes my programs open with a noticeable delay, because launching them wakes up the dGPU every time. Any fixes?

When the dGPU is active, all programs open instantly, but when it is suspended, every program I open is delayed because it waits for the dGPU to become active (for some reason, even if it will not be used). This did not happen with nouveau either.

nvidia-drm also needs to be in the list of modules to be added to the initrd.
No idea though why any application, even the file manager, is waking up the nvidia gpu. That’s definitely not the normal behaviour.
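
For reference, a sketch of what the resulting mkinitcpio configuration might look like on Arch Linux. It simply appends nvidia_drm to the MODULES line quoted earlier in the thread; the file path and the regeneration command are the standard mkinitcpio ones, but treat this as an illustration rather than a verified configuration for this machine.

```shell
# /etc/mkinitcpio.conf (excerpt)
# Underscores and hyphens are interchangeable in kernel module names,
# so nvidia_drm here is the nvidia-drm module mentioned above.
MODULES=(nvidia nvidia_uvm nvidia_modeset nvidia_drm)

# Afterwards, regenerate the initramfs images as root:
#   mkinitcpio -P
```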

Thanks, at least the main problem is solved. I will add nvidia-drm now too. I will ask about programs waking up the dGPU in another thread or on other forums. Thanks for your help!
