On Ryzen Mobile, Turing GPU dynamic power management seems to be broken (on driver 435.21)

grigoriy.khvatskiy · October 11, 2019, 10:39am

I have an ASUS TUF FX505DV, which comes with a Ryzen 7 3750H CPU and an RTX 2060 GPU.

The PRIME render offloading feature itself seems to work fine, but dynamic power management does not.

I’ve done all the steps from the “automated” section of Chapter 22, PCI-Express Runtime D3 (RTD3) Power Management, although I still have to manually enable automatic power management as described in the thread PRIME render offloading on Nvidia Optimus (NVIDIA Developer Forums, see post #63). The kernel module parameter is also set to 0x02, and judging by /proc/driver/nvidia/params it appears to be recognized by the driver.

However, the GPU appears to be always on. nvidia-smi reports this:
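For anyone wanting to repeat the check against /proc/driver/nvidia/params, here is a minimal sketch. The helper name `nv_param` is made up for illustration, and it assumes the file lists parameters as `Name: value` lines, with the name shown without the `NVreg_` prefix; adjust it if your driver version prints something different.

```shell
#!/bin/sh
# Hypothetical helper: print the value the loaded driver reports for one
# module parameter. Assumes "Name: value" lines in the params file.
nv_param() {
    name=$1
    file=${2:-/proc/driver/nvidia/params}
    # Strip the "Name:" prefix and any following whitespace, print the rest.
    sed -n "s/^${name}:[[:space:]]*//p" "$file"
}

# Usage (on a machine with the nvidia module loaded):
#   nv_param DynamicPowerManagement
```

If the 0x02 setting was picked up, the printed value should correspond to 2. An empty result would suggest the parameter name is not present in the file at all.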

$ nvidia-smi
Fri Oct 11 13:35:12 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P8     4W /  N/A |     16MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1201      G   /usr/lib/xorg/Xorg                            14MiB |
+-----------------------------------------------------------------------------+

and system power management reports this:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_suspended_time 
active
0

Is this because the CPU (and the chipset) does not support the needed ACPI power management features? Is this a misconfiguration on my part, or is this a driver problem?

This is the only thing that prevents me from using Linux 100% of the time on this machine, so any help will be much appreciated.
nvidia-bug-report.log.gz (664 KB)

generix · October 11, 2019, 11:43am

Just as a note, you can’t really use nvidia-smi to detect runtime PM, since running it wakes up the GPU.
The setup seems to be correctly done, so when
cat /sys/bus/pci/devices/0000:01:00.0/power/control
returns “auto” and
cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
still returns “active”, then this doesn’t seem to work right now in conjunction with an AMD platform. That doesn’t necessarily mean the platform doesn’t support it; I’d rather guess that the driver doesn’t expect this combination right now.
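The two reads above can be wrapped into one small check that never touches nvidia-smi. This is only a sketch: the function name `gpu_rtpm_status` is hypothetical, and the default device path is the 0000:01:00.0 address from this thread, which may differ on other machines.

```shell
#!/bin/sh
# Hypothetical helper: report runtime PM state from sysfs only, so the
# check itself does not wake the GPU.
gpu_rtpm_status() {
    dev=${1:-/sys/bus/pci/devices/0000:01:00.0}
    control=$(cat "$dev/power/control")
    status=$(cat "$dev/power/runtime_status")
    echo "control=$control runtime_status=$status"
    if [ "$control" = "auto" ] && [ "$status" = "suspended" ]; then
        echo "runtime PM is working"
    elif [ "$control" = "auto" ]; then
        echo "runtime PM is enabled but the GPU is $status"
    else
        echo "runtime PM is not enabled (power/control is not auto)"
    fi
}

# Usage: gpu_rtpm_status            (uses the default device path)
#        gpu_rtpm_status /sys/bus/pci/devices/<your GPU address>
```

The middle branch is the case discussed here: power/control is “auto” but the device stays “active”.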

grigoriy.khvatskiy · October 11, 2019, 12:14pm

I think that’s the case. That’s pretty sad, but maybe it’s an easy fix?

In dmesg, it says that

[    0.736369] pci 0000:01:00.0: PME# supported from D0 D3hot

So I guess both the platform and the GPU support this, but the driver just doesn’t try to make use of it.

generix · October 11, 2019, 1:18pm

That message is irrelevant. Maybe you can get some more info from the driver by setting the module parameter
NVreg_ResmanDebugLevel=0

grigoriy.khvatskiy · October 11, 2019, 1:35pm

Just did that, the bug report is attached.

Another observation is that when I changed udev config so that auto power management is enabled as soon as the device is added, the GPU actually did switch to a suspended state for some time.

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_suspended_time 
active
4372

As you can see, the suspended time is now greater than 0, but the GPU does not suspend at any other time.
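One way to tell whether the counter is still growing, or only moved once early in boot, is to sample it twice and compare. A rough sketch follows; the function name `suspended_delta` and the 10-second default interval are arbitrary choices of mine, and the device path is again the one from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print how much runtime_suspended_time grew over an
# interval. The kernel reports this counter in milliseconds.
suspended_delta() {
    dev=${1:-/sys/bus/pci/devices/0000:01:00.0}
    interval=${2:-10}
    before=$(cat "$dev/power/runtime_suspended_time")
    sleep "$interval"
    after=$(cat "$dev/power/runtime_suspended_time")
    echo $((after - before))
}

# Usage: suspended_delta                                       (10 s sample)
#        suspended_delta /sys/bus/pci/devices/0000:01:00.0 60  (60 s sample)
```

A result of 0 over an idle minute would suggest the GPU never suspended in that window, which would match a counter that only increased before something started holding the device active.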

nvidia-bug-report.log.gz (716 KB)

generix · October 11, 2019, 1:52pm

I guess that it is suspending only until the X driver loads. Then it is kept active.
With the debug level set to info, there’s a log flood right now. It would be interesting to see information about the PR3 method, which is probably logged just at driver load time. Is it possible to disable X and, right after boot, run
sudo dmesg |grep PR3
to check if there are some messages about it?

grigoriy.khvatskiy · October 11, 2019, 2:27pm

There are no messages about PR3 in dmesg, with or without X.

I also grepped the bug report and there isn’t anything about PR3 either (there’s something about _DOD), but I guess it’s not relevant.

deibiddo806 · May 27, 2021, 11:25pm

I see this issue is unresolved, so I’m sorry if bumping this annoys you in some way.

I have an Acer Nitro 5 AN515-43 with a Ryzen 3550H CPU and a GTX 1650 GPU, and I see the same problem. In addition, using PRIME render offloading for a game while the “NVreg_DynamicPowerManagement=0x02” option is set in /etc/modprobe.d/nvidia.conf seems to crash the GPU:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status
error

There are no problems with PRIME render offloading whatsoever when I have “NVreg_DynamicPowerManagement=0x01” option in /etc/modprobe.d/nvidia.conf, but it seems that the GPU is almost always active with this option:

$ cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_suspended_time
active
1224

The bug report is attached below. Thank you in advance.
nvidia-bug-report.log.gz (371.0 KB)
