Very(!) slow ramp down from high to low clock speeds leading to significantly increased power consumption

Hi Hugomaxwell / generix,

This thread was opened to address the ramp-down issue, which has been fixed; ramp down now takes ~14 secs.
Your issue looks different: the GPU gets stuck in a power state and doesn’t ramp down at all.
Please open a separate thread so it can be tracked properly.

This issue has been what?
Is this a joke or am I not understanding something?

If you have access to a PC with an Intel processor, run i7z and notice how the clocks work.
Do the same with your graphics cards; when they behave like that, it will be fixed.
Right now it’s far from fixed.

It’s not fixed for me… In MOST cases, it still takes more than 30 seconds to ramp down clocks.

Hi Aintmarks,

Currently, we have reduced GPU clock ramp-down time from ~38 secs to ~14 secs, which is a good sign, and we will continue to investigate further improvements.

I also tested on an Alienware 17 R5 notebook with driver 418.74 and the reg key enabled in the terminal; it took 14 secs to ramp down clocks.

Below is the output for reference.

oemqa@dbian:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
1

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    47     -     0     9     0     0   405   139
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    40    48     -    11     2     0     0  4006  1442
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    34    48     -     0     1     0     0  4006   949
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    35    48     -     0     1     0     0  4006   949
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    34    48     -     0     1     0     0  3802   949
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    34    48     -     0     1     0     0  3802   949
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    34    49     -     0     1     0     0  3802   949
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    49     -     0     1     0     0  2999   822
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    49     -     0     1     0     0  2999   822
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    13    49     -     0     4     0     0   810   822
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    13    49     -     0     4     0     0   810   822
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    13    48     -     0     4     0     0   810   822
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    48     -     0     9     0     0   405   189
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    48     -     0     9     0     0   405   189
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    48     -     1     9     0     0   405   139
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    48     -     0     9     0     0   405   139

^C
oemqa@dbian:~$

Can you please perform the steps below precisely:

  1. Boot up the system and make sure no applications are running.
  2. Open a terminal and verify the output of the command echo $__GL_ExperimentalPerfStrategy. It should print 1.
  3. Execute the command below to measure how many seconds it takes to ramp down the GPU clocks.

grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done

If you open/close applications continuously, it will take longer for the clocks to ramp down.
Please make sure to reopen and close the app only after a pause of 45-50 secs to notice the change.
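As a convenience, the ramp-down time can also be read off a captured log of that loop automatically. The helper below is just a minimal sketch, not part of the official test procedure: the `ramp_down_seconds` function name, the idea of reading from a saved log file, and the 200 MHz idle cutoff (guessed from the 139-189 MHz idle values in the logs above) are all my assumptions. It counts how many one-second dmon samples pass before pclk (the 10th column) drops below the threshold.

```shell
#!/bin/sh
# Sketch: given a file of `nvidia-smi dmon -c 1` output captured once per
# second, count how many samples pass before pclk (column 10) falls below
# an assumed idle threshold. Adjust the 200 MHz cutoff for your GPU.
ramp_down_seconds() {
    awk -v idle=200 '
        /^ *[0-9]/ {                       # data rows start with the GPU index
            n++
            if ($10 < idle) { done = 1; print n - 1; exit }
        }
        END { if (!done) print "never" }   # clocks never reached idle
    ' "$1"
}
```

For example, redirect the test loop’s output to a file (e.g. `dmon.log`) and run `ramp_down_seconds dmon.log`; the header rows starting with `#` are skipped automatically.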

@amrits:
I’m very happy for you, it is great you’re not facing the problem anymore.
Unfortunately, it seems a lot of us are still affected.
So it is up to you not only to understand why Windows takes 3 seconds to ramp down while Linux, in the best case, needs almost 5 times as long,
but also why, for a lot of users, me included, it still takes more than 11 times as long.

Hi Kokoko3k,

We are investigating further improvements, but would like to know if you performed precisely the same steps as in comment #124. Please also share your results with us.

Yes amrits, I was the first one to post that “test”.
Doing it again:

koko@Gozer# grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
__GL_ExperimentalPerfStrategy=1
1
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     2    44     -     0     1     0     0  2700  1071
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    47     -    90     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    46     -     0     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     2     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     0     1     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    45     -     1     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     4    46     -     3     2     0     0  2700  1215
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     2    45     -     0     2     0     0  2700  1071
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     3    45     -     2     2     0     0  2700  1071
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     2    44     -     0     2     0     0  2700  1071
^C

Made a short, 2 minute video of how quickly the gpu clock goes back to idle when using a browser. Only Chrome with one tab open and a terminal window are running, nothing else. Right at the beginning of the video, I scroll the page and the gpu clock/power consumption goes up and stays there for a ridiculously long time (~48 seconds!!!). After it’s back to idle, I scroll a bit again and this time it stays for ~46 seconds, great. “echo $__GL_ExperimentalPerfStrategy” returns 1, and the behavior is the same with any of the recent drivers.

Link to the video: https://streamable.com/ikhzx

Hi darkhorse / kokoko3k,

I reached out to you earlier regarding the queries below but didn’t get any response. I matched your configuration from your previous posts but have not been able to reproduce the issue so far.

  1. Have you ever noticed a ramp down in 14-15 secs while opening/closing the glxgears application when there are no other applications running on your system (perhaps after restarting your system and testing again)?
    If not, please share an nvidia bug report for the system where you are facing the issue.
  2. Please share your kernel config file so that we can use it to attempt a repro.
    So far I have not been able to reproduce the issue; I am trying all possible options to get a local repro.

Hi amrits,
you asked me via private message on 07/04/2019:

And i answered you on 07/08/2019:

I hope my quote is answering your question.

My kernel config is the archlinux one:

config.gz (56.2 KB)

Hi kokoko3k,

Thanks for your reply. We are investigating further improvements and will keep you updated.

Hi darkhorse,

Please help us by providing the information requested in comment #129.

Actually, on Windows it usually takes 3 seconds to ramp down, and about 2 when there’s only a light load like invoking the start menu.
It is also near-instant when I stop moving around the Blender viewport, less than a second. This is a scenario where power saving really matters, and it works perfectly on Windows.

I’ll need to check how it is now on Ubuntu; will let you know.

Hi amrits, attached my kernel config file to this comment.

I’m dual booting windows 10 and debian and windows blows linux out of the water when it comes to DE performance. It’s faster and butter smooth while using considerably less power (measured at wall). This is kinda frustrating because linux is much better in everything else for me.
config-4.9.0-9-amd64.txt (182 KB)

Hi darkhorse,

Thanks for providing the kernel config file; it looks like you are using an AMD processor.
Can you please share an nvidia bug report? I will try to set up a configuration as close as possible to your setup and attempt to reproduce the issue locally.

It’s an i7 6700k, not amd. :) Attaching my nvidia bug report.
nvidia-bug-report.log.gz (1010 KB)

Hi All,

This is to keep you informed that our team is investigating further improvements to GPU clock ramp-down, as it has been observed from previous comments that it takes around 3-4 secs on the Windows platform.
Hence we would really appreciate it if you could provide complete repro steps for the Windows platform, along with event logs showing it taking around 3 secs, so that we can reproduce it locally and compare with the Linux drivers.
This will help our engineers debug the issue in the right direction.

Hint: ask official NVIDIA Windows developers instead of asking for help from casual users…

@amrits: that’s great! Please explain how to create such a log, and I will make one.

Just a side note:
funnily enough, when using the new render offload feature, the NVIDIA GPU throttles down nearly instantly.
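For anyone who wants to compare: render offload is enabled per application via environment variables. The invocation below follows the driver README’s PRIME render offload chapter; it assumes an X server already configured for offload (e.g. integrated GPU as the primary display), so treat it as a sketch rather than a full setup guide.

```shell
# Run a single application on the NVIDIA GPU via PRIME render offload.
# The rest of the desktop stays on the integrated GPU, which is likely
# why the discrete GPU can clock down almost immediately afterwards.
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears
```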