If you have GPU clock boost problems, please try __GL_ExperimentalPerfStrategy=1
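In case it is unclear how to apply it, here is a minimal sketch, assuming you want the variable either system-wide via /etc/environment (as used later in this thread) or per application from a shell; the system-wide variant needs a relogin or reboot to take effect:

# System-wide: append the variable to /etc/environment (requires root,
# takes effect at the next login)
echo '__GL_ExperimentalPerfStrategy=1' | sudo tee -a /etc/environment

# Per application: export it in the launching shell; any GL application
# started from this shell inherits it
export __GL_ExperimentalPerfStrategy=1
glxgears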

  1. Gainward GTX 1070 Phoenix GS. VBIOS: the version available for download on their website: http://www.gainward.com.tw/main/vgapro.php?id=984&lang=en
  2. Nvidia bug report attached to this post
  3. Up-to-date Debian Stretch with XFCE (tested other desktop environments too, same problem, so it’s not DE-related)
  4. dmidecode output attached to this post
  5. Desktop
    nvidia-bug-report.log.gz (1.08 MB)
    dmidecode.txt (11.7 KB)

@darkhorse,

Can you please test with driver 418.56 and share the results with us?

Just him? I can reproduce this issue as well.

Also, these drivers are two months old. What’s the point of testing them?

Hi, I already tested that driver and shared my result earlier in this thread (6th comment).

Hey NVIDIA!

Any progress on this issue? It’s been three months already (or 2.5 years since it was first reported).

I performed the experiment again on the multiple configurations stated below, with and without the reg key, on driver 418.74, and observed that it took around 38 secs to ramp down the GPU clocks without the reg key and around 13 secs with the reg key applied.

Config Setup 1 - MAXIMUS VIII EXTREME + Debian GNU/Linux 9 + GeForce GTX 1070 + Composite enabled for both displays

Config Setup 2 - Alienware 0XF4NJ motherboard + AMD Ryzen Threadripper 1950X 16-Core Processor + UEFI + fully updated Fedora 29 + XFCE without compositing + kernel 5.0.11-200.fc29.x86_64 & kernel 5.0.10-ic64 + driver 418.56 + X Server 1.20.4

Config Setup 3 - Dell Precision T7610 + Genuine Intel® CPU @ 2.30GHz + GTX 1070 + Debian GNU/Linux 9.1 + kernel 4.9.0-3-amd64 + Compositing on for both displays

Below is the output for reference:

oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    49     -    11    15     0     0   405   139
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
    0    36    51     -     4     2     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     2     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    35    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    30    52     -     1     1     0     0  4006   987
    0    30    52     -     1     2     0     0  4006   987
    0    30    52     -     1     1     0     0  4006   987
    0    29    52     -     1     2     0     0  3802   987
    0    29    52     -     1     2     0     0  3802   987
    0    29    52     -     1     2     0     0  3802   987
    0    10    51     -     5     7     0     0   810   784
    0    10    51     -     5     7     0     0   810   784
    0     8    51     -    11    15     0     0   405   303
    0     9    51     -    10    14     0     0   405   303
    0     9    51     -    10    14     0     0   405   303
    0     9    51     -    10    14     0     0   405   164
    0     9    51     -    12    15     0     0   405   164
    0    10    50     -    11    14     0     0   405   139
    0     9    50     -    10    14     0     0   405   139

=====================================================================

oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
1
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     7    46     -     0     6     0     0   405   139
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
    0    35    48     -     5     1     0     0  4006  1594
    0    29    47     -     0     1     0     0  4006   974
    0    29    48     -     0     1     0     0  4006   974
    0    29    47     -     0     1     0     0  4006   974
    0    29    47     -     0     1     0     0  3802   974
    0    28    48     -     0     1     0     0  3802   974
    0     9    48     -     0     3     0     0   810   797
    0     9    47     -     0     3     0     0   810   797
    0     9    47     -     0     3     0     0   810   797
    0     8    47     -     0     6     0     0   405   202
    0     8    47     -     0     6     0     0   405   202
    0     8    47     -     0     6     0     0   405   202
    0     7    47     -     0     6     0     0   405   139
    0     7    47     -     0     6     0     0   405   139

The steps below were followed exactly; with the reg key applied it took around 13 secs to ramp down the GPU clocks.

  1. Boot up the system and make sure there are no applications running.
  2. Open a terminal and verify the output of the command echo $__GL_ExperimentalPerfStrategy. It should print 1.
  3. Execute the command below to measure how many seconds it takes to ramp down the GPU clocks.

grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
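If you would rather get a single number than count dmon samples by eye, a rough sketch along these lines can time the ramp-down automatically; the 300 MHz idle threshold and the single-GPU assumption (-i 0) are mine, adjust them for your card:

#!/bin/bash
# Time how long the GPU takes to return to idle clocks after a short GL load.
IDLE_MHZ=300                      # assumed idle threshold, adjust per card
timeout 3 glxgears >/dev/null     # generate a short burst of GPU load
start=$(date +%s)
while true; do
    # current graphics clock of GPU 0 in MHz (integer, no header/units)
    clk=$(nvidia-smi -i 0 --query-gpu=clocks.gr --format=csv,noheader,nounits)
    if [ "$clk" -lt "$IDLE_MHZ" ]; then
        echo "Ramp-down took $(( $(date +%s) - start ))s (now at ${clk} MHz)"
        break
    fi
    sleep 1
done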

Fedora 30 stock kernel, stock everything: ~14 seconds (which is still too long).

Fedora 30 custom kernel (PREEMPT enabled) + Option “UseNvKmsCompositionPipeline” “Off”: ~36 seconds:

$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done 
__GL_ExperimentalPerfStrategy=1
1
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    44     -     0     0     0     0  4006   936
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     2     1     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    43     -     0     1     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     1     3     0     0   405   240
$ cat nvidia.conf

Section "Device"
        Identifier      "Videocard0"
        BusID           "PCI:1:0:0"
        Driver          "nvidia"
        VendorName      "NVIDIA"
        BoardName       "NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)"
        Option          "Coolbits" "28"
        Option          "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
        Option          "UseNvKmsCompositionPipeline" "Off"
        Option          "TripleBuffer" "On"
EndSection

GTX 1060 6GB here.
config-5.1.7z (21.6 KB)
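For what it’s worth, a quick way to double-check that Xorg actually picked up those options is sketched below; the log path is an assumption (some distros log to ~/.local/share/xorg/ instead):

# Options read from the config file are echoed as (**) lines in the Xorg log
grep -iE 'UseNvKmsCompositionPipeline|TripleBuffer|metamodes' /var/log/Xorg.0.log

# The MetaMode currently in use can also be queried at runtime
nvidia-settings --query CurrentMetaMode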

Hi Birdie,

Thanks for the experiments. With the updated driver we now expect the ramp-down to take around 13-15 secs, where it was earlier ~40 secs.
We would appreciate it if you could confirm that you tested with the custom kernel immediately after booting up the system (without any applications running in the background).
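For reference, one rough way to confirm that nothing else is using the GPU before starting the run (the X server itself will normally still appear as a graphics client):

# Processes currently using the GPU for compute
nvidia-smi --query-compute-apps=pid,process_name --format=csv

# Full status; the process table at the bottom also lists graphics clients
nvidia-smi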

It looks like the changelog for driver 430.14 doesn’t contain all the info and that exact driver version already contains the fix. I can confirm that it takes approximately 14 seconds to ramp down the clocks with driver 430.26. Hooray!

By any chance, is it possible to further speed up the clock ramp-down under Linux? Say, make transitions take five seconds or less? Ramp-up appears to take less than a second, while ramp-down is way too slow.

Hi Birdie,

Thanks again for your valuable experiments.
Currently we have been able to reduce the GPU clock ramp-down time from ~38 secs to ~14 secs, which is a good sign, and we will continue to investigate further improvements.

Thank you! Looking forward to clock transitions as fast as they are on Windows.

Hello

After installing this driver on CentOS 7.6, I can no longer reach P0, whether persistence mode is running or not. Any way to fix this?

OK on 430.26 but still at P2

Mon Jul 1 11:34:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:02:00.0 Off |                  N/A |
|  0%   53C    P2   127W / 180W |   7000MiB /  8119MiB |     57%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:03:00.0 Off |                  N/A |
| 25%   50C    P2    70W / 180W |   7002MiB /  8119MiB |     61%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    On   | 00000000:81:00.0 Off |                  N/A |
| 25%   56C    P2    83W / 180W |   7002MiB /  8119MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    On   | 00000000:82:00.0 Off |                  N/A |
| 24%   47C    P2    69W / 180W |   7002MiB /  8119MiB |     54%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     44293      C   ...ph4dozbmictudhml5/bin/relion_refine_mpi  6989MiB |
|    1     44294      C   ...ph4dozbmictudhml5/bin/relion_refine_mpi  6991MiB |
|    2     44295      C   ...ph4dozbmictudhml5/bin/relion_refine_mpi  6991MiB |
|    3     44296      C   ...ph4dozbmictudhml5/bin/relion_refine_mpi  6991MiB |
+-----------------------------------------------------------------------------+
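In case it helps narrow this down: GeForce boards commonly run CUDA workloads in P2 rather than P0, so this may be expected behaviour rather than a regression. The per-GPU throttle reasons and current clocks can be dumped with the diagnostic queries below (nothing is changed by them):

# Current performance state and the reasons clocks are being limited
nvidia-smi -q -d PERFORMANCE

# Clock domains and the clocks currently in use
nvidia-smi -q -d CLOCK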

Not directly related to the slow ramp-down of the clocks, but if you have the issue where your GPU stays at the maximum power state forever, I found a workaround that keeps it fixed until you either reboot, restart the X server, or touch your monitor settings.

My Specs:

  • Ubuntu 19.04
  • GTX 1080Ti
  • two 4K monitors

To temporarily fix the power issues:

  • Go to Settings (Power + Cog icon on the top right)
  • Settings / Devices / Displays
  • Change the resolution of the primary/left monitor to something like 1024x768
  • Apply + Keep Changes
  • Change resolution back to 4K
  • Apply + Keep Changes

After doing this, the power states work as they should until I restart the X server or touch the display settings again. It keeps working even after suspending the computer.

However, this does not work if I change the resolution using xrandr, if I do not choose to keep the changes after applying the lower resolution, or if I lower the resolution on the secondary monitor instead of the primary one. Turning the second monitor off will fix the power issues, but if you turn the monitor back on, the issues return (until I do the above again). This is also true if power saving was already working.

It almost looks like the graphics driver has some hardcoded resolution limit which forces the GPU to use max clocks forever, but because of some glitch it forgets to set that flag when you change the screen resolution using the above steps. The flag seems to be maintained across suspend, so it presumably gets stored to disk during the suspend.

I hope this helps others find a temporary workaround for this issue, and helps the NVIDIA staff track down its cause.

It ramps down faster, but that doesn’t fix the fact that it spikes for no reason… The frequency goes up and the fans spin up while the machine is completely idle on the desktop, for no apparent reason. The CPU and RAM aren’t moving an inch when this happens, so it doesn’t seem to be a system issue…
It doesn’t happen on Windows either, so it doesn’t seem to be hardware related.

Update: I tried a few things, and with the composition pipeline completely disabled, the card stays at its lowest state when idling, and even dropped about 9°C. Is this normal behavior?
Is there any downside to disabling it?
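If it helps anyone test this, the composition pipeline can also be toggled per MetaMode at runtime without editing xorg.conf. This is only a sketch: "nvidia-auto-select +0+0" is a placeholder for your actual mode layout, which you can get from nvidia-settings --query CurrentMetaMode.

# Disable the composition pipeline for the current X session only
nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceCompositionPipeline = Off, ForceFullCompositionPipeline = Off }"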

Is there any progress on this issue?

On an EVGA 3080 Ti FTW3 Ultra Hybrid, powermizer is stuck at the highest level and never drops to the lower levels when more than 2 monitors are connected. __GL_ExperimentalPerfStrategy=1 does not make any difference. With 2x1440p monitors running at 60hz, it does drop to the lowest power level. When I either bump one of the two monitors up to 144hz or add a third monitor at 60hz, it gets stuck at the highest level. When my monitors go into standby, it does clock down to the lowest level which I can see when I SSH into the machine and run nvidia-smi. My previous card, a Zotac 3080 amp holo, was able to clock down to the lowest power level with four 1440p monitors running (3 @ 75hz, 1 @ 144hz). Changing the power limit and/or clock offsets doesn’t make any difference.

nvidia-smi command below shows it stuck at P0 and using 87w even when usage is 0-1%. When I drop to 1 or 2 monitors, powermizer starts working and it idles at 28w and 32c. nvidia-bug-report file is attached.

$ echo $__GL_ExperimentalPerfStrategy 
1

$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     1     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210
    0    87    45     -     0     1     0     0  9501   210

nvidia-bug-report.log.gz (516.8 KB)

I can confirm the monitor-related observation:

My setup has two monitors connected (actually one monitor on DP, and a TV on HDMI, both 4k, clone mode).

While both monitors are connected, the GPU stays pinned at the highest power level. It also causes latency spikes once the system has been running for a long time, ultimately producing perf messages in the kernel log:

[56896.930341] perf: interrupt took too long (3142 > 3131), lowering kernel.perf_event_max_sample_rate to 63600
[57913.607571] perf: interrupt took too long (3940 > 3927), lowering kernel.perf_event_max_sample_rate to 50700
[59785.486534] perf: interrupt took too long (4933 > 4925), lowering kernel.perf_event_max_sample_rate to 40500
[62631.956353] perf: interrupt took too long (6217 > 6166), lowering kernel.perf_event_max_sample_rate to 32100
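Those messages mean the kernel’s perf subsystem is throttling its own sampling rate because interrupt handling is taking too long. If you want to check or reset the limit, a quick sketch (100000 is a common default, yours may differ):

# Inspect the limit the kernel has backed off to
sysctl kernel.perf_event_max_sample_rate

# Restore a higher limit (root); the kernel will lower it again if the
# underlying latency problem persists
sudo sysctl -w kernel.perf_event_max_sample_rate=100000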

The result is micro-freezes of the system. Every once in a while, the mouse cursor stutters, keyboard input is delayed or skipped, video playback drops frames (but audio is not affected), scrolling isn’t smooth anymore, and games become unpredictable due to jumpy mouse movement or gamepad input. If I leave the system running long enough, the effects probably recover, only to come back suddenly.

While the system is micro-freezing (short latency spikes or freezes, usually just milliseconds but enough to make mouse movement unpredictable on the desktop), I can go to nvidia-settings and disable the HDMI output, and the system immediately recovers from the micro-freezes and the GPU enters low power states. Turning HDMI back on, the GPU maxes out its power levels again even when idling at 1-2% usage. The micro-stutters do not return at that point, but eventually they come back later.
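For anyone who wants to script that recovery step instead of clicking through nvidia-settings, an equivalent off/on cycle of the HDMI head can be tried with xrandr. This is only a sketch: HDMI-0 is an assumption (check the query output for your actual connector name), and an earlier poster reported that xrandr-based changes did not help in their case.

# Find the connector name of the HDMI output
xrandr --query | grep -w connected

# Cycle the HDMI head off and back on
xrandr --output HDMI-0 --off
xrandr --output HDMI-0 --auto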

A reboot usually also fixes the micro-stutters for some hours but GPU power levels stay at maximum.

This is extremely annoying, especially while using the mouse. It took me a very long time to finally find this thread and a workaround (disabling the second monitor), so I’m pretty sure it’s driver-related. This wasn’t an issue a few months ago, but I cannot pinpoint when it started.

It probably started around the same time I discovered that the TV would no longer be detected properly: it usually works after a reboot, but when I turn the TV off and back on, it only shows a black screen with “no signal detected” while the NVIDIA driver thinks it’s working perfectly fine and shows the resolution/refresh/model etc. To fix this, I need to lower the resolution and put it back to 2160p, or set it to 30 Hz instead of 60 Hz (which is quite useless for games, which then run at 20-30 fps instead of 50-60).

Update:

Using __GL_ExperimentalPerfStrategy=1 makes no difference.

Maybe related: NVIDIA 455.50.14 nvidia-modeset kernel crash on monitor re-plug

nvidia-bug-report.log.gz (1.1 MB)

Using __GL_ExperimentalPerfStrategy=1 makes no difference.

It can’t; the feature was enabled by default a long time ago.