If you have GPU clock boost problems, please try __GL_ExperimentalPerfStrategy=1
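The variable is read per process, so it must be present in the environment of the application you launch. A minimal sketch (glxgears is just an example client here):

```shell
# Make the variable part of this shell's environment so that any
# GPU application launched from here inherits it:
export __GL_ExperimentalPerfStrategy=1

# Verify it is actually exported:
env | grep __GL_ExperimentalPerfStrategy

# Then start the application from this same shell, e.g.:
# glxgears
```

The `grep` should print `__GL_ExperimentalPerfStrategy=1` before you start the application.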

I second kokoko3k: if you look veeery closely, starting only glxgears, it throttles down after 3 seconds. Sometimes you can also start Chrome while glxgears is running and it again throttles down after 3 seconds, but often even the slightest disturbance like raising a window instantly raises the clocks to max again and nails them there for 35 seconds.
So for real-world workloads, there’s no usable effect.

Dear Artem,

Can you please share an nvidia bug report or dmidecode output so that we can know your system configuration?
So far, I have set up a system with the information I had and observed that it took approx. 13 secs to ramp down GPU clock speeds.

for me it takes 5-7 seconds when launching an app, switching apps, etc. (with compiz compositing enabled)

Sat Apr 13 18:04:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   31C    P8    11W / 151W |    250MiB /  8116MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7334      G   /usr/bin/X                                   164MiB |
|    0      7558      G   cairo-dock                                    18MiB |
|    0      8153      G   compiz                                         4MiB |
|    0      8362      G   ...quest-channel-token=1181671030585128907    60MiB |
+-----------------------------------------------------------------------------+

When launching Chromium it takes quite a while to occupy memory, which keeps the GPU busy for around 15 seconds; however, it immediately switches down to P2 or P3 and then - as soon as the task is finished - it clocks down after 5-7 seconds.

much better than the 22-29 seconds before

Thanks folks :)

edit:

Even minimizing or maximizing the Chromium window with the magic lamp animation stays at P8 - power usage only varies between 11 and 13 W.

edit2:

Please add this feature to the 340.x legacy driver as well.

It’s important for those kinds of cards to offer the best efficiency (anxiously squinting at the GPU of the Dell XPS m1330 with its thermal conductivity issues).

A 5-7 second clock ramp-down time - that’s good. Hi kernelOfTruth, can you please share an nvidia bug report of the system you are using?

Our QA again tested internally and below are the observations:

Configuration Setup = Ubuntu 18.04.1 LTS + EFI Mode + GeForce GTX 1070 & GeForce GTX 1060 6GB + Driver 418.56 + GNOME & KDE

Verified with the set and env commands that the variable was declared, then executed Google Chrome and an OpenGL application (glxgears) as a non-root user in an X terminal, and noted ~13 secs to ramp down GPU clock speeds.
Verified the above setup in both KDE and GNOME desktop environments, with compositing both enabled and disabled.

Later, I also installed Fedora 29 to match the end user’s setup, along with driver 418.56 & a GTX 1060 6GB, and noted approx. 14 secs to ramp down GPU clock speeds.

Recently, I verified on a GeForce notebook which has a GeForce 920M with driver 418.56 installed.
I exported the variable __GL_ExperimentalPerfStrategy=1 and observed the GPU clocks ramp down in approx. 13-14 seconds for the google-chrome and glxgears applications.

I also observed the same behavior on the below configuration setup after enabling & disabling the Force Composite feature:

Alienware Area-51 R4 + Intel® Core™ i7-7900X CPU @ 3.30GHz + GeForce GTX 1070 + 418.56

Hi All,
Our internal testing on multiple configs shows the clocks ramp down in less than 15 seconds as soon as google-chrome is closed. Can you please retest and share your feedback? Make sure __GL_ExperimentalPerfStrategy is set in the terminal/shell from which you are launching and closing Google Chrome. Test first with Google Chrome only and make sure no other apps are running on the GPU at the same time. Use nvidia-smi -q -d CLOCK -l to check clocks.
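One pitfall worth noting about the "set it on the terminal" requirement: an exported variable is only inherited by processes started from that same shell, not by applications launched elsewhere (e.g. from a desktop launcher). A quick demonstration using plain shell builtins:

```shell
export __GL_ExperimentalPerfStrategy=1

# A child process of this shell inherits the variable:
sh -c 'echo "child sees: ${__GL_ExperimentalPerfStrategy:-unset}"'

# A process started with a clean environment (like an app launched
# outside this shell) does not:
env -i sh -c 'echo "clean env sees: ${__GL_ExperimentalPerfStrategy:-unset}"'
```

This prints `child sees: 1` followed by `clean env sees: unset`, which is why setting the variable system-wide (e.g. in /etc/environment) is the more reliable option for desktop sessions.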

If you still see an issue, please provide the below info:

  1. Who is your GPU vendor? What is the VBIOS version of your GPU?
  2. An nvidia bug report log file captured as soon as the issue hits.
  3. The OS and desktop environment - KDE, GNOME, MATE, Xfce, bare X, etc.
  4. dmidecode output and system CPU info.
  5. Is it a notebook, desktop or workstation?
$ echo $__GL_ExperimentalPerfStrategy
1
$ while :; do date +%s; nvidia-smi dmon -c 1 | tail -1; sleep 1; done
...
1556446847
    0     7    40     -     0     3     0     0   405   139
1556446848
    0    28    41     -     1     0     0     0  4006  1544
1556446849
    0    28    41     -     2     1     0     0  4006  1544
1556446850
    0    28    41     -     1     0     0     0  4006  1544
1556446851
    0    28    41     -     0     0     0     0  4006  1544
1556446852
    0    28    41     -     0     0     0     0  4006  1544
1556446853
    0    28    41     -     0     0     0     0  4006  1544
1556446854
    0    28    41     -     0     0     0     0  4006  1544
1556446855
    0    28    41     -     0     0     0     0  4006  1544
1556446856
    0    28    41     -     0     0     0     0  4006  1544
1556446857
    0    28    41     -     0     0     0     0  4006  1544
1556446858
    0    28    42     -     0     0     0     0  4006  1544
1556446859
    0    28    42     -     0     0     0     0  4006  1544
1556446860
    0    28    42     -     0     0     0     0  4006  1544
1556446861
    0    28    42     -     0     0     0     0  4006  1544
1556446862
    0    28    42     -     0     0     0     0  4006  1544
1556446863
    0    28    42     -     0     0     0     0  4006  1544
1556446864
    0    28    42     -     0     0     0     0  4006  1544
1556446865
    0    28    42     -     0     0     0     0  4006  1544
1556446866
    0    28    42     -     0     0     0     0  4006  1544
1556446867
    0    28    42     -     0     0     0     0  4006  1544
1556446868
    0    28    42     -     0     0     0     0  4006  1544
1556446870
    0    28    42     -     0     0     0     0  4006  1544
1556446871
    0    28    42     -     0     0     0     0  4006  1544
1556446872
    0    29    42     -     0     0     0     0  4006  1544
1556446873
    0    28    43     -     0     0     0     0  4006  1544
1556446874
    0    28    42     -     0     0     0     0  4006  1544
1556446875
    0    28    43     -     0     0     0     0  4006  1544
1556446876
    0    28    43     -     0     0     0     0  4006  1544
1556446877
    0    28    43     -     0     0     0     0  4006  1544
1556446878
    0    25    43     -     0     0     0     0  4006   923
1556446879
    0    25    42     -     0     0     0     0  4006   923
1556446880
    0    25    42     -     0     0     0     0  4006   923
1556446881
    0    24    42     -     0     0     0     0  3802   923
1556446882
    0    24    43     -     0     0     0     0  3802   923
1556446883
    0    24    42     -     0     0     0     0  3802   923
1556446884
    0     9    42     -     0     2     0     0   810   784
1556446885
    0     9    42     -     0     2     0     0   810   784
1556446886
    0     7    42     -     0     3     0     0   405   253
...

Full 38 seconds to cool down.

This was after I ran glxgears and exited it.
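The 38 seconds can also be computed mechanically from the interleaved `date +%s` / `dmon` output above instead of counting lines by hand. A rough awk sketch (the 500 MHz "idle" threshold and the condensed sample data are illustrative assumptions; pclk is the last dmon column):

```shell
awk 'BEGIN { thr = 500 }                      # "idle" pclk threshold in MHz (assumption)
     NR % 2 == 1 { ts = $1; next }            # odd lines carry the date +%s timestamp
     $10 >= thr && !start { start = ts }      # remember the first boosted sample
     start && $10 < thr { print ts - start; exit }' <<'EOF'
1556446847
    0     7    40     -     0     3     0     0   405   139
1556446848
    0    28    41     -     1     0     0     0  4006  1544
1556446884
    0     9    42     -     0     2     0     0   810   784
1556446886
    0     7    42     -     0     3     0     0   405   253
EOF
```

On this condensed excerpt it prints 38, matching the figure above.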

MSI GeForce GTX 1060 Armor 6G OC (v1).

01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3283
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 39
        Region 0: Memory at f6000000 (32-bit, non-prefetchable) 
        Region 1: Memory at e0000000 (64-bit, prefetchable) 
        Region 3: Memory at f0000000 (64-bit, prefetchable) 
        Region 5: I/O ports at e000 
        [virtual] Expansion ROM at 000c0000 [disabled] 
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00498  Data: 0000
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [250 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidia_drm, nvidia
GeForce GTX 1060 6GB
IRQ:         39
GPU UUID:    GPU-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Video BIOS:      86.06.0e.00.28
Bus Type:    PCIe
DMA Size:    47 bits
DMA Mask:    0x7fffffffffff
Bus Location:    0000:01:00.0
Device Minor:    0
Blacklisted:     No
  1. Fully updated Fedora 29 + XFCE without compositing.
  2. Desktop.
    nvidia-bug-report.log.gz (657 KB)

Hi birdie,
Are you running the custom kernel 5.0.10-ic64? With PREEMPT? Can we get your kernel config file? Do you have a GPU that is not overclocked to test with?

This GPU is running in stock mode, i.e. it’s not overclocked.

I can reproduce this issue with stock Fedora 29/30 kernels as well.

Here’s my kernel config anyway.

Yes, I have preempt enabled:

grep -i preempt .config 
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_DEBUG_PREEMPT is not set

config.zip (24 KB)

  1. Gainward gtx 1070 phoenix gs. vbios: the version available to download on their website. http://www.gainward.com.tw/main/vgapro.php?id=984&lang=en
  2. Nvidia bug report attached to this post
  3. Up to date Debian Stretch with XFCE (tested other desktop environments too, same problem. So it’s not DE related)
  4. dmidecode output attached to this post
  5. Desktop
    nvidia-bug-report.log.gz (1.08 MB)
    dmidecode.txt (11.7 KB)

@darkhorse,

Can you please test with driver 418.56 and share results with us.

Just him? I can reproduce this issue as well.

Also, these drivers are two months old. What’s the point of testing them?

Hi, I already tested that driver and shared my result earlier in this thread (6th comment).

Hey NVIDIA!

Any progress on this issue? It’s been three months already (or 2.5 years since it was first reported).

I performed the experiment again on the multiple configurations stated below, with and without the reg key, on driver 418.74, and observed that it took around 38 secs without the reg key and 13 secs with it to ramp down GPU clocks.

Config Setup 1 - MAXIMUS VIII EXTREME + Debian GNU/Linux 9 + GeForce GTX 1070 + compositing enabled for both displays

Config Setup 2 - Alienware 0XF4NJ motherboard + AMD Ryzen Threadripper 1950X 16-Core Processor + UEFI + fully updated Fedora 29 + XFCE without compositing + kernels 5.0.11-200.fc29.x86_64 & 5.0.10-ic64 + 418.56 + X Server 1.20.4

Config Setup 3 - Dell Precision T7610 + Genuine Intel® CPU @ 2.30GHz + GTX 1070 + Debian GNU/Linux 9.1 + 4.9.0-3-amd64 + compositing on for both displays

Below is output for reference -

oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    49     -    11    15     0     0   405   139
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
    0    36    51     -     4     2     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     1     0     0  4006  1594
    0    35    51     -     1     2     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    51     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    35    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    35    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     2     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    36    52     -     1     1     0     0  4006  1594
    0    30    52     -     1     1     0     0  4006   987
    0    30    52     -     1     2     0     0  4006   987
    0    30    52     -     1     1     0     0  4006   987
    0    29    52     -     1     2     0     0  3802   987
    0    29    52     -     1     2     0     0  3802   987
    0    29    52     -     1     2     0     0  3802   987
    0    10    51     -     5     7     0     0   810   784
    0    10    51     -     5     7     0     0   810   784
    0     8    51     -    11    15     0     0   405   303
    0     9    51     -    10    14     0     0   405   303
    0     9    51     -    10    14     0     0   405   303
    0     9    51     -    10    14     0     0   405   164
    0     9    51     -    12    15     0     0   405   164
    0    10    50     -    11    14     0     0   405   139
    0     9    50     -    10    14     0     0   405   139

=====================================================================

oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
1

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     7    46     -     0     6     0     0   405   139
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
    0    35    48     -     5     1     0     0  4006  1594
    0    29    47     -     0     1     0     0  4006   974
    0    29    48     -     0     1     0     0  4006   974
    0    29    47     -     0     1     0     0  4006   974
    0    29    47     -     0     1     0     0  3802   974
    0    28    48     -     0     1     0     0  3802   974
    0     9    48     -     0     3     0     0   810   797
    0     9    47     -     0     3     0     0   810   797
    0     9    47     -     0     3     0     0   810   797
    0     8    47     -     0     6     0     0   405   202
    0     8    47     -     0     6     0     0   405   202
    0     8    47     -     0     6     0     0   405   202
    0     7    47     -     0     6     0     0   405   139
    0     7    47     -     0     6     0     0   405   139

Below are the exact steps taken, where it took around 13 secs to ramp down GPU clocks.

  1. Boot up the system and make sure there are no applications running.
  2. Open a terminal and verify the output of the command echo $__GL_ExperimentalPerfStrategy. It should print 1.
  3. Execute the below command to measure how many seconds it takes to ramp down GPU clocks.

grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done

Fedora 30 stock kernel, stock everything: ~14 seconds (which is still too long).

Fedora 30 custom kernel (PREEMPT enabled) + Option "UseNvKmsCompositionPipeline" "Off": ~36 seconds:

$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done 
__GL_ExperimentalPerfStrategy=1
1
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    44     -     0     0     0     0  4006   936
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     2     1     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    45     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    28    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    44     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    29    43     -     0     0     0     0  4006  1544
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  4006   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    25    43     -     0     0     0     0  3802   936
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    11    43     -     0     1     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     0     2     0     0   810   746
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     9    42     -     1     3     0     0   405   240
$ cat nvidia.conf

Section "Device"
        Identifier      "Videocard0"
        BusID           "PCI:1:0:0"
        Driver          "nvidia"
        VendorName      "NVIDIA"
        BoardName       "NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)"
        Option          "Coolbits" "28"
        Option          "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
        Option          "UseNvKmsCompositionPipeline" "Off"
        Option          "TripleBuffer" "On"
EndSection

GTX 1060 6GB here.
config-5.1.7z (21.6 KB)

Hi Birdie,

Thanks for the experiments. With the driver update we now expect around 13-15 secs to ramp down, compared to ~40 secs earlier.
We would appreciate it if you could confirm that you tested with the custom kernel immediately after booting the system (without any applications running in the background).

It looks like the changelog for driver 430.14 doesn’t contain all the info, and that exact driver version contains the fix. I can confirm that it takes approximately 14 seconds to ramp down the clocks with driver 430.26. Hooray!

By any chance, is it possible to further speed up the clock ramp-down under Linux? Say, make transitions take five seconds or less? Ramp-up takes less than a second, while ramp-down is way too slow.

Hi Birdie,

Thanks again for your valuable experiments.
Currently, we have reduced the GPU clock ramp-down time from ~38 secs to ~14 secs, which is a good sign, and we will continue to investigate further improvements.