I second kokoko3k. If you look very closely, starting only glxgears, it throttles down after 3 seconds. Sometimes you can also start Chrome while glxgears is running and it again throttles down after 3 seconds, but often even the slightest disturbance, like raising a window, instantly raises the clocks to max again and nails them there for 35 seconds.
So for real-world workloads, there’s no usable effect.
Dear Artem,
Can you please share an nvidia bug report or dmidecode output so that we can determine your system configuration?
So far, I have set up a system with the information I had and observed that it took approx. 13 secs for the GPU clock speeds to ramp down.
For me it takes 5-7 seconds when launching an app, switching apps, etc. (with compiz compositing enabled).
Sat Apr 13 18:04:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   31C    P8    11W / 151W |    250MiB /  8116MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      7334      G   /usr/bin/X                                   164MiB |
|    0      7558      G   cairo-dock                                    18MiB |
|    0      8153      G   compiz                                         4MiB |
|    0      8362      G   ...quest-channel-token=1181671030585128907    60MiB |
+-----------------------------------------------------------------------------+
When launching Chromium it takes quite a while to occupy memory, which keeps the GPU busy for around 15 seconds; however, it immediately switches down to P2 or P3 and then, as soon as the task is finished, it clocks down after 5-7 seconds.
Much better than the 22-29 seconds before.
Thanks folks :)
edit:
Even minimizing or maximizing the Chromium window with the magic lamp animation stays at P8 - power usage only varies between 11-13 W.
edit2:
please add this feature to the 340.x legacy driver as well
it’s important for those kinds of cards to offer the best efficiency (anxiously squinting at the GPU of the Dell XPS m1330 with its thermal conductivity issues)
A 5-7 second clock ramp-down time - that’s good. Hi kernelOfTruth, can you please share an nvidia bug report for the system you are using?
[i]Our QA tested again internally and below are the observations:
Configuration setup = Ubuntu 18.04.1 LTS + EFI mode + GeForce GTX 1070 & GeForce GTX 1060 6GB + driver 418.56 + GNOME & KDE
Verified with the set and env commands that the variable is declared, then ran Google Chrome and an OpenGL application (glxgears) as a non-root user in an X terminal, and noted ~13 secs for the GPU clock speeds to ramp down.
Verified the above setup in both KDE and GNOME desktop environments, with and without compositing.
Later, I also installed Fedora 29 to match the end user’s setup, along with driver 418.56 & a GTX 1060 6GB, and noted approx. 14 secs for the GPU clock speeds to ramp down.
[/i]
Recently, I verified on a GeForce notebook that has a GeForce 920M and driver 418.56 installed.
I exported the variable __GL_ExperimentalPerfStrategy=1 and observed the GPU clocks ramp down in approx. 13-14 seconds for the google-chrome and glxgears applications.
I also observed the same behavior on the configuration setup below, after both enabling & disabling the Force Composite feature:
Alienware Area-51 R4 + Intel(R) Core™ i7-7900X CPU @ 3.30GHz + GeForce GTX 1070 + 418.56
Hi All,
Our internal testing on multiple configs shows the clocks ramp down in less than 15 seconds as soon as google-chrome is closed. Can you please retest and share your feedback? Make sure __GL_ExperimentalPerfStrategy is set in the terminal/shell from which you are launching and closing Google Chrome. Test with Google Chrome only at first, and make sure no other apps are running on the GPU at the same time. Use nvidia-smi -q -d clock -l to check clocks.
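To turn a captured log into a single number, here is a minimal sketch (the function name and the 1000/200 MHz "boosted"/"idle" pclk thresholds are my assumptions) that computes the ramp-down time from output recorded with the interleaved-timestamp loop `while :; do date +%s; nvidia-smi dmon -c 1 | tail -1; sleep 1; done`:

```shell
#!/bin/sh
# Hypothetical helper: compute ramp-down seconds from a log captured with
#   while :; do date +%s; nvidia-smi dmon -c 1 | tail -1; sleep 1; done
# Lines with a single field are epoch timestamps; the others are dmon rows
# whose 10th column is pclk in MHz. Thresholds below are assumptions.
ramp_down_seconds() {
    awk '
        NF == 1 { ts = $1; next }                    # epoch line from date +%s
        {
            pclk = $10 + 0                           # 10th dmon column: pclk (MHz)
            if (!peak && pclk > 1000) peak = ts      # first boosted sample
            if (peak && !idle && pclk <= 200) idle = ts  # first idle sample after it
        }
        END { if (peak && idle) print idle - peak; else print "n/a" }
    ' "$1"
}
```

Usage: `ramp_down_seconds dmon.log` prints the seconds between the first boosted sample and the return to idle clocks, or `n/a` if the log never reaches one of the two states.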
If you still see an issue, please provide the info below:
- What is your GPU vendor? What is the VBIOS version of your GPU?
- An nvidia bug report log file captured as soon as the issue hits.
- The OS and desktop environment - KDE, GNOME, MATE, Xfce, bare X, etc.
- dmidecode output and system CPU info.
- Is it a notebook, desktop or workstation?
echo $__GL_ExperimentalPerfStrategy
1
$ while :; do date +%s; nvidia-smi dmon -c 1 | tail -1; sleep 1; done
...
1556446847
0 7 40 - 0 3 0 0 405 139
1556446848
0 28 41 - 1 0 0 0 4006 1544
1556446849
0 28 41 - 2 1 0 0 4006 1544
1556446850
0 28 41 - 1 0 0 0 4006 1544
1556446851
0 28 41 - 0 0 0 0 4006 1544
1556446852
0 28 41 - 0 0 0 0 4006 1544
1556446853
0 28 41 - 0 0 0 0 4006 1544
1556446854
0 28 41 - 0 0 0 0 4006 1544
1556446855
0 28 41 - 0 0 0 0 4006 1544
1556446856
0 28 41 - 0 0 0 0 4006 1544
1556446857
0 28 41 - 0 0 0 0 4006 1544
1556446858
0 28 42 - 0 0 0 0 4006 1544
1556446859
0 28 42 - 0 0 0 0 4006 1544
1556446860
0 28 42 - 0 0 0 0 4006 1544
1556446861
0 28 42 - 0 0 0 0 4006 1544
1556446862
0 28 42 - 0 0 0 0 4006 1544
1556446863
0 28 42 - 0 0 0 0 4006 1544
1556446864
0 28 42 - 0 0 0 0 4006 1544
1556446865
0 28 42 - 0 0 0 0 4006 1544
1556446866
0 28 42 - 0 0 0 0 4006 1544
1556446867
0 28 42 - 0 0 0 0 4006 1544
1556446868
0 28 42 - 0 0 0 0 4006 1544
1556446870
0 28 42 - 0 0 0 0 4006 1544
1556446871
0 28 42 - 0 0 0 0 4006 1544
1556446872
0 29 42 - 0 0 0 0 4006 1544
1556446873
0 28 43 - 0 0 0 0 4006 1544
1556446874
0 28 42 - 0 0 0 0 4006 1544
1556446875
0 28 43 - 0 0 0 0 4006 1544
1556446876
0 28 43 - 0 0 0 0 4006 1544
1556446877
0 28 43 - 0 0 0 0 4006 1544
1556446878
0 25 43 - 0 0 0 0 4006 923
1556446879
0 25 42 - 0 0 0 0 4006 923
1556446880
0 25 42 - 0 0 0 0 4006 923
1556446881
0 24 42 - 0 0 0 0 3802 923
1556446882
0 24 43 - 0 0 0 0 3802 923
1556446883
0 24 42 - 0 0 0 0 3802 923
1556446884
0 9 42 - 0 2 0 0 810 784
1556446885
0 9 42 - 0 2 0 0 810 784
1556446886
0 7 42 - 0 3 0 0 405 253
...
A full 38 seconds to cool down. This is after I ran glxgears and exited it.
MSI GeForce GTX 1060 Armor 6G OC (v1).
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3283
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 39
Region 0: Memory at f6000000 (32-bit, non-prefetchable)
Region 1: Memory at e0000000 (64-bit, prefetchable)
Region 3: Memory at f0000000 (64-bit, prefetchable)
Region 5: I/O ports at e000
[virtual] Expansion ROM at 000c0000 [disabled]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00498 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel driver in use: nvidia
Kernel modules: nvidia_drm, nvidia
GeForce GTX 1060 6GB
IRQ: 39
GPU UUID: GPU-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Video BIOS: 86.06.0e.00.28
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
Blacklisted: No
- Fully updated Fedora 29 + XFCE without compositing.
- Desktop.
nvidia-bug-report.log.gz (657 KB)
Hi birdie,
Are you running the custom kernel 5.0.10-ic64? With PREEMPT? Can we get your kernel config file? Do you have any GPU that is not overclocked to test with?
This GPU is running in stock mode, i.e. it’s not overclocked.
I can reproduce this issue with stock Fedora 29/30 kernels as well.
Here’s my kernel config anyway.
Yes, I have preempt enabled:
grep -i preempt .config
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_DEBUG_PREEMPT is not set
config.zip (24 KB)
- Gainward gtx 1070 phoenix gs. vbios: the version available to download on their website. http://www.gainward.com.tw/main/vgapro.php?id=984&lang=en
- Nvidia bug report attached to this post
- Up-to-date Debian Stretch with XFCE (tested other desktop environments too, same problem, so it’s not DE related)
- dmidecode output attached to this post
- Desktop
nvidia-bug-report.log.gz (1.08 MB)
dmidecode.txt (11.7 KB)
@darkhorse,
Can you please test with driver 418.56 and share the results with us?
Just him? I can reproduce this issue as well.
Also, these drivers are two months old. What’s the point of testing them?
Hi, already tested that driver and shared my result in this thread earlier (6th comment).
Hey NVIDIA!
Any progress on this issue? It’s been three months already (or 2.5 years since it was first reported).
I performed the experiment again on the multiple configurations stated below, with and without the reg key, on driver 418.74, and observed that it took around 38 secs without the reg key and 13 secs with the reg key applied to ramp down the GPU clocks.
Config Setup 1 - MAXIMUS VIII EXTREME + Debian GNU/Linux 9 + GeForce GTX 1070 + compositing enabled for both displays
Config Setup 2 - Alienware 0XF4NJ motherboard + AMD Ryzen Threadripper 1950X 16-core processor + UEFI, fully updated Fedora 29 + XFCE without compositing + kernels 5.0.11-200.fc29.x86_64 & 5.0.10-ic64 + 418.56 + X Server 1.20.4
Config Setup 3 - Dell Precision T7610 + Genuine Intel(R) CPU @ 2.30GHz + GTX 1070 + Debian GNU/Linux 9.1 + 4.9.0-3-amd64 + compositing on for both displays
Below is output for reference -
oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 11 49 - 11 15 0 0 405 139
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 51 - 4 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 51 - 1 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 51 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 52 - 1 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 2 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 36 52 - 1 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 30 52 - 1 1 0 0 4006 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 30 52 - 1 2 0 0 4006 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 30 52 - 1 1 0 0 4006 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 52 - 1 2 0 0 3802 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 52 - 1 2 0 0 3802 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 52 - 1 2 0 0 3802 987
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 10 51 - 5 7 0 0 810 784
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 10 51 - 5 7 0 0 810 784
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 8 51 - 11 15 0 0 405 303
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 51 - 10 14 0 0 405 303
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 51 - 10 14 0 0 405 303
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 51 - 10 14 0 0 405 164
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 51 - 12 15 0 0 405 164
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 10 50 - 11 14 0 0 405 139
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 50 - 10 14 0 0 405 139
=====================================================================
oemqa@debian9:~$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
1
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 7 46 - 0 6 0 0 405 139
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 35 48 - 5 1 0 0 4006 1594
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 47 - 0 1 0 0 4006 974
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 48 - 0 1 0 0 4006 974
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 47 - 0 1 0 0 4006 974
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 29 47 - 0 1 0 0 3802 974
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 28 48 - 0 1 0 0 3802 974
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 48 - 0 3 0 0 810 797
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 47 - 0 3 0 0 810 797
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 9 47 - 0 3 0 0 810 797
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 8 47 - 0 6 0 0 405 202
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 8 47 - 0 6 0 0 405 202
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 8 47 - 0 6 0 0 405 202
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 7 47 - 0 6 0 0 405 139
gpu pwr gtemp mtemp sm mem enc dec mclk pclk
Idx W C C % % % % MHz MHz
0 7 47 - 0 6 0 0 405 139
The exact steps below were taken, and it took around 13 secs to ramp down the GPU clocks.
- Boot up the system and make sure no applications are running.
- Open a terminal and verify the output of the command echo $__GL_ExperimentalPerfStrategy. It should print 1.
- Execute the command below to measure how many seconds it takes to ramp down the GPU clocks:
grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
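For a quick sanity check before running that loop, the variable can also be exported just for the current shell (a minimal sketch; the system-wide /etc/environment entry is what the grep in the command looks for, and only covers apps launched from the session):

```shell
# Export for the current shell only; apps launched elsewhere (e.g. from the
# desktop menu) will NOT see it unless it is set system-wide, e.g. in
# /etc/environment, followed by a session restart.
export __GL_ExperimentalPerfStrategy=1
echo "$__GL_ExperimentalPerfStrategy"   # should print 1
```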
Fedora 30 stock kernel, stock everything: ~14 seconds (which is still too long).
Fedora 30 custom kernel (PREEMPT enabled) + Option “UseNvKmsCompositionPipeline” “Off”: ~36 seconds:
$ grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
__GL_ExperimentalPerfStrategy=1
1
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 44 - 0 0 0 0 4006 936
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 45 - 2 1 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 45 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 28 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 28 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 44 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 43 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 43 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 43 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 43 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 29 43 - 0 0 0 0 4006 1544
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 43 - 0 0 0 0 4006 936
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 43 - 0 0 0 0 4006 936
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 43 - 0 0 0 0 4006 936
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 43 - 0 0 0 0 3802 936
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 25 43 - 0 0 0 0 3802 936
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 11 43 - 0 1 0 0 810 746
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 9 42 - 0 2 0 0 810 746
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 9 42 - 0 2 0 0 810 746
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 9 42 - 1 3 0 0 405 240
$ cat nvidia.conf
Section "Device"
Identifier "Videocard0"
BusID "PCI:1:0:0"
Driver "nvidia"
VendorName "NVIDIA"
BoardName "NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)"
Option "Coolbits" "28"
Option "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
Option "UseNvKmsCompositionPipeline" "Off"
Option "TripleBuffer" "On"
EndSection
GTX 1060 6GB here.
config-5.1.7z (21.6 KB)
Hi Birdie,
Thanks for the experiments. With the current driver update we were expecting around 13-15 secs to ramp down, versus ~40 secs earlier.
We would appreciate it if you could confirm that you tested with the custom kernel immediately after booting up the system (without any applications running in the background).
It looks like the changelog for driver 430.14 doesn’t contain all the info, and that exact driver version contains the fix. I can confirm that it takes approximately 14 seconds to ramp down the clocks with driver 430.26. Hooray!
By any chance, is it possible to further speed up the clock ramp-down under Linux? Say, make transitions take five seconds or less? It looks like ramp-up takes less than a second, while ramp-down is way too slow.
Hi Birdie,
Thanks again for your valuable experiments.
Currently, we have been able to reduce the GPU clock ramp-down time from ~38 secs to ~14 secs, which is a good sign, and we will continue to investigate further improvements.