The 418.56 release added an experimental environment variable, __GL_ExperimentalPerfStrategy=1, which adjusts how GPU clock boosts are handled. When enabled, this option allows the driver to more aggressively drop the GPU back to lower clocks after they are boosted by application activity. If you are experiencing issues with GPU clock boosting, please try this option and let us know whether or not it helps.
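For a quick sanity check before setting it system-wide, the variable can also be set for a single process; a minimal sketch (glxgears stands in for any OpenGL app, with the nvidia-smi watch running in a second terminal):
$ __GL_ExperimentalPerfStrategy=1 glxgears
$ watch -n1 nvidia-smi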
GK208M using PRIME here; I didn't notice any difference. It still takes about 35 seconds to throttle down at 0-3% GPU usage.
Where did you set it? I'd recommend putting it somewhere like /etc/profile and then logging out and back in, so that it applies to everything.
echo "export __GL_ExperimentalPerfStrategy=1" >> /etc/profile
I've set it in the general system environment, so it applies to everybody and everything; on Gentoo that is the equivalent of /etc/profile (a bash script composing the environment).
Running
export
gives
declare -x __GL_ExperimentalPerfStrategy="1"
as any user.
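(For reference: on Gentoo the canonical place for such a system-wide variable is an entry under /etc/env.d, regenerated with env-update; a sketch, the file name 99nvidia being arbitrary:)
# echo '__GL_ExperimentalPerfStrategy=1' > /etc/env.d/99nvidia
# env-update && source /etc/profile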
Repeated on a GT 1030: no change either. Driver (of course) 418.56, X server 1.20.3, kernels 4.19/4.9, GNOME 3.30/3.28.
Tried it, unfortunately no change here… It still takes more than 30 seconds to go back to idle. I put the env variable into /etc/profile; using 418.56 (GTX 1070) on up-to-date Debian Stretch.
No change here either. I'm using an OpenGL compositor, so the driver thinks I'm gaming all the time and keeps the GPU running at full speed without ever clocking down.
I'm using an OpenGL compositor
THAT might be the cause
I'm currently using compositing via xfwm4 (not sure whether OpenGL is used), and it can clock down really quickly close to all the time, even when switching through apps, heavy gnome-terminal scrolling, and site browsing in Chromium.
We verified this variable internally:
On Ubuntu 16.10 with a GeForce GTX 1060 6GB, after exporting the variable below in /etc/profile:
echo "export __GL_ExperimentalPerfStrategy=1" >> /etc/profile
root@oemqa-Precision-WorkStation-T7500:~# echo $__GL_ExperimentalPerfStrategy
1
root@oemqa-Precision-WorkStation-T7500:~# env |grep GL
__GL_ExperimentalPerfStrategy=1
root@oemqa-Precision-WorkStation-T7500:~#
I am not able to reproduce the issue; I observe the ramp down from high to low GPU clock speeds in around 13 to 15 seconds.
I tested it by opening/closing the Chrome browser and OpenGL applications like glxgears.
Moreover, I observed that a few end users are exporting the variable as shown below; with that form I am also able to reproduce the issue, and ramping down takes 35 to 40 seconds.
declare __GL_ExperimentalPerfStrategy=1
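That form stays shell-local: without -x the variable is never exported, so child processes such as the GL client never see it. A minimal demonstration (FOO is just a placeholder name):
$ declare FOO=1
$ bash -c 'echo "child sees: $FOO"'
child sees:
$ export FOO=1
$ bash -c 'echo "child sees: $FOO"'
child sees: 1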
I added
export __GL_ExperimentalPerfStrategy=1
to /etc/profile, logged out and in again,
but it doesn't turn up in
env | grep GL
output; it probably needs a reboot in between.
In any case, it jumped to P0 during the launch of Chromium (Compiz used as compositor),
and even during YouTube video playback, app switching, browsing, etc. it stays at P8 (GTX 1070).
(So the default behavior already matches your description for me (?); it took around 17 seconds to reach low frequencies, and close to 20 s to get back to P8 after the initial spike.)
I tested the duration yesterday and it took longer than 27 seconds to clock down.
My results: 36 seconds from P0 to P8. That's as bad as it's always been.
$ set | grep __GL
__GL_ExperimentalPerfStrategy=1
Exported system-wide on Fedora 29 (note that this script does not get applied to the X server itself):
$ cat /etc/profile.d/nvidia.sh
export __GL_ExperimentalPerfStrategy=1
$ tr '\0' '\n' < /proc/`pidof Xorg`/environ
LANG=en_US.UTF-8
DISPLAY=:0
INVOCATION_ID=$RANDOM_HEX_STRING
PWD=/
JOURNAL_STREAM=X:YYYY
SHLVL=0
XAUTHORITY=/var/run/lxdm/lxdm-:0.auth
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
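The same /proc check works for any GL client, not just Xorg; a sketch (the output line assumes the variable was exported before the client started):
$ tr '\0' '\n' < /proc/$(pidof glxgears)/environ | grep __GL_
__GL_ExperimentalPerfStrategy=1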
The CPU takes less than 0.1 sec to switch power modes. I wonder why it's so freaking difficult for NVIDIA to get it right.
Aaron, this must not be an environment variable; it must be a kernel module option. Environment variables are quite difficult to keep track of from kernel space.
It must take less than three seconds to switch from P0 to P8 and vice versa. Otherwise you're wasting our time, power, and batteries.
sandipt, the line declare -x __GL_ExperimentalPerfStrategy="1" is just the output of running
export
I can assure you that this variable is set using
export __GL_ExperimentalPerfStrategy=1
in my system environment.
Furthermore, "declare -x" is equivalent to "export".
Further tests I've done with the env variable set:
run on a plain X server: no effect
nvidia-drm.modeset=0/1: no effect
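(For anyone reproducing that last test: the active value can be read from sysfs, a sketch assuming the standard nvidia-drm parameter:
$ cat /sys/module/nvidia_drm/parameters/modeset
N
and set persistently by putting "options nvidia-drm modeset=1" into a file under /etc/modprobe.d/.)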
I wrote the small program below to check whether the environment variable is really read by 418.56.
// inject.c
// LD_PRELOAD shim: wraps getenv() and logs every lookup, so we can see
// which variables the driver libraries actually query.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef char * (*real_getenv_t)(const char *);
static real_getenv_t func = 0;

char *getenv(const char *name)
{
    if (!func)
        func = (real_getenv_t)dlsym(RTLD_NEXT, "getenv"); /* resolve libc's getenv */
    char *val = func(name);
    printf("getenv(%s) -> %s\n", name, val); /* val may be NULL; glibc prints "(null)" */
    return val;
}
And ran the commands:
$ gcc -shared -fPIC -o inject.so inject.c -ldl
$ LD_PRELOAD=./inject.so glxgears
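Since glxgears prints its own FPS lines to stdout as well, it helps to filter the shim's output, e.g.:
$ LD_PRELOAD=./inject.so glxgears | grep -i perfstrategy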
From my observation, not only __GL_ExperimentalPerfStrategy but also OGL_ExperimentalPerfStrategy seems to be checked by the 418.56 libraries. Unfortunately, it doesn't help.
Yeah, that's how it works on Windows; it only takes a few seconds to go back to idle…
I played a little with my settings yesterday and noticed that ForceCompositionPipeline=On and ForceFullCompositionPipeline=On in xorg.conf have a huge impact on how the driver manages GPU clocks. Without these settings I have to try hard to even make the GPU clock go up (by moving windows, scrolling browser content, etc.), and even when it goes up, it usually takes around 15-20 seconds to go back to idle again. Unfortunately I need these settings to eliminate tearing in Xfce.
Edit: after a bit more testing I can confirm that the driver behaves fine without the options mentioned above. But now I have horrible screen tearing. Great…
Yeah, ForceFullCompositionPipeline is pretty much mandatory if you're not using a compositing window manager.
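If you'd rather not hard-code it in xorg.conf, it can also be toggled at runtime; a sketch, assuming a single display on the auto-selected mode:
$ nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 {ForceFullCompositionPipeline=On}"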
Aaron,
Please consider those of us who are running with
Option "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
On the GT 1030 I'm running without an xorg.conf, so nothing is forced, just plain Xorg autoconfig. Yet, no effect.
Nvidia 418.56 and Plasma with kwin here.
No change at all. It still takes almost 30 seconds to clock down.
If I turn Composition Pipeline on, clocks stay at max speed INDEFINITELY.
Hence I have to keep it off and experience terrible desktop tearing.
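Before blaming the composition pipeline alone, it may be worth confirming PowerMizer is actually in adaptive mode; a sketch (a value of 0 means adaptive):
$ nvidia-settings -q '[gpu:0]/GPUPowerMizerMode'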
Hi All,
I observed that exporting the variable __GL_ExperimentalPerfStrategy does not work for a few users.
So, in order to isolate the issue, I request that you share an nvidia bug report, the desktop environment being used, and detailed repro steps.
A few? So far I haven't seen a single person who's reported success with this environment variable.
Here's my config:
$ cat /etc/profile.d/nvidia.sh
export __GL_ExperimentalPerfStrategy=1
My Xorg config:
$ cat /etc/X11/xorg.conf.d/99-nvidia.conf.fast
#
Section "Device"
Identifier "Videocard0"
BusID "PCI:1:0:0"
Driver "nvidia"
VendorName "NVIDIA"
BoardName "NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)"
Option "Coolbits" "28"
Option "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
Option "UseNvKmsCompositionPipeline" "Off"
Option "TripleBuffer" "On"
EndSection
Distro: Fedora 29 with all updates installed
DE: XFCE without compositing
GPU: GTX 1060 6GB (no overclock, underclock or anything like that - runs 100% stock settings)
Drivers: OpenGL version string: 4.6.0 NVIDIA 418.56
Steps to reproduce:
Run watch -n1 nvidia-smi in a terminal.
Run any OpenGL demo (standalone or in a browser), then stop/exit it.
Count the seconds until the clocks drop (or use the timing sketch below).
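Instead of counting by hand, a small polling loop can time the drop; a sketch, assuming nvidia-smi's pstate query field:
start=$(date +%s)
until nvidia-smi --query-gpu=pstate --format=csv,noheader | grep -q P8; do sleep 1; done
echo "reached P8 after $(( $(date +%s) - start ))s"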
The net result: 36 seconds from P0 to P8, which is unacceptable. 3 seconds should be enough; in a perfect world, 1 second.
Again, I'd like this option available as a kernel module option, since that is a much more reliable mechanism, or at the very least controllable via NVAPI.
This is my test. The variable is set in /etc/environment and sourced correctly; I run glxgears for 3 seconds and then check the clocks.
The first time it clocks down in 3 seconds, but on the next run it takes the canonical 30+ seconds. See:
############# 3 seconds, yay! :D
koko@Gozer# grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
__GL_ExperimentalPerfStrategy=1
1
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 33 - 5 10 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 12 38 - 88 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 36 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 36 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 34 - 2 10 0 0 405 340
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 33 - 0 9 0 0 405 340
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 33 - 2 10 0 0 405 340
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 0 33 - 0 9 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 33 - 4 10 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 0 33 - 0 9 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 0 33 - 4 10 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 0 33 - 0 9 0 0 405 135
^C
################# 35 seconds, DAMN :(
koko@Gozer# grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
__GL_ExperimentalPerfStrategy=1
1
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 0 33 - 3 11 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 9 38 - 91 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 36 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 1 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 1 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 38 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 38 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 3 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 35 - 4 10 0 0 405 275
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 35 - 0 9 0 0 405 275
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 35 - 6 10 0 0 405 135
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 34 - 0 9 0 0 405 135
During the test I used KDE Plasma without compositing; nothing else was running.
I have now repeated the test (while writing this), and here are the results: 30 seconds.
koko@Gozer# grep GL_ExperimentalPerfStrategy /etc/environment ; echo $__GL_ExperimentalPerfStrategy ; while true ; do nvidia-smi dmon -c 1 ; timeout 3 glxgears ; for i in $(seq 1 50) ; do nvidia-smi dmon -c 1 ; sleep 1 ; done ; done
__GL_ExperimentalPerfStrategy=1
1
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 33 - 0 9 0 0 405 254
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 10 39 - 89 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 1 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 37 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 1 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 38 - 0 2 0 0 2700 1215
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 4 36 - 0 9 0 0 405 275
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 35 - 0 10 0 0 405 275
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 1 35 - 0 9 0 0 405 275
^C
nvidia-bug-report.log.gz (1.28 MB)