I am trying to do some performance analysis on a GPU (GTX 1070, on Ubuntu Linux) and I can see that initially, on a cold start, the runtimes are high, but after running tasks for some time their runtimes decrease and stabilize. I suspect this might be due to power management (clocking the GPU up/down) by the driver.
Is there a way to disable this? If not, then
What can be done to minimize its impact and get more reliable measurements for performance analysis?
Are there any other GPUs (other than the GTX 1070) that give the user better control over power management?
This sounds more like a case of automatic clock boosting while staying in the same power state than of transitions between different power states. A look at the nvidia-smi output could probably confirm that (it shows the power state).
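If you want to watch this while your benchmark runs, nvidia-smi --query-gpu=pstate,clocks.sm,clocks.mem --format=csv -l 1 prints the power state and clocks once per second. For logging it from inside your own harness, here is a minimal sketch using NVML (the library nvidia-smi is built on); device index 0 is assumed and error checking is omitted:

[code]
// Minimal sketch: poll the performance state and current clocks via NVML.
// Build with, e.g.:  g++ pstate_log.cpp -lnvidia-ml   (nvml.h ships with the CUDA toolkit)
#include <cstdio>
#include <unistd.h>
#include <nvml.h>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);           // GPU 0 assumed; adjust if you have several

    for (int i = 0; i < 30; ++i) {                 // one sample per second for 30 s
        nvmlPstates_t pstate;
        unsigned int smMHz = 0, memMHz = 0;
        nvmlDeviceGetPerformanceState(dev, &pstate);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smMHz);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM, &memMHz);
        printf("P%d  SM %u MHz  MEM %u MHz\n", (int)pstate, smMHz, memMHz);
        sleep(1);
    }
    nvmlShutdown();
    return 0;
}
[/code]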
The Tesla line of professional GPUs offers (or at least offered; I haven't used recent SKUs) a collection of application clocks that users can select from with nvidia-smi -ac. The idea behind this is that in a cluster of GPU-accelerated machines (the typical deployment environment of Tesla GPUs), it causes problems with work distribution etc. when different GPUs run at different auto-boost clocks due to variations in temperature and power usage.
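The same application-clock controls are also exposed programmatically through NVML, so a benchmark harness can query and pin them itself. A rough sketch, assuming a GPU that actually exposes application clocks (Tesla and a few others) and root privileges for the set call:

[code]
// Rough sketch: the NVML equivalent of nvidia-smi -q -d SUPPORTED_CLOCKS and nvidia-smi -ac.
// Build with, e.g.:  g++ appclocks.cpp -lnvidia-ml
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);           // GPU 0 assumed

    unsigned int memCount = 32, memClocks[32];
    if (nvmlDeviceGetSupportedMemoryClocks(dev, &memCount, memClocks) != NVML_SUCCESS
        || memCount == 0) {
        printf("application clocks are not supported on this GPU\n");
        nvmlShutdown();
        return 1;
    }

    unsigned int gfxCount = 256, gfxClocks[256];
    nvmlDeviceGetSupportedGraphicsClocks(dev, memClocks[0], &gfxCount, gfxClocks);
    printf("%u supported memory clocks; %u graphics clocks for mem = %u MHz\n",
           memCount, gfxCount, memClocks[0]);

    // Pin one supported pair so every run uses the same clocks (needs root):
    nvmlReturn_t r = nvmlDeviceSetApplicationsClocks(dev, memClocks[0], gfxClocks[0]);
    if (r != NVML_SUCCESS)
        printf("could not set application clocks: %s\n", nvmlErrorString(r));

    nvmlShutdown();
    return 0;
}
[/code]

nvmlDeviceResetApplicationsClocks() (or nvidia-smi -rac) restores the default clocks afterwards.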
A reasonably practical alternative may be to exploit the effect you have observed (“their runtime decreases and stabilizes”) by warming up the GPU before you measure. This still leaves the problem of clock variations over longer periods of time, e.g. caused by different GPU temperatures due to differences in ambient temperature on different days, or at different times of the day.
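In practice that means: run a batch of untimed warm-up iterations first, then time many repetitions and report the median or minimum. A sketch using CUDA events; the kernel, problem size, and repetition counts are placeholders for whatever you actually measure:

[code]
// Warm-up-then-measure sketch. Build with, e.g.:  nvcc -O2 warmup_timing.cu -o warmup_timing
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(float *x, int n) {          // placeholder workload
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 1.000001f + 0.5f;
}

int main() {
    const int n = 1 << 24;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    // Warm-up: run untimed until the clocks have boosted and runtimes have stabilized.
    for (int i = 0; i < 100; ++i)
        kernel<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Timed runs: the median (or minimum) of many repetitions is far less
    // sensitive to residual clock jitter than a single measurement.
    for (int rep = 0; rep < 10; ++rep) {
        cudaEventRecord(start);
        kernel<<<(n + 255) / 256, 256>>>(d_x, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("rep %d: %.3f ms\n", rep, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
[/code]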
CPUs also have automatic clock boosting (mine, for example, boosts anywhere between 3.5 GHz and 3.9 GHz based on a number of parameters), with no way of directly controlling it as a user as far as I can tell, so this issue has a wider scope. I expect automated clock boosting to get ever more intricate, with wider ranges of possible clocks, as manufacturers try to squeeze maximum performance out of silicon after the death of Moore’s Law.
That clock speeds have stopped scaling is pretty clear. Actually, they stopped scaling much earlier than 2006:
2002 Pentium 4: 3 GHz
2018 Intel Core i7-8086K: 5 GHz… but only in Turbo mode on a single core; the base clock is 4 GHz.
But we’ve continued to downscale transistors (keeping power density on the chip area approximately constant) - which is what Moore’s law is all about. What you could argue is that transistor scaling has slowed down recently - and hence this marks the end of Moore’s law.
[url]https://www.fool.com/investing/2018/04/11/is-intel-corp-ceo-brian-krzanich-to-blame-for-its.aspx[/url]
In the years since Krzanich took the CEO role, however, the company’s manufacturing efforts have been, to put it mildly, poor. The ramp up of the company’s 14-nanometer manufacturing technology was both late and highly problematic from a technology and financial point of view, and the company’s follow-up 10-nanometer technology is still, as of this writing, missing in action despite being originally slated to go into mass production more than three years ago.
While some foundries are doing a bit better than Intel at the moment, as best I can tell Moore’s Law is pretty much dead. This is not due to technical issues alone, but to those in combination with financial factors (a green-field fab will run you $10B; what could one produce in it to recoup the investment and turn a profit?). From here on out, the performance game will consist mostly of refining microarchitectures and optimizing software, ideally in the form of cohesive hardware/software co-development.
Personally, I believe NVIDIA is very well positioned to excel at that game.
The GTX 970 was the best consumer-level card as far as nvidia-smi features go: it still had features similar to a Tesla compute card (app clocks, real P-state locking and controls, etc.) before nVidia caught on and hobbled the next generation to drive sales of compute-approved cards.
The 10xx series is lobotomized in the driver and will never run P0 during compute sessions. If you use the clock offsets to gain some clock control and then exit the compute session, the card will usually jump P2 -> P0 -> P8 instead of just P2 -> P8, in which case it can hang or crash the bus: P2 base + offset is sometimes much, much lower than P0 base + offset, so when the card momentarily visits P0 it painfully overclocks itself into crashing.
Run Windows and find nvidiaProfileInspector, which at least allows you to turn off this Force-P2 silliness (tweak the base global profile, apply, reboot; repeat every time you change driver versions too, since the setting gets reset). You still get no good features from nvidia-smi, but at least the clocking is more controllable.
The P2 lock was allegedly done so that results would be more reliable, but that doesn’t allow for the use case where speed is everything and corrupt results can easily be validated and tossed (the compute rate minus a few bad apples is still higher than the compute rate locked in P2). I also have some PNY 1060 cards where the P2 clocks equal the P0 clocks anyway; those work nicely in Linux, since it is effectively always P0 even if the driver asks for P2. These other single-fan MSI ones have seriously, stupidly slow P2 settings in the BIOS, so I must run them in Windows only or suffer a 20% performance hit, which is a big problem. I already tried applying the Windows driver nvreg key by hex handle into the Linux nvreg, but it didn’t have any effect (as advertised elsewhere, the feature isn’t in the Linux driver at all).
Not every compute app is computer vision or whatever, guys. At least implement the same setting in the Linux drivers… I’m kind of tired of running Windows just to get full speed out of these cards… and probably losing another 20% to rebooting and general Windows being Windows…
I suppose that’s the main reason for calc-fast-check-results-later, so ya got me (Ethereum cranker),
but I sweeeeaar there are other use cases for such! There must be!
Why not just tell the science types to underclock if they like accuracy, instead of putting the cards on crutches for everyone? Oh right, the marketing dept forced it. The actual death of Moore’s Law is caused by marketing departments.
When you don’t make artificial price points and just build the fastest widget you can possibly make, things improve at the natural rate (approaching Moore’s Law). However, profit extraction requires:
slowly… stepping… through… product levels… and feature-set… combos… and making fake feature sets by software strapping (remember the AMD CPUs with a whole core available to hack-unlock, which worked fine? Or the AMD Hawaii GPU with 4 disabled shader units that could be flipped back on via flash? Neither of these was a QA-binning thing as much as they’d like you to think).
I think Intel just makes the 5 GHz core and then sets the clocking for whatever market niches the marketing dept says are packed with rubes and loose wallets.
But I love nVidia mainly by proxy, as I used to love 3dfx the most, but you guys ate them.