Very(!) slow ramp-down from high to low clock speeds leading to significantly increased power consumption

I have vsync enabled on the desktop, but the high power draw appears even after a fresh boot (the card never leaves P0 with multiple monitors on a single X screen). Same story on pretty much every driver I have used over the last year since I noticed it, on both a GTX970 and a GTX1080.

I see a sustained 35W power draw after starting X without anything open (no Firefox, no games, no fancy desktop effects, just raw X and the awesome tiling window manager).

After I disable one monitor (xrandr --output HDMI-0 --off), power draw goes down and card drops from P0 to P8.
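In case anyone wants to check the same thing on their machine, this is roughly what I run - a sketch assuming nvidia-smi and xrandr are in PATH and that HDMI-0 is the relevant output (adjust the output name for your setup):

  # performance state and power draw with both monitors active
  nvidia-smi --query-gpu=pstate,power.draw --format=csv
  # switch the second monitor off, give the driver a moment, then check again
  xrandr --output HDMI-0 --off
  sleep 10
  nvidia-smi --query-gpu=pstate,power.draw --format=csv
  # and re-enable it afterwards
  xrandr --output HDMI-0 --auto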

I believe that’s expected. Depending on the GPU and the monitors attached, it may not be possible to lower the power state without risk of glitching one or more displays, due to the amount of memory bandwidth required to keep the display active.

Guess it's better safe than sorry for you guys. When I flashed a modded BIOS on my older card (GTX970) and set the (core) TDP base clock to CLK05 (405MHz, 0.825V), it ran perfectly fine on the desktop and reduced power consumption to ~17W (and still boosted up as needed when I was gaming or doing compute). Technically it was still running in P0, but at much lower clocks than the factory defaults - I'm not sure whether the lower power states do anything other than lowering the table voltages and frequencies (perhaps some auxiliary work involved in pushing the display output is what keeps it from going below P0?).
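For what it's worth, you can at least inspect the clock levels the driver exposes without flashing anything - a rough sketch assuming GPU index 0 (attribute and field names as I remember them, double-check them against your driver version):

  # list the performance levels and their clock ranges known to the driver
  nvidia-settings -q [gpu:0]/GPUPerfModes
  # dump the current performance state details
  nvidia-smi -q -d PERFORMANCE
  # a different mitigation, not the BIOS approach above: cap the board power limit
  # (needs root, only works within the range shown by nvidia-smi -q -d POWER,
  # and may not lower idle clocks at all)
  sudo nvidia-smi -pl 120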

Unfortunately I can no longer flash such a BIOS to my GTX1080, so I guess I will have to live with a ~50W+ idle draw unless you guys somehow tweak this.

@aplattner

This is getting ridiculous.

In 384.47 beta drivers it takes up to 36(!!) seconds for the GPU to cool off in the absence of any GPU load.

It's a tiny relief that the new drivers consume 28W versus the 35W of the last stable drivers, but all things considered the old behaviour wastes roughly 35W × 7s = 245J per ramp-down, versus 28W × 36s = 1008J with the new drivers.

I asked you to introduce faster clock changes (via a module option); instead, you've made it a lot worse.

Steps to reproduce:

  1. Run an empty X session (without compositing or anything).
  2. Run a terminal with watch -n1 nvidia-smi in it (see the logging sketch right after these steps for a more precise readout).
  3. Run Google Chrome and exit it immediately. Start counting.
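For step 2, a slightly more precise alternative to plain nvidia-smi - a sketch assuming a reasonably recent nvidia-smi that supports these query fields:

  # log perf state, SM clock and power draw once per second with a timestamp,
  # so the exact ramp-down time can be read off the log
  nvidia-smi --query-gpu=timestamp,pstate,clocks.sm,power.draw --format=csv -l 1 | tee rampdown.log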

Bump.

I'm confused - did they change anything? I haven't seen any improvement on 384.47 (still pulling ~45W at idle on the GTX1080 and ~35W at idle on the GTX970); no idea about the ramp up/down speeds.

I’ve reverted to NVIDIA drivers 375.82 and my average GPU temperature is now 10 degrees lower. Wow.

Another affected user: https://devtalk.nvidia.com/default/topic/1022911/linux/-bug-linux-381-22-powermizer-isn-t-reducing-clocks-fast-enough/

PowerMizer under Linux is extremely slow. It takes about 35 (!!!) seconds to reduce clocks even when there is no GPU load.
I can't post an image currently, so here is a GIF:

This is from 375 driver.

Under Windows it works perfectly well:

I was watching a power meter and it takes about 3 seconds to lower the power consumption, so under Windows it's perfectly possible.

I've also checked system power consumption using Blender 2.78. Under Windows I get ~30W back (system) immediately after I stop moving the viewport around; I don't even have to wait 3 seconds. Under Linux: no way, I have to wait 36 seconds. IMO this is done so badly that under Linux power management only works when the desktop is truly idle. I simply don't understand why it can't be fixed.
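If anyone wants hard numbers instead of eyeballing a power meter, here is the rough shell loop I use to time how long the card stays in P0 after the load stops (a sketch; assumes bash and nvidia-smi, and that the card really does sit in P0 while loaded):

  # start this the moment you stop moving the viewport / close the application
  start=$(date +%s)
  while [ "$(nvidia-smi --query-gpu=pstate --format=csv,noheader)" = "P0" ]; do
    sleep 1
  done
  echo "left P0 after $(( $(date +%s) - start )) seconds"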

Maybe this is why it won’t be fixed: https://m.youtube.com/watch?v=_36yNWw_07g

I know for a fact that my next GPU is going to be an AMD card. AMD embraces open source a tiny bit more than Nvidia.

Please remove your message. Anyway, I've reported it to the moderators - hopefully they will erase it. Express your thoughts about NVIDIA somewhere else: Phoronix, Reddit, Linux.com, etc.

This is a support forum, not an "I hate NVIDIA because it's trendy" forum.

Aaron and the other NVIDIA guys here are normal people (engineers, support staff) who don't make any decisions regarding NVIDIA's products, modus operandi, etc.

The last thing they want to see is hatred towards their employer.

Even in drivers before 381.xx it took a lot more than 3 seconds to lower the GPU frequencies (and change the P-state) under Linux. I advocate introducing a kernel module variable to configure this behaviour.
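To be concrete about what that could look like: the nvidia kernel module already takes options through modprobe (you can list the current ones with modinfo nvidia | grep -i parm), so something along these lines in /etc/modprobe.d/nvidia.conf would be enough. Note that the option name below is purely hypothetical - no such knob exists today, which is exactly the point:

  # hypothetical: tell PowerMizer to start dropping clocks ~3 seconds after load ends
  # (NVreg_PowerMizerRampDownDelayMs is a made-up name, not a real driver option)
  options nvidia NVreg_PowerMizerRampDownDelayMs=3000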

@Aaron, this is a nasty bug. Please bring it up!

@birdie: This seems to be more complicated: when the driver detects more peak/constant load it seems to keep a longer cool-down under Windows too. But with Blender it doesn't matter - moving the viewport: 42W (system); not moving the viewport: 30W (system), instantly.
@Road_hazard: As @birdie stated, this is a support forum. If you want to help, try to find other affected people.

Apparently, posting here begging Nvidia engineers to fix your bug is about as useful as me complaining about it on Phoronix, Reddit, etc. Like talking to a brick wall, no? Sorry, didn't mean to trigger you - "I don't like what you're saying, I'm gonna cry to mommy and daddy and get you to shut up!"

The mods on here are big boys and girls; if they don't like what I type, they can erase it without you crying about it. Cheer up, maybe Chelsea will run in 2020 and you can vote for her and take back the White House.

It's obvious that the engineers couldn't care less about fixing a power consumption issue in Linux. You're wasting your breath here and on every support forum you mentioned above. This intentional bug has survived how many Linux driver updates? The engineers have had more than one chance to correct it, but they couldn't care less. They're in bed with Microsoft and their marching orders are to do as little as they can for Linux.

Go ahead and finish up your anti-orange-Jesus poster, wake up to this reality and see the world for what it is. I'll reserve you an AMD card so you can get on with your life and stop wasting your time on developer forums/support forums/Soros forums, complaining about a problem that NOBODY cares about except the five or so people crying about it on this forum.

I won't even dignify Road_hazard's comment with a rational argument; that shit doesn't work on trolls who have nothing better to do than waste their life and, more importantly, our time on shit like that.

Time for NVIDIA to implement user blocking on this forum so we can let them cry in the darkness to their heart's desire.

I'm pretty sure that everyone who cares about this bug is already here and has sounded off on it. Is there an ETA on a fix? Was it even acknowledged as a bug that NEEDS fixing, or is this expected behavior? The last reply from NVIDIA, TWO MONTHS ago, said that without risking graphics corruption there doesn't seem to be anything they can do to fix it.

Time to move on everyone, this is a lost cause.

What part of “This is a support forum and only a few NVIDIA engineers frequent it” don't you understand? NVIDIA's top management - CIO, CEO, CFO, etc. - none of them ever comes here.

Could you stop bitching here and talk to Jensen Huang directly?

Or better yet, abandon NVIDIA products altogether.

You have just one way of fixing your problems with the binary NVIDIA drivers on Linux: talk about your issues, find other affected users, and post as much relevant information as possible.
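Concretely: if you are affected, run the bug report script that ships with the driver as root and attach the resulting archive to your post; it is far more useful to the developers than prose alone:

  # produces nvidia-bug-report.log.gz in the current directory
  sudo nvidia-bug-report.sh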

Bitching will not help. It will make Aaron and the other NVIDIA developers avoid these threads like the plague - they are here to work with you, not to hear that their employer doesn't love open source as much as AMD does. This is f*cking irrelevant!

AMD has become so open source friendly because they are the underdog and they need something to gain prominence. They cannot win on performance metrics, but they gain publicity by being open source friendly.

I've updated the original post to keep the information in it relevant and up to date.

Aaron and Sandipt, please file a bug report, investigate it and solve it.

High clock speeds shouldn't be retained for more than five consecutive seconds, in my opinion. Ten is the absolute maximum.

Aaron, I want this bug to be fixed.

New 384 drivers take even longer to downclock :/