[BUG Report] Idle Power Draw is ASTRONOMICAL with RTX 3090

I’m sure I’m one of the first consumers actually running the RTX 3090 on Linux, so I understand that not every issue may have been caught yet. And we definitely have one here.

Idle power usage is over 100W at all times. That’s insanity.

I know this is a power-hungry GPU, that’s all well and good, but not at idle when the GPU core clock is at 240MHz. Yet as sure as I’m standing here, I have a constant power draw of 110-115W on the desktop doing nothing, with clocks at 240MHz for Graphics, 240MHz for SM, 1290MHz for Video, and 9751MHz for Memory (no idea why the Memory is maxed out, but either way that shouldn’t matter).

nvidia-bug-report.log.gz (286.9 KB)

I’m running the correct driver (455.23.04), the only one that supports this card to my knowledge. I’m on kernel 5.9-rc6, but it makes no difference if I try other kernels. I’m on KDE, but GNOME has the same issue, so it’s not the desktop environment. It’s clearly something with the card, the driver, or the VBIOS, not the desktop environment, the kernel, or anything like that.

Hi gardotd426,
We have filed bug 3137202 internally for tracking purposes. Unfortunately, we have not been able to reproduce the issue locally so far.
Can you please help by providing the information below?

  1. Output of xrandr -q when the power draw is high and the desktop is idle (we need this to see if the refresh rates are slightly off between the two displays).
  2. Verify if power draw is high even with a single monitor and share results.
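A minimal sketch for capturing both pieces of information (assuming nvidia-smi and xrandr are in PATH; the -d POWER,CLOCK filter simply limits the report to the relevant sections):

    # 1. Refresh rates as X reports them (the active modes are marked with *)
    xrandr -q | tee xrandr-idle.txt

    # 2. Power draw and clocks while the desktop sits idle
    nvidia-smi -q -d POWER,CLOCK | tee nvidia-idle.txt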

output of xrandr -q

Welp. That looks promising. I already knew this anyway, but one monitor is 164.80 while the other is 165 even. Most settings daemons and nvidia-settings show 165 for both, but xrandr has always shown 164.80 for that one monitor (even when I had an AMD GPU).

  1. Verify if power draw is high even with a single monitor and share results.

Nope, that’s the bug alright.

Power draw and clocks from nvidia-smi --query using two monitors:

    Power Readings
        Power Management                  : Supported
        Power Draw                        : 99.83 W
        Power Limit                       : 361.00 W
        Default Power Limit               : 350.00 W
        Enforced Power Limit              : 361.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
    Clocks
        Graphics                          : 255 MHz
        SM                                : 255 MHz
        Memory                            : 9485 MHz
        Video                             : 1245 MHz

Power draw and clocks with only one monitor:

    Power Readings
        Power Management                  : Supported
        Power Draw                        : 22.85 W
        Power Limit                       : 361.00 W
        Default Power Limit               : 350.00 W
        Enforced Power Limit              : 361.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz

These are with the exact same programs open, taken one after another just with one monitor turned off, then with that monitor on.

For some reason with two monitors the memory clock stays maxed out.

I’m happy to provide any other information you might need to help get this fixed, or at least mitigated. There’s no reason that a second monitor running at the same resolution should cause memory speed to max out and power draw to almost quintuple.
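For anyone wanting to reproduce the same comparison, a rough sketch (DP-2 as the second monitor’s output name is an assumption; take the real names from xrandr -q):

    # Baseline with both monitors active
    nvidia-smi --query-gpu=power.draw,clocks.gr,clocks.mem --format=csv

    # Turn the second monitor off, let clocks settle, and measure again
    xrandr --output DP-2 --off
    sleep 10
    nvidia-smi --query-gpu=power.draw,clocks.gr,clocks.mem --format=csv

    # Restore the second monitor (position relative to DP-0 is an example)
    xrandr --output DP-2 --auto --right-of DP-0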

Hi gardotd426,
The engineering team is investigating the root cause; we will keep you updated.

I’m having the same problem on Windows 10 with a 3-monitor + TV setup: the memory clock is at full throttle 100% of the time and the power draw is in the 110-120W range.
I’ve found that by unplugging the HDMI from the TV (LG OLED55CX), which wasn’t even on, by the way, I can get the memory to drop back down to 400MHz and the power to about 35-40W (which is still unreasonably high if you ask me).

Same issue on clean install of Ubuntu 20.04 with Supermicro board and AMD EPYC CPU. Power draw at idle ~120W. Did anyone solve this issue?

If this is a headless server, please enable nvidia-persistenced to start on boot, make sure it is continuously running, and check if that resolves the issue.
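A minimal sketch of that suggestion on a systemd-based distribution (nvidia-persistenced ships with the NVIDIA driver packages, but confirm the unit name on your distro):

    # Start the persistence daemon now and on every boot
    sudo systemctl enable --now nvidia-persistenced

    # Confirm it is running and that the driver reports persistence mode as on
    systemctl status nvidia-persistenced
    nvidia-smi -q | grep -i "Persistence Mode"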

Simply reading the post would have eliminated the idea that this was a headless server; I mention several desktop environments. Also, who runs a headless server with an RTX 3090?

Anyway, the issue is known. There is a bug in the Nvidia driver that forces the memory frequency to stay stuck at its maximum whenever you use more than one monitor above 60Hz. It happens in both Wayland and X11.

In Windows, on the exact same machine, idle power usage goes down to 20-ish watts. Meanwhile, the memory clock being pegged at its maximum frequency at all times also makes idle temperatures insane: 40-42C in a Phanteks P500-A with 5 fans, and I have to set a custom fan curve even for that; with the VBIOS fan curve the GPU idles at 60C. In Windows, using the same monitors and the same machine with both at 165Hz, the GPU idle temp is barely above ambient, around 23-25C.

The problem is that having two monitors with refresh rates above 120Hz forces the GPU’s memory clock to stay at its maximum frequency at all times. If I disconnect the second monitor, or set it to 60Hz (lowering it from 165 to 120 doesn’t fix it), the issue instantly disappears.
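A sketch of that workaround from a terminal (output name and mode are assumptions; substitute values from your own xrandr -q output):

    # Drop the second monitor to 60Hz; the memory clock should fall within seconds
    xrandr --output DP-2 --mode 2560x1440 --rate 60

    # Verify the memory clock is no longer pegged
    nvidia-smi --query-gpu=clocks.mem,power.draw --format=csv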

Since this occurs in both Wayland and X11, on multiple desktop environments/window managers, and regardless of whether a compositor is running, while there is no such issue on Windows with both monitors set to 165Hz (memory frequency behaves normally there), this is clearly a bug in the Nvidia driver.


@amritz I’ve been waiting almost a year and a half for an update and have not received one. I would appreciate some sort of acknowledgement, and I’m happy to provide any additional information needed. This is a horrible waste of energy, it keeps the memory modules running hot at all times, and I’m unable to use any sort of silent fan curve without idle temps in the 50s or 60s (and that’s with my case fans at full speed). It’s ridiculous.

I hoped that the addition of GBM support, which made Wayland usable with Nvidia, would fix the issue, but it doesn’t. So it’s not an Xorg problem, it’s a driver problem (the issue doesn’t exist on AMD on either Xorg or Wayland). On Windows it’s absent as well.

At least the problem is identified. Having two monitors running at 120Hz or above (mine are both 165Hz) forces the memory clock to stay stuck at its maximum frequency at all times, even when the memory controller is idle. Lowering one monitor to 60Hz (120 doesn’t help) immediately solves the issue. In Windows, both monitors can run at 165Hz and none of this happens: the temps are in the 20s and the memory frequency is at expected levels.

The cause isn’t Xorg, it’s not Plasma, it’s not GNOME, it’s not i3, it’s not picom, it’s not KWin or Mutter, it’s the Nvidia Linux driver.


Hello everyone!

Is there any update on this at all? Idle GPU power usage is still astronomical. I am using a 3080 FE with i3/X11 on Arch Linux.

Previously, I had one 144Hz 2K monitor and one 1080p 60Hz monitor. In that setup, idle GPU power usage was ~21W.

Recently, I upgraded the 1080p monitor to another 144Hz 2K monitor, and idle power usage has shot up to 79-97W.

  1. Both monitors are connected using DisplayPort 1.4.

  2. PowerMizer is at the Adaptive setting. GPU clock is ~210MHz; memory transfer rate is at 19002MHz.

  3. Disabling ALL G-Sync related options has no impact on idle power usage.

  4. Output of xrandr -q:

DP-0 connected primary 2560x1440+1440+560 (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440    143.97 + 120.00*   99.95    59.95
   1920x1080     74.91    60.00    59.94    50.00
   1680x1050     59.95
   1600x900      60.00
   1280x1024     75.02    60.02
   1280x800      59.81
   1280x720      60.00    59.94    50.00
   1152x864      59.96
   1024x768      75.03    60.00
   800x600       75.00    60.32
   720x480       59.94
   640x480       75.00    59.94    59.93
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 connected 1440x2560+0+0 left (normal left inverted right x axis y axis) 597mm x 336mm
   2560x1440     59.95 + 165.08   143.91   120.00*
   2048x1152     60.00
   1920x1200     59.88
   1920x1080    119.88    60.00    59.94    50.00    23.98
   1680x1050     59.95
   1600x1200     60.00
   1280x1024     75.02    60.02
   1280x800      59.81
   1280x720      59.94    50.00
   1152x864      75.00
   1024x768      75.03    60.00
   800x600       75.00    60.32
   720x576       50.00
   720x480       59.94
   640x480       75.00    59.94    59.93

I have tested running both monitors at a 120Hz refresh rate, and at 143.97/143.91, but idle power usage does not change.
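For reference, a sketch of forcing both outputs to 120Hz from a shell (output names and mode taken from the xrandr output above):

    # Set both monitors to 120Hz in a single call
    xrandr --output DP-0 --mode 2560x1440 --rate 120 \
           --output DP-2 --mode 2560x1440 --rate 120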

  1. There is no impact on power usage if I turn off one monitor using the hardware button on the monitor itself.

  2. Power usage drops to 15W if I turn off one monitor with xrandr --output DP-2 --off, and it comes right back up to 79-97W the moment I turn the second monitor back on with xrandr.
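To watch the jump happen live while toggling the output, a small sketch using the watch utility (query field names are listed by nvidia-smi --help-query-gpu):

    # Live view of power draw, memory clock, and performance state, once per second
    watch -n 1 'nvidia-smi --query-gpu=power.draw,clocks.mem,pstate --format=csv,noheader'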

Please look into this with priority. This issue is ~2 years old now, and roughly 4 times the expected power usage is really, really annoying.

cc @amrits

Yeah, I’m still seeing the issue on my 3090, running 2 identical 1440p 165Hz monitors (same model number, even the same revision; literally identical monitors). I was originally told by an Nvidia employee that if you have two monitors with “165Hz” refresh rates, that doesn’t necessarily mean they both run at exactly 165.00Hz: one could be at 164.80 and one at 165.00, and you can check xrandr to find out. So that’s what I did, since at the time I had 2 1440p 165Hz monitors of different models. Sure enough, one was 165.00 and one was 164.80.

So I bought an identical model to the 164.80 one (it’s a much better monitor overall) and gave the other one to my mom. Even so, idle power usage is always between 100 and 119W.

It’s obviously the memory frequency being pegged at its highest possible clock. This screenshot illustrates it perfectly:

The GPU clock can freely go wherever it wants between 285 and 2175MHz, but the memory transfer rate is pegged at 19662MHz. And with our monitor arrangement, the performance level is ALWAYS 4.
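One way to confirm the stuck performance level from a terminal; a sketch, assuming the GPUCurrentPerfLevel attribute is queryable on your driver version (check nvidia-settings -q all):

    # Current PowerMizer performance level (0 is the lowest)
    nvidia-settings -q '[gpu:0]/GPUCurrentPerfLevel'

    # The same story from nvidia-smi: memory clock and P-state
    nvidia-smi --query-gpu=clocks.mem,pstate --format=csv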

I actually am able to fix the issue if I drop the refresh rate of my second monitor to 60Hz. I just did it, and my power usage with Chromium open using HW acceleration is “only” 47W. If I change my second monitor back to 120Hz or 165Hz (to match the first monitor), it goes right back to max memory speed and a minimum 100W power usage.

Note my memory transfer rate when the second monitor is at 60Hz:

And note my memory transfer rate is 1620MHz (GreenWithEnvy reports it the old way, so it’s half of what the NV Control Panel reports):

(screenshot attachment: Screenshot_20220713_073149)

That’s what’s so frustrating to me: we KNOW the bug occurs with 2+ monitors (likely 1440p or above) running refresh rates above 120Hz. It’s 1000% confirmed that that’s where the bug lies, and it’s 100% reproducible. So you would think it would be a very easy fix.

Especially since this bug is NOT present on Windows, so it is NOT a hardware issue. It’s a driver issue and it can therefore definitely be fixed.

Yes, I have this exact issue too. Memory transfer rate is maxed out, Performance level is always at 4.

In my case, one monitor is 143.97 and the other is 143.91, but they both support a 120.00 refresh rate (see my xrandr output above), and setting them both to 120.00Hz doesn’t fix the problem.

We are actively working on a fix. Please stay tuned for more updates.
Apologies for the inconvenience caused.


We have root-caused the issue; the fix will be incorporated in a future driver release.
We shall update this thread once it is released publicly.


Any chance of other GPU series also receiving this fix? My 1070, 1080, and 1080 Ti all experience this same issue while powering 3x 1440p monitors at 120.01Hz, 119.99Hz, and 119.94Hz.

Not sure if this will work for anyone else or not. I have found that turning HDR on and off makes the memory speed drop to about 400MHz, and temps are down to 45C at idle with 0% fan speed. I’m thinking it is a Windows issue, as after a reboot I have to disable and then re-enable HDR. Even if I only have HDR enabled on one monitor, I have to toggle it after a reboot. Having HDR off and then rebooting seemed fine and left the memory speed at around 400MHz after the reboot. I am using an MSI 3090 Ventus 3X OC.

Really looking forward to the fix. I’ve been wasting 75W of power and dealing with extra fan noise 8 hours a day for the last year.

I hope those waiting for a fix are running one of the latest 2 gens, because Nvidia only fixed Turing and above with the 515.76 driver that just dropped.

Guess my 1080 Ti will be stuck drawing 80W at idle.

I just installed 515.76, which looked like it might fix this, but my 3080 Ti still idles at 90W. Monitor setup is:

1440p@170Hz
1440p@75Hz
1440p@75Hz
1440p@75Hz

On the Windows desktop it idles at the lowest power level. My xorg.conf Device section is below:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce RTX 3080 Ti"
    Option         "ConnectedMonitor" "HDMI-0,DP-0,DP-2,DP-4"
    Option         "DPI" "108 x 108"
    Option         "Coolbits" "28"
    Option         "metamodes" "DP-4: 2560x1440_170 +1440+1440, HDMI-0: 2560x1440_75 +4000+0 {rotation=left}, DP-0: 2560x1440_75 +0+0 {rotation=right}, DP-2: 2560x1440_75 +1440+0"
    Option         "UseNvKmsCompositionPipeline" "false"
EndSection

I have UseNvKmsCompositionPipeline set to false because otherwise GPU utilization while sitting on the desktop doing nothing is around 10%.

@walmartshopper
Thanks for sharing the test results.
Can you please remove the UseNvKmsCompositionPipeline parameter and confirm the power draw?
Also requesting others to share test results with the latest released driver.
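For a simple before/after comparison once the option is removed and X restarted, a sketch that samples idle power once a second for a minute:

    # Sample idle power draw for 60 seconds after the config change
    for i in $(seq 60); do
        nvidia-smi --query-gpu=power.draw --format=csv,noheader
        sleep 1
    done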