[BUG Report] Idle Power Draw is ASTRONOMICAL with RTX 3090

I’m sure I’m one of the first consumers actually running the RTX 3090 on Linux, so I understand that all the issues may not have been caught. And we definitely have one here.

Idle power usage is over 100W at all times. That’s insanity.

I know this is a power-hungry GPU, that’s all well and good, but not at idle when the GPU core clock is at 240MHz. At least it definitely shouldn’t be. Yet as sure as I’m standing here, I have a constant power draw of 110-115W just sitting on the desktop doing nothing, with clocks at 240MHz for Graphics, 240MHz for SM, 1290MHz for Video, and 9751MHz for Memory (no idea why the memory is maxed out, but either way that shouldn’t matter).

nvidia-bug-report.log.gz (286.9 KB)

I’m running the correct drivers (455.23.04), the only ones that support this card to my knowledge. I’m on kernel 5.9-rc6, but it doesn’t make a difference if I try any other kernels. I’m on KDE, but GNOME has the same issue so it’s not the desktop environment. It’s obvious it’s something with the card/drivers/VBIOS, something to do with the card itself and not the desktop environment, kernel, or anything like that.

Hi gardotd426,
We have filed bug 3137202 internally for tracking purposes. Unfortunately, we have not been able to reproduce the issue locally so far.
Can you please help by providing the information below?

  1. output of xrandr -q when the power draw is high and desktop is idle (we need this to see if the refresh rates are slightly off for the two displays).
  2. Verify if power draw is high even with a single monitor and share results.

output of xrandr -q

Welp. That looks promising. I already knew this anyway, but one monitor reports 164.80Hz while the other is exactly 165Hz. Most settings daemons and nvidia-settings show 165 for both, but xrandr has always shown 164.80 for the one monitor (even when I had an AMD GPU).

  1. Verify if power draw is high even with a single monitor and share results.

Nope, that’s the bug alright.

Power draw and clocks from nvidia-smi --query using two monitors:

    Power Readings
        Power Management                  : Supported
        Power Draw                        : 99.83 W
        Power Limit                       : 361.00 W
        Default Power Limit               : 350.00 W
        Enforced Power Limit              : 361.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
        Graphics                          : 255 MHz
        SM                                : 255 MHz
        Memory                            : 9485 MHz
        Video                             : 1245 MHz

Power draw and clocks with only one monitor:

    Power Readings
        Power Management                  : Supported
        Power Draw                        : 22.85 W
        Power Limit                       : 361.00 W
        Default Power Limit               : 350.00 W
        Enforced Power Limit              : 361.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 366.00 W
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz

These are with the exact same programs open, taken one after another just with one monitor turned off, then with that monitor on.
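For anyone wanting to reproduce this comparison, readings like the ones quoted above can be captured with nvidia-smi. This is a sketch, not the exact invocation used in the thread; the `POWER,CLOCK` display filter and the `--query-gpu` field names are standard nvidia-smi options:

```shell
# Full power and clock sections (the format quoted above):
nvidia-smi -q -d POWER,CLOCK

# Or a compact one-line summary, re-polled every second:
nvidia-smi --query-gpu=power.draw,clocks.gr,clocks.sm,clocks.mem,clocks.video \
           --format=csv -l 1
```

Running the second form while turning the second monitor off and on makes the memory-clock jump easy to see in real time.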

For some reason with two monitors the memory clock stays maxed out.

I’m happy to provide any other information you might need to help get this fixed, or at least mitigated. There’s no reason that a second monitor running at the same resolution should cause memory speed to max out and power draw to almost quintuple.

Hi gardotd426,
Engineering team is investigating for root cause, will keep you updated on it.

I’m having the same problem on Windows 10 with a three-monitor + TV setup: the memory clocks are at full throttle 100% of the time and the power draw is in the 110-120W range.
I’ve found that by unplugging the HDMI cable from the TV (an LG OLED55CX, which wasn’t even on, by the way) I can get the memory to drop back down to 400 MHz, and the power to about 35-40W (which is still unreasonably high, if you ask me).

Same issue on clean install of Ubuntu 20.04 with Supermicro board and AMD EPYC CPU. Power draw at idle ~120W. Did anyone solve this issue?

If this is a headless server, please enable the nvidia-persistenced to start on boot, make sure it is continuously running and check if that resolves the issue.
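For anyone who genuinely is running headless, the suggestion above amounts to something like the following on a systemd-based distribution (the unit name `nvidia-persistenced` is what the driver packages ship on most distros; verify yours):

```shell
# Enable and start the persistence daemon so the driver keeps the GPU
# initialized even when no client is attached (headless setups only):
sudo systemctl enable --now nvidia-persistenced

# Verify it is actually running:
systemctl status nvidia-persistenced
```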

Simply reading the post would have eliminated the idea that this is a headless server; I mention several desktop environments. Also, who runs a headless server with an RTX 3090?

Anyway, the issue is known. There is a bug in the Nvidia driver that forces the memory clock to stay stuck at its maximum frequency at all times when you use more than one monitor above 60Hz. It happens on both Wayland and X11.

In Windows, on the exact same machine, idle power usage drops to around 20W. Meanwhile, the memory clock being pegged at its maximum frequency at all times also makes idle temperatures insane: 40-42C in a Phanteks P500-A with 5 fans, and I have to set a custom fan curve even for that; using the VBIOS fan curve, the GPU idles at 60C. In Windows, using the same monitors and the same machine with both at 165Hz, the GPU idle temp is barely above ambient, around 23-25C.

The problem is that having two monitors with refresh rates above 120Hz forces the GPU’s memory clock to stay at its maximum frequency at all times. If I disconnect the second monitor, or set it to 60Hz (lowering it from 165 to 120 doesn’t fix it), the issue instantly disappears.
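The 60Hz stopgap described above can be applied from the command line on X11. The output name `DP-2` and the mode here are placeholders, not taken from this system; check `xrandr -q` for your actual output names and modes:

```shell
# List connected outputs and their available modes/refresh rates:
xrandr -q

# Drop the second monitor (hypothetical output name DP-2) to 60 Hz
# so the memory clock can idle down:
xrandr --output DP-2 --mode 2560x1440 --rate 60
```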

Since this occurs on both Wayland and X11, across multiple desktop environments and window managers, with or without a compositor, while no such issue exists on Windows with both monitors at 165Hz (memory frequency behaves normally there), this is clearly a bug in the Nvidia driver.

@amritz I’ve been waiting almost a year and a half for an update and have not received one. I would appreciate some sort of acknowledgement, and I’m happy to provide any additional information needed, because this is ridiculous: it’s a horrible waste of energy, it keeps the memory modules constantly running at high temperature, and I’m unable to use any sort of silent fan curve without idle temps in the 50s or 60s (and that’s with my case fans at full speed).

I hoped that the addition of GBM support, which made Wayland usable with Nvidia, would fix the issue, but it doesn’t. So it’s not an Xorg problem, it’s a driver problem (the issue doesn’t exist with AMD on either Xorg or Wayland, and it’s absent on Windows as well).

At least the problem is identified: having two monitors running at 120Hz or above (mine are both 165Hz) forces the memory clock to stay at its maximum frequency at all times, even when the memory controller is idle. Lowering one monitor to 60Hz (120 doesn’t help) immediately solves the issue. In Windows, both monitors can run at 165Hz and none of this happens: temps are in the 20s and the memory frequency is at expected levels.

The cause isn’t Xorg, it’s not Plasma, it’s not GNOME, it’s not i3, it’s not picom, it’s not KWin or Mutter, it’s the Nvidia Linux driver.
