Second RTX 3090 remains at high idle power

Hello!

I’m reaching out regarding a power management issue in a dual-GPU setup using two identical NVIDIA GeForce RTX 3090 cards on a Linux-based server.

Despite matching firmware and driver configurations, the two GPUs behave very differently at idle, raising the question: Is this expected, or is something misconfigured?

System Overview

  • Distribution: Debian 12 (Bookworm)
  • Kernel: 6.1.x (stock Debian)
  • Driver version: 570.153.02 (official .run installer)
  • GPU 0 Bus ID: 0000:06:10.0
  • GPU 1 Bus ID: 0000:06:11.0
  • No X server, no display attached
  • No CUDA or compute workloads
  • Both GPUs in P8 state (210 MHz core, 405 MHz memory)
  • Persistence mode enabled for both
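
For reference, the idle figures below were collected with standard nvidia-smi queries along these lines (rough sketch; exact field availability can vary slightly between driver versions):

```
# Per-GPU idle snapshot: P-state, clocks, fan, power and temperature
nvidia-smi --query-gpu=index,name,pstate,clocks.gr,clocks.mem,fan.speed,power.draw,temperature.gpu \
           --format=csv

# Detailed power/performance report for a single card (GPU 1 here)
nvidia-smi -i 1 -q -d POWER,PERFORMANCE,CLOCK
```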

Observed Behavior

GPU 0 — Expected Idle:

  • Fan: 0%
  • Power: ~7W
  • Temp: ~30°C

GPU 1 — Unexpected Idle:

  • Fan: ~30%
  • Power: 76–78W
  • Temp: ~27°C

Despite the symmetric setup:

  • No processes using /dev/nvidia1 (lsof confirms)
  • nvidia-smi shows zero processes
  • BAR1 and FB usage minimal (~2 MiB)
  • PCIe throughput (Rx/Tx) <1 MB/s on both
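
For anyone who wants to reproduce these checks, this is roughly what was run (sketch only; device index matches the setup described here):

```
# No open handles on the second card's device node
sudo lsof /dev/nvidia1
sudo fuser -v /dev/nvidia1

# FB/BAR1 usage and process list for GPU 1
nvidia-smi -i 1 -q -d MEMORY,PIDS

# PCIe Rx/Tx throughput, sampled a few times
nvidia-smi dmon -s t -c 5
```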

Actions Taken

  • Verified PCIe link state: both at GEN 1 @ x8
  • Disabled/enabled nvidia-persistenced: no effect
  • Checked for workloads or open handles: none found
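
For completeness, these checks looked roughly like the following (bus ID and service name correspond to the setup above; adjust as needed):

```
# PCIe link capability vs. current link state for GPU 1
sudo lspci -vvv -s 06:11.0 | grep -E "LnkCap:|LnkSta:"

# Restart the persistence daemon and confirm the power state afterwards
sudo systemctl restart nvidia-persistenced
nvidia-smi --query-gpu=index,persistence_mode,pstate,power.draw --format=csv
```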

Summary

Why does GPU 1 stay at ~77W and spin its fan while idle, with no processes, no workloads, and identical configuration (BIOS, drivers, clocks)?

Any insights into this behavior or known quirks with dual-GPU idle states on Linux would be appreciated.

Thanks in advance

Is this a new issue?

Have you tried swapping the cards around to see if the issue stays with the card?

It has been like this from the start.

I just swapped the cards, and the problem follows the card — not the slot or system configuration.

This strongly suggests it’s something specific to that GPU, despite matching VBIOS, driver, and power state.

Any thoughts on what could cause this kind of behavior? Faulty sensor? Firmware glitch? Something persistent in hardware state?

nvidia-bug-report.log.gz (468.6 KB)

Did you purchase the faulty card new? If not, it’s possible the card has had a hard life: years of crypto mining, etc. I’ve seen cards like this exhibit increased power usage, although not to this degree.

I bought the card second-hand, so it’s certainly possible it has seen heavy use (e.g. mining).

The issue consistently follows the card regardless of PCIe slot, and it persists across reboots and driver reloads. Power state and clocks all report as nominal (P8, 210/405 MHz), but the idle power draw remains unusually high.

I had hoped it might be something software-related or fixable via configuration, but it’s starting to look more like a hardware-level issue after all.

The seller, however, claims it has never been used for mining. Could anything else cause this?

Mining is just a normal compute load; it cannot by itself cause any increase in idle power. Have you checked whether the PowerMizer mode is set to Adaptive? If it is (it looks like it, since you mentioned the power state at idle is the same), then the second card could simply be a different hardware revision. Usually, however, idle power differences are not that high. If both cards are exactly the same model, another possibility is that somebody messed around with the card’s firmware and flashed the wrong one. If this is something you cannot accept, returning the card to the seller is the best course of action.
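
A quick way to compare firmware and board identity between the two cards, sketched with standard nvidia-smi queries (works headless, no X needed):

```
# Compare VBIOS and board identifiers of both cards
nvidia-smi --query-gpu=index,name,vbios_version,pci.device_id,pci.sub_device_id --format=csv

# Default/enforced power limits can also differ between board revisions
nvidia-smi -q -d POWER | grep -E "Power Limit"
```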

You are correct that it’s a normal load. Any task run at full power for years will cause the same gradual degradation.

Increased leakage current is a long-acknowledged result of semiconductor degradation, accelerated by high temperature and exacerbated by overclocking/overvolting. See here for one paper on the topic.
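
As a rough first-order illustration of why temperature and threshold shifts matter (the standard textbook subthreshold-leakage expression, not taken from the linked paper):

```
% Off-state (subthreshold) leakage grows exponentially as the effective
% threshold voltage V_th drops or the thermal voltage V_T = kT/q rises:
I_{\mathrm{sub}} \;\propto\; \exp\!\left(\frac{V_{GS} - V_{th}}{n\,V_T}\right),
\qquad V_T = \frac{kT}{q}
```

Because of that exponential dependence, even a modest shift in effective threshold from years of hot, high-voltage operation shows up directly as higher static (idle) power.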

The expected duty cycle of load over the card’s lifetime is part of the reason data-centre-class cards are specified with lower clock speeds than GeForce cards.