Idle power usage on the RTX 3060 LHR is stuck at 10-20 W after running an app, e.g. FFmpeg using the NVENC ASIC, or anything using CUDA. It does not return to 4-5 W, and the card heats up to around 50 °C. But sometimes a card does get into the low-power idle state (4-5 W); I am not sure why.
To test: run something on the GPU, then stop it. Notice 15-25 W while idle, then run modprobe -r nvidia_drm; modprobe nvidia_drm to reset the GPUs, and notice the power drops back to roughly 10 W, with one lucky GPU at 4 W.
How can I make all the GPUs idle at 4-5 W after running a workload?
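For reference, a minimal sketch of the reproduction steps (the FFmpeg command and input.mp4 are just placeholders for any NVENC/CUDA workload):

# exercise NVENC for a few seconds, then stop it (Ctrl-C)
ffmpeg -i input.mp4 -c:v h264_nvenc -f null -

# idle power now sits at 15-25 W
nvidia-smi --query-gpu=index,pstate,power.draw --format=csv

# reload the DRM module to reset the GPUs
modprobe -r nvidia_drm && modprobe nvidia_drm

# idle power drops back to ~10 W, with one lucky GPU at 4 W
nvidia-smi --query-gpu=index,pstate,power.draw --format=csv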
Hi @vans554 and welcome back to the developer forums!
Can you share a bit more detail on your setup? For example:
What kind of enclosure are you using?
What brand(s) are the GPUs?
Which Linux distribution is this running on?
When you observe the status described above, what are the fans of the respective GPUs doing?
How long does the above status persist?
With this additional information I can reach out internally to find out whether this is known behavior or an unusual situation.
The above nvidia-smi output indicates that all the GPUs are correctly in the P8 idle state, which means the lowest realistic power state is reached. But the additional ~6 W does not justify the extra 10 °C of temperature. So my suspicion is that the fans are running at higher RPMs and causing the higher idle power consumption.
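One quick way to check that would be a single query of the standard nvidia-smi fields (just a sketch, adjust to your setup):

nvidia-smi --query-gpu=index,pstate,power.draw,fan.speed,temperature.gpu --format=csv

If fan.speed stays at 0% while the draw is still 10-14 W, the fans can probably be ruled out as the cause.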
Which Linux distribution is this running on? Ubuntu 22.04
When you observe the status described above, what are the fans of the respective GPUs doing? Not spinning / nvidia-smi has them at 0%
How long does the above status persist? Forever
NOTE: if I run a task that uses the NVENC ASIC plus a task that uses the CUDA cores, then kill both tasks, the idle power is even higher, around 20 W, and it never drops.
NOTE 2: If I remove GPUs (leaving one) or point a leaf blower at them (which brings temps down to 30 °C), the wattage does not go down. It seems that one particular GPU can always reach 4 W in the low-power state, whether it is slotted solo or together with the others.
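For completeness, this is how the idle draw can be sampled continuously (plain nvidia-smi, polled once per second):

# log power state and draw once per second to confirm it never drops back to 4-5 W
nvidia-smi --query-gpu=timestamp,index,pstate,power.draw,temperature.gpu --format=csv -l 1

Redirecting that to a file makes it easy to confirm over longer periods that the draw never falls back to 4-5 W.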
I was hoping you might have a homogeneous set of GPUs; the mixed setup of different manufacturers and different GPUs (LHR vs. non-LHR) will make it difficult to resolve this. I will see if I can find internal resources with more information.
Do you ever see the “misbehaving” GPUs in a state with lower idle power, for example right after boot?
Have you already checked with EVGA or Zotac support? It might be worth contacting them to see whether this is a known issue on their side.
I plugged a single GPU into the main slot, rebooted, and kept temps well down by putting a giant fan in front of it.
As you can see below, there is no difference.
As soon as we boot up
root@node1:~# nvidia-smi
Thu Jun 30 12:57:19 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:C3:00.0 Off | N/A |
| 0% 33C P0 41W / 170W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
5-10 seconds after boot
root@node1:~# nvidia-smi
Thu Jun 30 12:57:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:C3:00.0 Off | N/A |
| 0% 32C P8 14W / 170W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
30 minutes after boot
root@node1:~# nvidia-smi
Thu Jun 30 12:57:38 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:C3:00.0 Off | N/A |
| 0% 32C P8 13W / 170W | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Sadly, I could not get any suggestions beyond what I already stated: it might be related to manufacturer specifics of the GPUs, and they may simply be stuck at 10-14 W idle power consumption. The GPU is in the lowest power state, so it should consume less, but finding the reason for this without access to the hardware is not possible. So if this is a big concern for you, you should contact the OEM or the point of sale.