Idle power usage stuck at 10-20 watts after running an app

Idle power usage on my RTX 3060 LHR cards stays stuck at 10-20 watts after running an app, such as FFmpeg using the NVENC ASIC or anything using CUDA. It does not return to 4-5 watts, and the card heats up to around 50C. Sometimes a card does get back into the low-power idle state (4-5 watts), but I am not sure why.

To test: run something on the GPU, then stop it. Notice 15-25 W while idle. Then run modprobe -r nvidia_drm; modprobe nvidia_drm to reset the GPUs, and notice the power is back to 10ish watts, with one lucky GPU at 4 W.
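The idle check described above can be scripted so the before and after readings are easy to compare. This is only a minimal sketch: it assumes nvidia-smi is on the PATH and that every card reports a numeric power.draw value. The helper names (read_gpus, parse_gpus, stuck_gpus) and the 5 W threshold are made up for illustration, the threshold being taken from the 4-5 W idle figure mentioned in this post.

```python
import subprocess

# Standard --query-gpu properties: GPU index, performance state, power draw.
QUERY = "index,pstate,power.draw"


def read_gpus():
    """Run nvidia-smi once and return its CSV output as text."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpus(csv_text):
    """Turn 'index, pstate, power' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, pstate, power = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "pstate": pstate, "power_w": float(power)})
    return gpus


def stuck_gpus(gpus, idle_max_w=5.0):
    """Return indices of GPUs that report P8 but draw more than idle_max_w watts."""
    return [g["index"] for g in gpus
            if g["pstate"] == "P8" and g["power_w"] > idle_max_w]
```

Calling stuck_gpus(parse_gpus(read_gpus())) once after the workload has been killed, and again after reloading nvidia_drm, shows which cards stay above the idle threshold in each case.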

How can I make all the GPUs idle at 4-5 W after running a workload?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:18:00.0 Off |                  N/A |
|  0%   50C    P8    14W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:51:00.0 Off |                  N/A |
|  0%   49C    P8    10W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:8A:00.0 Off |                  N/A |
|  0%   40C    P8     4W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:C3:00.0 Off |                  N/A |
|  0%   49C    P8    10W / 170W |      1MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Hi @vans554 and welcome back to the developer forums!

Can you share a bit more detail on your setup? For example:

  • What kind of enclosure are you using
  • what brand(s) are the GPUs
  • which Linux Distribution is this running on?
  • when you observe the status as above, what are the fans doing of the respective GPUs?
  • How long does the above status stay as is?

With this additional information I can check internally whether this is known behavior or some unusual situation.

The nvidia-smi output above indicates that all the GPUs are correctly in the P8 idle state, which means the lowest realistic power state has been reached. But the additional ~6 W does not account for the extra 10C in temperature. So my suspicion is that the fans are running at higher RPMs and causing the higher idle power consumption.
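To compare fan speed, temperature, P-state and power per card, the status rows of a pasted nvidia-smi table can be pulled out side by side. This is a rough sketch, assuming the table layout shown in this thread (driver 510.73.05) and a numeric fan percentage. parse_status_rows is a hypothetical helper name, and the regex is an assumption about that layout, not an NVIDIA-provided parser.

```python
import re

# Matches the "Fan  Temp  Perf  Pwr:Usage/Cap" cell, e.g. "|  0%   50C    P8    14W / 170W |".
STATUS_ROW = re.compile(r"\|\s*(\d+)%\s+(\d+)C\s+(P\d+)\s+(\d+)W\s*/\s*(\d+)W\s*\|")


def parse_status_rows(table_text):
    """Extract fan %, temperature, P-state, power draw and power cap for each GPU row."""
    rows = []
    for match in STATUS_ROW.finditer(table_text):
        fan, temp, pstate, power, cap = match.groups()
        rows.append({
            "fan_pct": int(fan),
            "temp_c": int(temp),
            "pstate": pstate,
            "power_w": int(power),
            "cap_w": int(cap),
        })
    return rows


def summarize(rows):
    """Return one short line per GPU, in table order."""
    return [
        f"GPU {i}: fan {r['fan_pct']}%, {r['temp_c']}C, {r['pstate']}, {r['power_w']}W"
        for i, r in enumerate(rows)
    ]
```

Applied to the four status rows posted above, this gives fan 0% and P8 for every card, with power readings of 14, 10, 4 and 10 W.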

Thanks!

  • What kind of enclosure are you using
    Air enclosure

  • what brand(s) are the GPUs

18:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ZOTAC International (MCO) Ltd. GA106 [GeForce RTX 3060 Lite Hash Rate]
    Physical Slot: 5

51:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3060] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. Device 3658
    Physical Slot: 3

8a:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ZOTAC International (MCO) Ltd. GA106 [GeForce RTX 3060 Lite Hash Rate]
    Physical Slot: 1

c3:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3060] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. Device 3658
    Physical Slot: 7

  • which Linux Distribution is this running on?
    Ubuntu 22.04

  • when you observe the status as above, what are the fans doing of the respective GPUs?
    not spinning / nvidia-smi has them at 0%

  • How long does the above status stay as is?
    forever

NOTE: If I run one task that uses the NVENC ASIC plus one task that uses the CUDA cores, then kill both tasks, the idle power is even higher, around 20 W, and it never drops.

NOTE2: If I remove GPUs (leaving one) or put a leaf blower on them (which brings temps down to 30C), the wattage does not go down. It seems that one particular GPU can always idle at 4 W in the low-power state, whether it's slotted solo or with others.