RTX 2080 Ti power usage

Hello,

I have a machine running Ubuntu 18.04 server edition, with no displays attached, that I use for deep learning. It has two RTX 2080 Ti GPUs.

Before, the machine was drawing 90W at the wall while idling and nvidia-smi reported the GPUs at 0%; now it is drawing 150-220W and nvidia-smi reports the GPUs at around 30%. I have not made any configuration changes.

CUDA drivers v10.0. I can't upgrade because I am using TensorFlow 1.14, and in any case things were working fine before with the exact same drivers.

I am not an expert, but why are my GPUs drawing so much power when they are not in use? Is there any way I can configure them to stop doing that?

PS - No, my drivers are not v11.2 as reported by nvidia-smi; nvcc --version correctly reports v10.0.

If you’re running headless, you need to correctly set up and run nvidia-persistenced.
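The usual starting point (assuming the service file ships with your driver packages, as it normally does) is something like

sudo systemctl enable --now nvidia-persistenced

and then checking what the GPUs actually report:

nvidia-smi --query-gpu=name,persistence_mode --format=csv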

I am running around like a headless chicken. How did you guess? :)

Anyway, nvidia-persistenced was already running, as per

sudo systemctl status nvidia-persistenced

and

sudo nvidia-smi -pm 1
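
For anyone else checking, the per-GPU state can also be queried directly (it prints Enabled or Disabled for each card):

nvidia-smi --query-gpu=index,persistence_mode --format=csv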

But it turns out the startup command for the service had a --no-persistence-mode flag, so I edited /lib/systemd/system/nvidia-persistenced.service and changed it to

ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --verbose
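
followed (if I remember the exact steps right) by a reload and a restart so the new command line is picked up:

sudo systemctl daemon-reload
sudo systemctl restart nvidia-persistenced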

And voila, it is back to 90W!

Thank you!

That may get overridden by a system update.
Better to create a folder /etc/systemd/system/nvidia-persistenced.service.d and put a custom override.conf with your ExecStart definition into it.
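
Something like this (override.conf is just a conventional name, and the ExecStart shown is the one from your post):

sudo mkdir -p /etc/systemd/system/nvidia-persistenced.service.d

and then in /etc/systemd/system/nvidia-persistenced.service.d/override.conf:

[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --verbose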

Right you are, and so it did get overridden. I would just add that it is perhaps simpler to do

sudo systemctl edit nvidia-persistenced.service

which creates an override.conf in the right place and opens it in an editor (nano in my case), and in which I wrote

[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --verbose

Note that the second, empty ExecStart= line is necessary: it clears the ExecStart inherited from the original unit file before the override sets the new one, otherwise systemd refuses to start a service with more than one ExecStart. This is probably one of the stupidest things about Linux I have seen.
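
For completeness: the override only takes effect once the service is restarted, and systemctl cat is a handy way to confirm the drop-in is actually being applied:

sudo systemctl restart nvidia-persistenced
systemctl cat nvidia-persistenced.service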