I have a headless machine running Ubuntu 18.04 server edition (no displays attached) that I use for deep learning, with two RTX 2080 Ti GPUs.
Before, the machine drew about 90 W at the wall while idle and nvidia-smi reported the GPUs at 0% utilization; now it draws 150-220 W and nvidia-smi reports them at around 30%. I haven't changed any configuration.
I am on CUDA 10.0 and can't upgrade because I am using TensorFlow 1.14; in any case, things were working fine before with these exact same drivers.
I am not an expert, but why are my GPUs drawing so much power when they are not in use? Is there any way I can configure them to stop doing that?
PS - No, my drivers are not 11.2 as reported by nvidia-smi; nvcc --version correctly reports v10.0.
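For reference, the utilization figures above come from watching nvidia-smi; a query along these lines (field names are listed by nvidia-smi --help-query-gpu) prints utilization and per-GPU power draw, though note the wattage at the wall includes the rest of the machine:

nvidia-smi --query-gpu=index,name,utilization.gpu,power.draw --format=csv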
I am running around like a headless chicken, how did you guess? :)
Anyway, nvidia-persistenced was already running as per
sudo systemctl status nvidia-persistenced
and
sudo nvidia-smi -pm 1
But it turns out the startup command for the service had a --no-persistence-mode flag, so I edited /lib/systemd/system/nvidia-persistenced.service and changed the ExecStart line to drop that flag.
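Roughly, the edit looked like this (the exact --user argument and binary path in the stock unit may differ on other installs, so treat it as a sketch):

# ExecStart as shipped in /lib/systemd/system/nvidia-persistenced.service (approximately):
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose

# edited ExecStart, with persistence mode enabled explicitly:
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose

After that, sudo systemctl daemon-reload and sudo systemctl restart nvidia-persistenced pick up the change.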
That edit may get overwritten by a package update.
It's better to create a directory /etc/systemd/system/nvidia-persistenced.service.d and put a custom override.conf with your ExecStart definition in it.
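A minimal sketch of that drop-in, assuming the same binary path and --user argument as the stock unit (adjust the ExecStart to whatever yours actually needs); the empty ExecStart= line is required to clear the original definition before redefining it:

sudo mkdir -p /etc/systemd/system/nvidia-persistenced.service.d
sudo tee /etc/systemd/system/nvidia-persistenced.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose
EOF
sudo systemctl daemon-reload
sudo systemctl restart nvidia-persistenced

Running sudo systemctl edit nvidia-persistenced does the same thing interactively: it creates that directory and override.conf for you and reloads systemd when you save.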