Fan speed without X (headless): PowerMizer drops card to P8

I’m running CUDA jobs on a Titan X on a system with no monitor attached. I would like to manually increase the fan speed to maintain reasonable temperatures because I will be running calculations constantly for weeks at a time.

I managed to get the dummy X server hack for setting the fan speed to work.

I followed the hack here: https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness

Basically, I started up a dummy X server so that I could use the nvidia-settings command-line controls. I set the fan speed:

nvidia-settings -c :0 -a [gpu:0]/GPUFanControlState=1
nvidia-settings -c :0 -a [fan:0]/GPUTargetFanSpeed=70

and I set the driver to persistence mode

nvidia-smi -pm 1

Then I quit the dummy X server. Things run great until my first CUDA job finishes. Then PowerMizer kicks in and throttles the card back from P2 to P8, but it doesn’t throttle back up to P2 when I start the next job. If I restart the dummy X server, the card goes back up to P2. I don’t want to keep the dummy X server running, though, because it uses memory and I think it also slightly slows down the CUDA calculations.
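For reference, the steps described so far can be collected into one small helper. This is only a sketch: `fan_commands` is a made-up name, it assumes the dummy X server is on display :0 as in the commands earlier in this post, and it only prints the commands instead of running them, so nothing changes on the card by itself.

```shell
# Print the commands from this post for a given display and fan speed (percent).
# fan_commands is a hypothetical helper name; it does not touch the GPU itself.
fan_commands() {
    case "$2" in
        ''|*[!0-9]*) echo "fan speed must be a whole number" >&2; return 1 ;;
    esac
    if [ "$2" -gt 100 ]; then
        echo "fan speed must be between 0 and 100" >&2
        return 1
    fi
    # The bracketed targets are quoted so the shell cannot expand them as globs.
    printf "nvidia-settings -c %s -a '[gpu:0]/GPUFanControlState=1'\n" "$1"
    printf "nvidia-settings -c %s -a '[fan:0]/GPUTargetFanSpeed=%s'\n" "$1" "$2"
    printf "nvidia-smi -pm 1\n"
}

fan_commands :0 70
```

While the dummy X server is up, piping the output to a shell (`fan_commands :0 70 | sh`) would apply the settings; `nvidia-smi -pm 1` normally needs root.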

If I don’t do the hack at all and just run my jobs on the headless machine, PowerMizer behaves sensibly (throttles up and down appropriately). Of course, this is not good either, because the fan speed stays stuck around 37% and the card gets to 80+ C.
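For anyone trying to reproduce this, a rough way to watch the performance state, temperature and fan speed without X is to query nvidia-smi directly. The sketch below assumes GPU index 0; `gpu_status` and `is_idle_pstate` are made-up helper names, and treating P8 or deeper as "idle" is my own assumption.

```shell
# Print "pstate, temperature, fan speed" for GPU 0 as one CSV line, e.g. "P8, 45, 37".
gpu_status() {
    nvidia-smi -i 0 --query-gpu=pstate,temperature.gpu,fan.speed --format=csv,noheader,nounits
}

# Succeed if the CSV line reports performance state P8 or deeper.
# (The P8 threshold is an assumption for this example.)
is_idle_pstate() {
    local pstate="${1%%,*}"    # first CSV field, e.g. "P8"
    local level="${pstate#P}"  # strip the leading "P"
    case "$level" in
        ''|*[!0-9]*) return 1 ;;  # not a number, e.g. query failed
    esac
    [ "$level" -ge 8 ]
}

# Only query the card if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    line="$(gpu_status)"
    echo "$line"
    if is_idle_pstate "$line"; then
        echo "GPU 0 is in a low-power state"
    fi
fi
```

Running this once while a CUDA job is active and once after it finishes should show whether the card is still sitting in P8 when the next job starts.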

In a nutshell, it seems that manually setting the fan speed and then exiting X causes PowerMizer to behave incorrectly.

Any ideas? This is with driver 349.16.

The exact same thing happens on a Titan X Pascal with the CUDA 8 RC, driver version 367, on Ubuntu 16.04.1 x86_64. Has anyone found a solution in the 1.5 years since the post above?

Also affecting us - any fix?

The following script works for me:

https://github.com/boris-dimitrov/set_gpu_fans_public/blob/master/cool_gpu

It is based on prior work by Axel Kohlmeyer at https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness, which required a few small tweaks for my particular version of Ubuntu and driver.

I tried boris_dimitrov’s script. However, I managed to use nvidia-smi -pl 85 to lower the power usage of my GTX 1060 for crypto mining.

If you want the power limit to stay in effect, enable persistence mode with nvidia-smi -pm 1.

I was mining, and my GTX 1060 on a headless Linux server reached 85 C within a minute or two; changing the power limit reduced temperatures a lot.
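A small sketch of how the limit could be picked safely: the card reports a minimum and maximum allowed power limit, and a requested value outside that range is rejected. `clamp_limit` is a made-up helper name, GPU index 0 and the 85 W request are taken as assumptions from my setup, and the script only prints the commands it would run.

```shell
# Clamp a requested power limit (watts) to the range the card reports.
# clamp_limit is a hypothetical helper; min/max may carry decimals ("60.00").
clamp_limit() {
    local want="$1" min="${2%.*}" max="${3%.*}" v
    for v in "$want" "$min" "$max"; do
        case "$v" in
            ''|*[!0-9]*) echo "non-numeric limit: $v" >&2; return 1 ;;
        esac
    done
    if [ "$want" -lt "$min" ]; then
        echo "$min"
    elif [ "$want" -gt "$max" ]; then
        echo "$max"
    else
        echo "$want"
    fi
}

# Only query the card if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Output looks like "60.00, 140.00" when power management is supported.
    range="$(nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv,noheader,nounits)"
    if limit="$(clamp_limit 85 "${range%%,*}" "${range##*, }")"; then
        # Dry run: print the commands instead of executing them (they need root).
        echo "nvidia-smi -pm 1 && nvidia-smi -i 0 -pl $limit"
    fi
fi
```

On cards that do not support power management the query returns a non-numeric placeholder, in which case the helper fails and nothing is printed.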

If you are doing computation such as simulations, lowering the power limit may not be an option, and you are forced to install X :(