I have 4 Titan X (Pascal) cards used for compute (training neural nets at 100% utilization for days or weeks at a time).
The server is a (home-built) DIGITS DevBox, in the high-air-flow Corsair 540 case.
The Titans routinely run at 84-86 degrees C, but the fan stays at 50-60% duty cycle. I know this is, technically, within the thermal limit of the card. But running 4 cards that hot for months on end has me worried - they are not cheap to replace.
Because the machine is headless, nvidia-settings isn’t very useful (no X server).
How can I set the fans to a more aggressive profile, in a way that persists across reboots? I hooked up a monitor, which let me use nvidia-settings to set ONE card’s fan to 100%. That dropped that GPU to 65 C, but the others are still hot.
I’ve tried it from the command line:
#!/bin/bash
nvidia-settings \
    -a "[gpu:0]/GPUFanControlState=1" \
    -a "[fan:0]/GPUCurrentFanSpeed=40" &
but always get:
** (nvidia-settings:32159): WARNING **: Could not open X display
ERROR: The control display is undefined; please run nvidia-settings --help
for usage information.
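For reference, the error above is what nvidia-settings prints when it has no X display to talk to. Below is a rough sketch of what the per-card command might look like, not a confirmed fix. It assumes an X server is already running on display `:0` with all four cards attached, that fan control has been enabled through the Coolbits option in xorg.conf, that fan index N belongs to GPU index N, and that the writable attribute on this driver is `GPUTargetFanSpeed` rather than `GPUCurrentFanSpeed`. The names `set_fan`, `set_all_fans`, `NVS`, and `CTRL_DISPLAY` are made up for illustration.

```shell
#!/bin/bash
# Sketch only: assumes an X server on :0 with Coolbits fan control enabled.

# Binary to invoke; point NVS at a stub to preview the commands (hypothetical knob).
NVS="${NVS:-nvidia-settings}"
# X display that nvidia-settings should control (assumption: :0).
CTRL_DISPLAY="${CTRL_DISPLAY:-:0}"

# Switch one card to manual fan control and request a target speed (percent).
# Assumes fan index N is attached to GPU index N.
set_fan() {
    local idx="$1" speed="$2"
    "$NVS" -c "$CTRL_DISPLAY" \
        -a "[gpu:${idx}]/GPUFanControlState=1" \
        -a "[fan:${idx}]/GPUTargetFanSpeed=${speed}"
}

# Apply the same target speed to GPUs 0 .. count-1.
set_all_fans() {
    local count="$1" speed="$2" i=0
    while [ "$i" -lt "$count" ]; do
        set_fan "$i" "$speed"
        i=$((i + 1))
    done
}
```

With the functions loaded, `set_all_fans 4 80` would request 80% on all four cards. Whether this survives a reboot depends on something re-running it after X starts, which is not covered here.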