I have 4 Titan X (Pascal) cards used for compute (training neural nets at 100% utilization for days or weeks).
The server is a (home-built) DIGITS DevBox, in the high-air-flow Corsair 540 case.
The Titans routinely run at 84-86 degrees C, but the fan stays at 50-60% duty cycle. I know this is, technically, within the thermal limit of the card. But running 4 cards that hot for months on end has me worried - they are not cheap to replace.
How can I set the fans to a more aggressive profile, in a way that persists across reboots?
I hooked up a monitor, which let me use nvidia-settings to set ONE card’s fan to 100%. That dropped that GPU to 65 C, but the others are still hot (the "Enable GPU Fan Settings" option is not available for the others).
I’ve tried the command-line:
nvidia-settings -a "[fan:0]/GPUCurrentFanSpeed=40" &
but always get:
** (nvidia-settings:32159): WARNING **: Could not open X display
ERROR: The control display is undefined; please run
nvidia-settings --help for usage information.
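
For reference, here is a rough sketch of what I would like to end up running for all four cards. It is only a sketch under assumptions I have not verified on this box: that an X server is running on display :0, that fan control (the Coolbits option) is enabled for every GPU in the X config, that fan N belongs to GPU N, and that GPUTargetFanSpeed rather than GPUCurrentFanSpeed is the writable attribute on my driver version. The fan_cmds helper is a made-up name, and it only prints the commands unless its output is piped to sh.

```shell
#!/bin/sh
# Hypothetical helper: print the nvidia-settings calls that would request
# a fixed fan speed on each GPU (dry run by default).
# Assumptions: X server on display :0, Coolbits fan control enabled for
# every GPU, and fan index N corresponds to GPU index N.
fan_cmds() {
    count=$1   # number of GPUs
    speed=$2   # target fan duty cycle, in percent
    i=0
    while [ "$i" -lt "$count" ]; do
        # -c selects the control display explicitly, so DISPLAY need not be set
        echo "nvidia-settings -c :0 -a '[gpu:$i]/GPUFanControlState=1' -a '[fan:$i]/GPUTargetFanSpeed=$speed'"
        i=$((i + 1))
    done
}

fan_cmds 4 80            # dry run: print the four commands
# fan_cmds 4 80 | sh     # execute them once the printed commands look right
```

If something like this works, I assume it could go into a startup script that runs after X comes up, which would cover the reboot part.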