Multiple GPUs and Fans

Hi. I have two GTX 1080 Ti cards in an Ubuntu 18.04 box, both Founders Edition. I use them mainly for training neural networks.

Now, I essentially have two problems:

  1. Setting Coolbits (even with --enable-all-gpus) lets me set the fan speed and clocks only for the GPU that is attached to the monitor.

  2. I'd rather not set the fan speed statically; instead, I'd like a dynamic profile of fan speed (%) versus temperature. Note that in automatic mode, under load, one 1080 Ti regularly hits 89-90 °C despite throttling and a roomy case… (the other 1080 Ti stays cooler; I guess not all GPUs are created equal).

Information about my configuration:

inxi -b
System:    Host: nimrod Kernel: 4.15.0-46-generic x86_64 bits: 64
           Desktop: Xfce 4.12.3 Distro: Ubuntu 18.04.2 LTS
Machine:   Device: desktop Mobo: FUJITSU model: D3128-B2 v: S26361-D3128-B2 serial: N/A
           UEFI: FUJITSU // American Megatrends v: V4.6.5.4 R1.8.0 for D3128-B2x date: 06/28/2018
CPU:       10 core Intel Xeon E5-2680 v2 (-MT-MCP-) speed/max: 2269/3600 MHz
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Park [Mobility Radeon HD 5430]
           Card-2: NVIDIA GP102 [GeForce GTX 1080 Ti]
           Card-3: NVIDIA GP102 [GeForce GTX 1080 Ti]
           Display Server: x11 (X.Org 1.19.6 )
           drivers: modesetting,nvidia,ati,radeon,nouveau (unloaded: fbdev,vesa)
           Resolution: 2560x1080@60.00hz
           OpenGL: renderer: GeForce GTX 1080 Ti/PCIe/SSE2
           version: 4.6.0 NVIDIA 415.27
Network:   Card: Intel 82579LM Gigabit Network Connection (Lewisville)
           driver: e1000e
Drives:    HDD Total Size: 2262.5GB (9.5% used)
Info:      Processes: 413 Uptime: 10 min Memory: 3677.2/96560.4MB
           Client: Shell (bash) inxi: 2.3.56

nvidia-smi:

Mon Mar 25 04:19:30 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   39C    P8    10W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:04:00.0  On |                  N/A |
| 31%   57C    P0    69W / 250W |    204MiB / 11176MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1465      G   /usr/lib/xorg/Xorg                           201MiB |
+-----------------------------------------------------------------------------+
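To see at a glance which card runs hot under load, nvidia-smi's CSV query mode is easier to parse than the table above. A minimal sketch: gpu_status wraps a query using standard --query-gpu fields, and hot_gpus is a hypothetical helper that reads that output and prints the index of every GPU at or above a threshold in °C. Note that nvidia-smi's index may not match the [gpu:N] numbering used by nvidia-settings, so the PCI bus ID is the safer way to tell the cards apart.

```shell
#!/usr/bin/env bash
# Print "index, pci.bus_id, temperature (C), fan speed (%)" for every NVIDIA GPU,
# one CSV line per card, without header or units.
gpu_status() {
  nvidia-smi --query-gpu=index,pci.bus_id,temperature.gpu,fan.speed \
             --format=csv,noheader,nounits
}

# Hypothetical helper: read gpu_status-style lines on stdin and print the index
# of every GPU whose temperature is >= the threshold given as $1 (degrees C).
hot_gpus() {
  local threshold=$1 idx bus temp fan
  while IFS=', ' read -r idx bus temp fan; do
    # Skip lines where the temperature is not a plain integer (e.g. "[N/A]").
    [[ $temp =~ ^[0-9]+$ ]] || continue
    if (( temp >= threshold )); then
      echo "$idx"
    fi
  done
}
```

Usage: gpu_status | hot_gpus 85 prints the index of each card currently at 85 °C or more.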

And finally, my xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 415.27

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:3:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:4:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Note that Coolbits is set for both of them.
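Coolbits 31 is 1 + 2 + 4 + 8 + 16, so it already includes bit 2 (value 4), the bit that unlocks manual fan control; the value itself should not be the problem. A minimal sketch for checking that bit and then asking each GPU whether the driver actually exposes fan control; coolbits_has_fan_control and fan_control_state are hypothetical helper names, and display :0 with GPU indices 0 and 1 is an assumption for this two-card setup.

```shell
#!/usr/bin/env bash
# Succeed if the Coolbits value $1 has bit 2 (value 4) set, the bit that
# enables manual fan control in nvidia-settings.
coolbits_has_fan_control() {
  (( ($1 & 4) != 0 ))
}

# Print GPUFanControlState (0 = automatic, 1 = manual) for GPU $2 as seen
# from X display $1; -t (terse) makes nvidia-settings print only the value.
fan_control_state() {
  nvidia-settings -c "$1" -t -q "[gpu:$2]/GPUFanControlState"
}
```

Usage: for gpu in 0 1; do echo "gpu:$gpu $(fan_control_state :0 "$gpu")"; done. If the query succeeds for gpu:0 but fails for gpu:1, that matches problem 1 above.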

Can you help me?

Thanks! :)

With your config, the second fan should be visible using
DISPLAY=:0.1 nvidia-settings -q fans
For dynamic fan control, you would have to use something like https://github.com/foucault/nvfancontrol
but it only handles one fan; I don't know whether anything exists to control more than one fan.
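Since nvidia-settings can address each card separately, a small shell loop can apply a temperature-to-fan-speed curve to both GPUs. A minimal sketch, not a replacement for nvfancontrol: the curve points in CURVE_TEMPS/CURVE_FANS are made-up values to tune, the function names are hypothetical, and it assumes that fan target N belongs to GPU N (one fan controller per Founders Edition card) and that both GPUs are reachable through display :0 (if not, :0.1 for the second card as above). GPUCoreTemp, GPUFanControlState and GPUTargetFanSpeed are nvidia-settings attributes.

```shell
#!/usr/bin/env bash
# Hypothetical fan curve: temperature points (C) and the fan speed (%) at each.
CURVE_TEMPS=(40 60 75 85)
CURVE_FANS=(30 50 80 100)

# Print the fan speed for temperature $1, interpolating linearly between
# curve points and clamping below the first and above the last point.
fan_for_temp() {
  local t=$1 i n=${#CURVE_TEMPS[@]}
  if (( t <= CURVE_TEMPS[0] )); then
    echo "${CURVE_FANS[0]}"
    return
  fi
  for (( i = 1; i < n; i++ )); do
    if (( t <= CURVE_TEMPS[i] )); then
      local t0=${CURVE_TEMPS[i-1]} t1=${CURVE_TEMPS[i]}
      local f0=${CURVE_FANS[i-1]} f1=${CURVE_FANS[i]}
      echo $(( f0 + (f1 - f0) * (t - t0) / (t1 - t0) ))
      return
    fi
  done
  echo "${CURVE_FANS[n-1]}"
}

# Read the core temperature of GPU $2 on X display $1 and set its fan
# (ASSUMED to be fan target $2) to the speed given by the curve.
apply_fan_curve() {
  local display=$1 gpu=$2 temp speed
  temp=$(nvidia-settings -c "$display" -t -q "[gpu:${gpu}]/GPUCoreTemp") || return 1
  speed=$(fan_for_temp "$temp")
  nvidia-settings -c "$display" \
    -a "[gpu:${gpu}]/GPUFanControlState=1" \
    -a "[fan:${gpu}]/GPUTargetFanSpeed=${speed}" > /dev/null
}

# Only start the control loop when invoked explicitly with --run.
if [[ ${1:-} == --run ]]; then
  while true; do
    for gpu in 0 1; do
      apply_fan_curve :0 "$gpu" || echo "could not update gpu:$gpu" >&2
    done
    sleep 5
  done
fi
```

Re-enabling GPUFanControlState on every pass is redundant but harmless, and keeps each pass self-contained. If the loop dies, the fans stay at the last speed it set rather than returning to automatic mode, so it is worth running it under something that restarts it.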

Thanks for your reply.

By

DISPLAY=:0.1

you mean setting the DISPLAY environment variable?

Besides, I'm rather stunned that in 2019 you cannot dynamically control the fans of two NVIDIA cards at the same time… :(

Alternatively, you can also use
nvidia-settings -c :0.1 -q fans
or, to assign values,
nvidia-settings -a {DISPLAY}/{attribute name}[{display devices}]={value}
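As a concrete instance of the -a syntax, the {DISPLAY} part can also be written as a target such as [gpu:1] or [fan:1] (optionally prefixed by the X display, e.g. :0[gpu:1]). A minimal sketch for pinning a fan at a fixed speed and later handing control back to the driver; set_fan_speed and restore_auto_fan are hypothetical names, and the fan index is again assumed to equal the GPU index.

```shell
#!/usr/bin/env bash
# Switch GPU $2 on X display $1 to manual fan control and pin its fan
# (ASSUMED to be fan target $2) at $3 percent.
set_fan_speed() {
  local display=$1 gpu=$2 pct=$3
  # Reject non-numeric or out-of-range values before touching the card;
  # 10# forces base 10 so inputs like "08" are not parsed as octal.
  if ! [[ $pct =~ ^[0-9]+$ ]] || (( 10#$pct > 100 )); then
    echo "set_fan_speed: expected 0-100, got '$pct'" >&2
    return 1
  fi
  nvidia-settings -c "$display" \
    -a "[gpu:${gpu}]/GPUFanControlState=1" \
    -a "[fan:${gpu}]/GPUTargetFanSpeed=${pct}"
}

# Hand fan control for GPU $2 on X display $1 back to the driver's automatic mode.
restore_auto_fan() {
  nvidia-settings -c "$1" -a "[gpu:$2]/GPUFanControlState=0"
}
```

Usage: set_fan_speed :0 1 70 pins the second card's fan at 70 %, and restore_auto_fan :0 1 returns it to automatic control.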