A more detailed example is a few posts below.
How can I find out how many fans an NVIDIA card has?
I have a script that manually controls all the GPUs in a computer, but I see that some card models have not one but two fans.
Until now I did it like this (for one fan per GPU, e.g. 1080 Ti):
for the first card:
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=XX
for the second card:
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:1]/GPUTargetFanSpeed=XX
but on the computer with my 2080 it is a bit different:
for the first card (two fans):
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=XX -a [fan:1]/GPUTargetFanSpeed=XX
for the second card:
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=XX -a [fan:3]/GPUTargetFanSpeed=XX
As you can see, the fan numbers are not the same as the GPU numbers.
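The numbering in the commands above can be sketched as a small helper. This is only a sketch: fan_args is a hypothetical function name, and it assumes that every card in the machine has the same number of fans and that fans are numbered consecutively across GPUs, which is what the two examples above suggest but not something the driver guarantees.

```shell
#!/bin/sh
# Build the nvidia-settings arguments for one GPU.
# ASSUMPTION: with N fans per card, the fans of GPU g are g*N .. g*N+N-1.
# fan_args is a hypothetical helper, not part of any NVIDIA tool.
fan_args() {
    gpu=$1
    fans_per_gpu=$2
    speed=$3
    args="-a [gpu:$gpu]/GPUFanControlState=1"
    first=$((gpu * fans_per_gpu))
    i=0
    while [ "$i" -lt "$fans_per_gpu" ]; do
        args="$args -a [fan:$((first + i))]/GPUTargetFanSpeed=$speed"
        i=$((i + 1))
    done
    printf '%s\n' "$args"
}

# Print the command first instead of running it, to check the indices:
# echo "nvidia-settings $(fan_args 1 2 60)"
```

For the second card of a dual-fan machine, fan_args 1 2 60 yields the same fan:2 and fan:3 arguments as the hand-written command above.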
nvidia-smi reports only one fan.speed value.
Every site Google finds and every wiki covers only the one-fan-per-GPU method and says nothing about controlling two independent fans.
The output below gives the number of fans in… the whole system. It is identical for 1x2080 (dual-fan) and for 2x1080 (single fans):
DISPLAY=:0 nvidia-settings -q fans
2 Fans on xxx:0
[0] xxx:0[fan:0] (Fan 0)
Has the following name:
FAN-0
[1] xxx:0[fan:1] (Fan 1)
Has the following name:
FAN-1
That is still too little information to write a good script. Any ideas? Maybe read something from /sys/class/… or /sys/debug…?
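Until there is a better source, a script can at least pull the fan indices out of that output. A minimal sketch, assuming the output keeps one [fan:N] token per entry as in the listing above; list_fan_indices is a hypothetical helper name.

```shell
#!/bin/sh
# Read `nvidia-settings -q fans` output on stdin and print each fan index
# once, in numeric order. Lines without a [fan:N] token are ignored.
# list_fan_indices is a hypothetical helper, not an NVIDIA command.
list_fan_indices() {
    sed -n 's/.*\[fan:\([0-9][0-9]*\)\].*/\1/p' | sort -n -u
}

# Usage, assuming an X server on display :0:
# DISPLAY=:0 nvidia-settings -q fans | list_fan_indices
```

This only tells which fan indices exist in the whole system; it does not say which card each fan belongs to.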
We also tested this locally.
With one of the two fans running and the second one physically stopped on the GPU, nvidia-smi can show the fan of that GPU at 0% :)
So even nvidia-smi does not know the speed of both fans, only of one of them (the FIRST, to be exact).
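One way to catch a stopped second fan from a script is to read every fan separately instead of relying on the single nvidia-smi value. A rough sketch: stopped_fans is a hypothetical helper, and the usage line assumes the per-fan GPUCurrentFanSpeed attribute of nvidia-settings is readable on the card in question, which has not been checked on these machines.

```shell
#!/bin/sh
# Print the indices of fans whose reported speed is 0.
# Input on stdin: one "fan_index speed_percent" pair per line.
# stopped_fans is a hypothetical helper, not an NVIDIA command.
stopped_fans() {
    awk 'NF >= 2 && $2 == 0 { print $1 }'
}

# Usage (ASSUMPTION: fans 0 and 1 exist and an X server runs on :0):
# for f in 0 1; do
#     printf '%s %s\n' "$f" \
#         "$(DISPLAY=:0 nvidia-settings -t -q "[fan:$f]/GPUCurrentFanSpeed")"
# done | stopped_fans
```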
Maybe GPUFanTarget gives a hint? Otherwise, maybe check the nvidia-settings source to see how the fans are enumerated:
https://github.com/NVIDIA/nvidia-settings
I will check it tomorrow.
Anyway, the nvidia-smi CLI tool does not know the speed values either and just shows 0% when the first fan's speed is 0% and the second one is spinning :)
So the NVIDIA team should fix this in the nvidia-smi tool as well! :)
I will let you know after checking.
Nope, it shows nothing useful.
NVIDIA just needs to fix the nvidia-smi tool, that's it.
I wrote to their support email, but they redirected me to post here…
Is there any way to report this to NVIDIA so that they just fix nvidia-smi?
THIS IS FOR SURE AN NVIDIA BUG
Two computers with RTX 2060.
The first one has 8 GPUs and nvidia-settings sees 8 fans = 1 system fan per GPU.
The second one has 5 GPUs and nvidia-settings shows 10 fans = 2 system fans per GPU.
nvidia-smi shows the same GPUs on both.
The kernel does not matter: in this example the machine on kernel 5 shows 1 fan per GPU, but we also have a different computer with the same kernel and driver that sees 2 fans per GPU (on an RTX 2080).
miner@simpleminer:/usr/share/misc$ DISPLAY=:0 nvidia-settings -q fans | grep "\[fan" | wc -l
8
miner@simpleminer:/usr/share/misc$ uname -a
Linux simpleminer 5.0.4-smos5 #1 SMP PREEMPT Fri Mar 29 12:15:10 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
miner@simpleminer:/usr/share/misc$ lspci -vn | grep VGA
00:02.0 0300: 8086:5902 (rev 04) (prog-if 00 [VGA controller])
01:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
02:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
03:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
04:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
06:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
07:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
08:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
09:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
miner@simpleminer:/usr/share/misc$ mc -d
miner@simpleminer:/usr/share/misc$ nvidia-smi
Fri Apr 26 13:16:47 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    On   | 00000000:01:00.0 Off |                  N/A |
| 64%   75C    P2   125W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2060    On   | 00000000:02:00.0 Off |                  N/A |
| 80%   80C    P2   123W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 2060    On   | 00000000:03:00.0 Off |                  N/A |
| 70%   77C    P2   123W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 2060    On   | 00000000:04:00.0 Off |                  N/A |
| 74%   78C    P2   124W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 2060    On   | 00000000:06:00.0 Off |                  N/A |
| 79%   80C    P2   124W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 2060    On   | 00000000:07:00.0 Off |                  N/A |
| 86%   82C    P2   121W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 2060    On   | 00000000:08:00.0 Off |                  N/A |
| 65%   75C    P2   125W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce RTX 2060    On   | 00000000:09:00.0 Off |                  N/A |
| 74%   78C    P2   126W / 125W |   4407MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
miner@simpleminer:~$ DISPLAY=:0 nvidia-settings -q fans | grep "\[fan" | wc -l
10
miner@simpleminer:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 610 (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
miner@simpleminer:~$ lspci -vn | grep VGA
00:02.0 0300: 8086:5902 (rev 04) (prog-if 00 [VGA controller])
01:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
02:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
04:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
05:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
06:00.0 0300: 10de:1f08 (rev a1) (prog-if 00 [VGA controller])
miner@simpleminer:~$ uname -a
Linux simpleminer 4.17.19-smos13 #1 SMP PREEMPT Wed Feb 6 12:22:10 CET 2019 x86_64 x86_64 x86_64 GNU/Linux
miner@simpleminer:~$ nvidia-smi
Fri Apr 26 13:17:22 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    On   | 00000000:01:00.0 Off |                  N/A |
| 86%   85C    P2   181W / 190W |    253MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2060    On   | 00000000:02:00.0 Off |                  N/A |
| 68%   74C    P2   176W / 190W |    253MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 2060    On   | 00000000:04:00.0 Off |                  N/A |
| 86%   84C    P2   182W / 190W |    253MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 2060    On   | 00000000:05:00.0 Off |                  N/A |
| 86%   84C    P2   187W / 190W |    253MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 2060    On   | 00000000:06:00.0 Off |                  N/A |
| 81%   81C    P2   183W / 190W |    253MiB /  5904MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
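The comparison between the two rigs comes down to dividing the reported fan count by the GPU count. A small sketch of that check; fans_per_gpu is a hypothetical helper, and as the logs above show, the result is only what the driver reports, not necessarily the number of physical fans.

```shell
#!/bin/sh
# Print the reported number of fans per GPU, or fail when the fan count
# cannot be split evenly over the GPUs (mixed cards, or a bad count).
# fans_per_gpu is a hypothetical helper, not an NVIDIA command.
fans_per_gpu() {
    fans=$1
    gpus=$2
    if [ "$gpus" -le 0 ] || [ $((fans % gpus)) -ne 0 ]; then
        echo "cannot split $fans fans evenly over $gpus GPUs" >&2
        return 1
    fi
    echo $((fans / gpus))
}

# Usage, assuming an X server on display :0:
# fans=$(DISPLAY=:0 nvidia-settings -q fans | grep -c '\[fan')
# gpus=$(DISPLAY=:0 nvidia-settings -q gpus | grep -c '\[gpu')
# fans_per_gpu "$fans" "$gpus"
```

With the numbers from the two rigs, 8 fans on 8 GPUs gives 1 and 10 fans on 5 GPUs gives 2.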
OK, what is the procedure for reporting this to the nvidia-smi developers so that it gets fixed?
The best way I’ve found to relate the fans to the cards is using: /usr/bin/nvidia-settings -c :0 -q gpus --verbose
Again, as stated previously, it does not show the proper count, but at least you can find what to control.
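That verbose output could be turned into a GPU-to-fan table with something like the sketch below. It rests on an unverified assumption about the layout: each GPU entry starts with a line containing its [gpu:N] token, and the fans connected to it appear as [fan:M] tokens on the lines that follow, before the next GPU entry. map_fans_to_gpus is a hypothetical helper name; check the result against the raw output before trusting it.

```shell
#!/bin/sh
# Read `nvidia-settings -q gpus --verbose` output on stdin and print
# "gpu N fan M" pairs.
# ASSUMPTION (not verified against real output): every [fan:M] token
# belongs to the most recent [gpu:N] token seen above it.
# map_fans_to_gpus is a hypothetical helper, not an NVIDIA command.
map_fans_to_gpus() {
    awk '
        {
            line = $0
            # Remember the most recent GPU index seen so far.
            if (match(line, /\[gpu:[0-9]+\]/))
                gpu = substr(line, RSTART + 5, RLENGTH - 6)
            # Attribute every fan token on this line to that GPU.
            while (match(line, /\[fan:[0-9]+\]/)) {
                if (gpu != "")
                    print "gpu " gpu " fan " substr(line, RSTART + 5, RLENGTH - 6)
                line = substr(line, RSTART + RLENGTH)
            }
        }'
}

# Usage:
# /usr/bin/nvidia-settings -c :0 -q gpus --verbose | map_fans_to_gpus
```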