Issue: automatic fan speed adjustment based on temperature

Hello NVIDIA engineers,
I ran into a problem while testing the fan:

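The fan speed often fails to follow CPU/GPU temperature changes. Across 10 power cycles, only about one or two boots behave normally, with the fan speed tracking the temperature.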

Controlling the fan speed manually works normally:

Is this result on AGX Orin devkit or with your custom carrier board?
Which JetPack SW?

Please check the status of the nvfancontrol systemd service.

Hello,
This is on my custom carrier board.

Software: Jetson Linux R35.4.1
Module: Jetson AGX Orin

Please attach the following data, captured while the problem is actually occurring:
sudo dmesg
sudo nvfancontrol -q
cat /etc/nvfancontrol.conf
sudo tegrastats
grep "" /sys/class/hwmon/hwmon*/{pwm1,rpm}

Sorry for the late reply.
Here is the dmesg output:
dmesg.txt (122.7 KB)

nvidia@EAORA07A:~$ sudo nvfancontrol -q
FAN1:FAN_PROFILE:cool
FAN1:FAN_GOVERNOR:cont
FAN1:FAN_CONTROL:close_loop

nvidia@EAORA07A:~$ cat /etc/nvfancontrol.conf

#Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.

POLLING_INTERVAL 2

<FAN 1>
TMARGIN ENABLED
FAN_GOVERNOR pid {
STEP_SIZE 10
}
FAN_GOVERNOR cont {
STEP_SIZE 10
}
FAN_CONTROL close_loop {
RPM_TOLERANCE 100
}
FAN_PROFILE cool {
#TEMP HYST PWM RPM
0 0 255 2900
10 0 255 2900
11 0 215 2440
30 0 215 2440
60 0 66 750
115 0 66 750
}
FAN_PROFILE quiet {
#TEMP HYST PWM RPM
0 0 255 2900
10 0 255 2900
11 0 171 1940
23 0 171 1940
60 0 66 750
115 0 66 750
}
THERMAL_GROUP 0 {
GROUP_MAX_TEMP 115
#Thermal-Zone Coeffs Max-Temp
CPU-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
GPU-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
SOC0-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
SOC1-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
SOC2-therm 20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0
}
FAN_DEFAULT_CONTROL close_loop
FAN_DEFAULT_PROFILE cool
FAN_DEFAULT_GOVERNOR cont
KICKSTART_PWM 64

nvidia@EAORA07A:~$ sudo tegrastats
03-28-2023 01:56:17 RAM 32934/62803MB (lfb 6843x4MB) SWAP 0/31401MB (cached 0MB) CPU [100%@1804,99%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804] EMC_FREQ 15%@665 GR3D_FREQ 88%@[408,305] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.875C Tboard@80C SOC2@93.218C Tdiode@84.75C SOC0@92.843C CV1@-256C GPU@91.531C tj@98.875C SOC1@93.093C CV2@-256C VDD_GPU_SOC 4325mW/4325mW VDD_CPU_CV 9366mW/9366mW VIN_SYS_5V0 4784mW/4784mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1594mW/1594mW NC 0mW/0mW
03-28-2023 01:56:18 RAM 32937/62803MB (lfb 6843x4MB) SWAP 0/31401MB (cached 0MB) CPU [96%@1804,99%@1804,100%@1804,100%@1804,100%@1804,99%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804,100%@1804] EMC_FREQ 5%@2133 GR3D_FREQ 38%@[407,407] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.625C Tboard@80C SOC2@93.343C Tdiode@85C SOC0@92.812C CV1@-256C GPU@91.687C tj@98.625C SOC1@93.218C CV2@-256C VDD_GPU_SOC 4566mW/4445mW VDD_CPU_CV 9606mW/9486mW VIN_SYS_5V0 4984mW/4884mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1694mW/1644mW NC 0mW/0mW
03-28-2023 01:56:19 RAM 32947/62803MB (lfb 6842x4MB) SWAP 0/31401MB (cached 0MB) CPU [100%@1958,100%@1958,100%@1958,100%@1958,100%@1958,99%@1958,100%@1958,100%@1958,100%@1958,100%@1958,100%@1958,100%@1958] EMC_FREQ 5%@2133 GR3D_FREQ 91%@[509,509] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.906C Tboard@80C SOC2@93.343C Tdiode@84.75C SOC0@92.968C CV1@-256C GPU@91.812C tj@98.781C SOC1@93.125C CV2@-256C VDD_GPU_SOC 4806mW/4565mW VDD_CPU_CV 10567mW/9846mW VIN_SYS_5V0 4984mW/4917mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1794mW/1694mW NC 0mW/0mW
03-28-2023 01:56:20 RAM 32950/62803MB (lfb 6841x4MB) SWAP 0/31401MB (cached 0MB) CPU [98%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881] EMC_FREQ 5%@2133 GR3D_FREQ 81%@[509,509] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.562C Tboard@80C SOC2@93.437C Tdiode@84.75C SOC0@92.75C CV1@-256C GPU@92.031C tj@98.562C SOC1@93.281C CV2@-256C VDD_GPU_SOC 5046mW/4685mW VDD_CPU_CV 10567mW/10026mW VIN_SYS_5V0 4984mW/4934mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1794mW/1719mW NC 0mW/0mW
03-28-2023 01:56:21 RAM 32949/62803MB (lfb 6841x4MB) SWAP 0/31401MB (cached 0MB) CPU [95%@1881,100%@1881,99%@1881,99%@1881,100%@1881,100%@1881,100%@1881,100%@1881,100%@1881,99%@1881,100%@1881,100%@1881] EMC_FREQ 5%@2133 GR3D_FREQ 86%@[509,509] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.812C Tboard@80C SOC2@93.406C Tdiode@85C SOC0@93.031C CV1@-256C GPU@91.718C tj@98.968C SOC1@93.156C CV2@-256C VDD_GPU_SOC 4806mW/4709mW VDD_CPU_CV 10086mW/10038mW VIN_SYS_5V0 4984mW/4944mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1694mW/1714mW NC 0mW/0mW
03-28-2023 01:56:22 RAM 32963/62803MB (lfb 6838x4MB) SWAP 0/31401MB (cached 0MB) CPU [96%@1958,100%@1958,100%@1958,99%@1958,100%@1958,100%@1958,100%@1958,100%@1958,99%@1958,100%@1958,100%@1958,100%@1958] EMC_FREQ 5%@2133 GR3D_FREQ 90%@[509,509] VIC_FREQ 729 APE 174 CV0@-256C CPU@98.906C Tboard@80C SOC2@93.281C Tdiode@84.75C SOC0@92.968C CV1@-256C GPU@91.75C tj@98.75C SOC1@93.156C CV2@-256C VDD_GPU_SOC 4566mW/4685mW VDD_CPU_CV 10326mW/10086mW VIN_SYS_5V0 4984mW/4950mW NC 0mW/0mW VDDQ_VDD2_1V8AO 1694mW/1710mW NC 0mW/0mW
^C

nvidia@EAORA07A:~$ grep "" /sys/class/hwmon/hwmon*/{pwm1,rpm}
/sys/class/hwmon/hwmon4/pwm1:101
/sys/class/hwmon/hwmon0/rpm:0
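
For reference, the FAN_PROFILE tables in the conf above map a temperature-like value to a PWM duty and a target RPM. The following is a minimal sketch of how such a table could be evaluated. It rests on two assumptions that are not confirmed anywhere in this thread: that with TMARGIN ENABLED the TEMP column is a thermal margin (GROUP_MAX_TEMP minus the current temperature), and that values between rows are linearly interpolated. The function names are hypothetical and are not part of nvfancontrol.

```python
from bisect import bisect_right

# (TEMP, PWM) rows copied from "FAN_PROFILE cool" in the conf above.
COOL_PROFILE = [(0, 255), (10, 255), (11, 215), (30, 215), (60, 66), (115, 66)]


def interpolate_pwm(profile, x):
    """Linearly interpolate PWM for x, clamping outside the table range."""
    xs = [row[0] for row in profile]
    if x <= xs[0]:
        return profile[0][1]
    if x >= xs[-1]:
        return profile[-1][1]
    i = bisect_right(xs, x)  # index of the first row with TEMP > x
    (x0, y0), (x1, y1) = profile[i - 1], profile[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def expected_pwm(temp_c, profile=COOL_PROFILE, group_max_temp=115, tmargin=True):
    """Estimate the profile PWM for a temperature in degrees C.

    Assumption: when tmargin is True, the table is indexed by
    group_max_temp - temp_c rather than by temp_c itself.
    """
    x = group_max_temp - temp_c if tmargin else temp_c
    return interpolate_pwm(profile, x)
```

Calling `expected_pwm(99)` evaluates the table at a margin of 16 under these assumptions. Comparing such an estimate with the value read from `pwm1` is only a rough sanity check, since the real governor also applies STEP_SIZE and the thermal group coefficients.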

Hi,

After reviewing the data above, the problem appears to be the following:

  1. /sys/class/hwmon/hwmon0/rpm:0

The RPM reads 0 here. When the RPM cannot be read, nvfancontrol can only work in open_loop, but your conf still uses close_loop, which causes the problem.

Detailed documentation:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance/JetsonOrinNanoSeriesJetsonOrinNxSeriesAndJetsonAgxOrinSeries.html#fan-profile-control

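  2. As for why the rpm reading is missing, it may be related to your custom carrier board: after switching to a different carrier board, the tachometer-related settings in the device tree may no longer work.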
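
The check described in this reply (fan is driven, but the tachometer reports 0) can be sketched as a small script that reads the same `pwm1` and `rpm` nodes as the `grep` command. This is only an illustration: the function names and the "unknown" result are my own, and the rule it applies is a simplification of the reasoning above, not nvfancontrol's actual logic.

```python
from pathlib import Path


def read_fan_nodes(hwmon_root="/sys/class/hwmon"):
    """Return {path: value} for every hwmon*/pwm1 and hwmon*/rpm node."""
    readings = {}
    for name in ("pwm1", "rpm"):
        for node in sorted(Path(hwmon_root).glob(f"hwmon*/{name}")):
            try:
                readings[str(node)] = int(node.read_text().strip())
            except (OSError, ValueError):
                readings[str(node)] = None  # unreadable or non-numeric
    return readings


def suggest_control_mode(readings):
    """Suggest a FAN_DEFAULT_CONTROL value from pwm1/rpm readings.

    open_loop  : PWM is non-zero but no rpm node reports a non-zero value
    close_loop : at least one rpm node reports a non-zero value
    unknown    : the fan is not being driven, so rpm 0 proves nothing
    """
    pwm = [v for p, v in readings.items() if p.endswith("/pwm1") and v]
    rpm = [v for p, v in readings.items() if p.endswith("/rpm") and v]
    if rpm:
        return "close_loop"
    if pwm:
        return "open_loop"
    return "unknown"
```

On the board, `suggest_control_mode(read_fan_nodes())` would return `open_loop` for the readings posted above (`pwm1:101`, `rpm:0`). Running it once after each power cycle would also show how often the tachometer reading is actually available.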

Hi,
Thank you for your reply.
After I changed FAN_DEFAULT_CONTROL to open_loop, the problem improved.
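
For anyone applying the same change, here is a minimal sketch of the edit as a text transformation on the contents of /etc/nvfancontrol.conf. The helper name `set_default_control` is hypothetical. It only rewrites the `FAN_DEFAULT_CONTROL` line and leaves the `FAN_CONTROL close_loop { ... }` section untouched.

```python
import re


def set_default_control(conf_text, mode="open_loop"):
    """Return conf_text with the FAN_DEFAULT_CONTROL line set to mode."""
    if mode not in ("open_loop", "close_loop"):
        raise ValueError(f"unexpected control mode: {mode}")
    # Match only the FAN_DEFAULT_CONTROL directive, keeping its indentation.
    pattern = re.compile(r"^([ \t]*FAN_DEFAULT_CONTROL[ \t]+)\S+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + mode, conf_text)
    if count == 0:
        raise ValueError("no FAN_DEFAULT_CONTROL line found")
    return new_text
```

As far as I understand the developer guide linked earlier in this thread, the change takes effect only after stopping the nvfancontrol service, removing its saved status file (/var/lib/nvfancontrol/status), and starting the service again. Please verify those steps against the guide for your release.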

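One question remains: the rpm reading is not always missing. I power-cycled the board many times, and roughly two or three boots out of ten behave normally.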

There is no update from you for a period, assuming this is not an issue any more.
Hence we are closing this topic. If need further support, please open a new one.
Thanks

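Do you see this rpm behavior on the NVIDIA devkit? I first need to determine whether this problem occurs only on your board or whether it also happens on my side.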