AGX Xavier Fan Speed Issue


The chart above explains how the fan speed framework operates, and in general it works as described.
However, when the system heats to a temperature between a trip temperature and its hysteresis point and stays in that range, the fan speed is kept at 0.
For example, I ran some stress tests on my AGX Xavier Developer Kit, which heated it to 48-50 C, and then rebooted it.
The fan mode was set to cool. After the reboot, the kit stayed at 48-50 C and the fan speed remained at 0.
Because the temperature never reached either a trip temperature or a hysteresis point, the fan speed did not change, although it was well within the range in which it should have been set to 77.
The fan speed finally changed after the temperature reached 53 C.
Is there a solution to this issue?
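
The behavior reported here can be reproduced in a toy model. The sketch below is not the actual fan driver. It assumes an edge-triggered controller that changes level only when a sample crosses a trip temperature while heating, or a trip-minus-hysteresis point while cooling, and that starts at level 0 after a reboot. The table is hypothetical: the 35, 53 and 73 C trips and the PWM values 77 and 255 are mentioned in this thread, but how they map onto each other, the 8 C hysteresis and the PWM of 120 are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    trip_c: float  # threshold that activates this step while heating
    hyst_c: float  # step is released once temp falls to trip_c - hyst_c
    pwm: int       # fan PWM applied while this step is active


# Hypothetical "cool" table (see the assumptions stated above).
STEPS = [Step(35.0, 8.0, 77), Step(53.0, 8.0, 120), Step(73.0, 8.0, 255)]


def on_sample(level: int, prev_c: float, temp_c: float) -> int:
    """Edge-triggered update: the level moves only when a threshold is crossed."""
    for i, step in enumerate(STEPS, start=1):
        if prev_c < step.trip_c <= temp_c:      # crossed a trip while heating
            level = max(level, i)
        release_c = step.trip_c - step.hyst_c
        if temp_c <= release_c < prev_c:        # crossed a release point while cooling
            level = min(level, i - 1)
    return level


def simulate(samples: List[float]) -> List[int]:
    """Return the PWM after each sample, starting from level 0 as after a reboot."""
    level, prev_c, pwm_trace = 0, samples[0], []
    for temp_c in samples[1:]:
        level = on_sample(level, prev_c, temp_c)
        pwm_trace.append(0 if level == 0 else STEPS[level - 1].pwm)
        prev_c = temp_c
    return pwm_trace


# Boot at 48 C, hover at 48-50 C, then reach 53 C.
print(simulate([48.0, 49.0, 50.0, 49.0, 53.0]))  # [0, 0, 0, 120]
```

In this model the PWM stays at 0 for as long as the temperature hovers between 45 and 53 C, because no threshold is crossed, and it only changes once 53 C is reached.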

At what point in your test did you change the fan mode to cool?

If I understand correctly, you held the temperature in the 48-50 range and then set the mode directly to “cool”, is that right?

Even though the actual temperature is already > 35, the fan doesn’t start in that case.

No, the fan mode was set to cool before the test began, so it was on cool throughout the entire test.
I am guessing you are already aware of this situation.
If the fan mode is changed while the temperature is between a trip temperature and its hysteresis point in the new fan mode, the fan speed does not change.
Likewise, if the temperature is in such a range at boot and stays there, the fan speed is kept at zero.
In both situations, the fan speed does not match the temperature range.
This may not be a critical issue in most cases.
But what if the temperature stays above the trip temperature for the maximum fan speed?
What happens in that case?

I have observed the exact same thing - twice in a row. The fan would just not turn on. The AGX developer kit is being used in a medical lab environment, hence I went for manual fan control to make sure to run it at 70% when executing software that would otherwise overheat the system. I have not found a solution to this problem yet. I can also confirm that the system just reboots.

The fan mode was set to “cool” on boot through a script.

But the fan should start after it hits trip points again, right?

Not if the temperature at boot is above 73 in cool mode and stays above 73.

Do you mean the fan PWM may not reach 255 but maybe only 160?

NO!! Theoretically speaking, if the AGX’s fan mode is on cool, the system temperature is above 73 at boot, and some test or other process stresses the AGX and raises the temperature even more, the fan would never run! That’s the point I’m trying to make here! In any other case it’s still an issue too: my system was rebooted at around 50 while the fan mode was on cool, and the temperature did not decrease while it sat idle, because the FAN WAS NOT RUNNING. So I’m suggesting that there should be some process that checks the system temperature and sets the fan speed once on boot and once whenever the fan mode is changed.
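
A minimal sketch of the kind of one-time check suggested above, assuming a level-based lookup: pick the PWM from the current temperature itself instead of waiting for a trip point to be crossed. The table values are hypothetical (only 35, 53, 73, 77 and 255 appear in this thread, and their pairing is assumed), and `apply_initial_pwm` and its two path arguments are illustrative names. Which sysfs files to pass is left to the caller and is not confirmed in this thread.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical (trip temperature in C, PWM) pairs for "cool" mode.
COOL_TABLE: List[Tuple[float, int]] = [(35.0, 77), (53.0, 120), (73.0, 255)]


def pwm_for_temperature(temp_c: float,
                        table: List[Tuple[float, int]] = COOL_TABLE) -> int:
    """Return the PWM of the highest step whose trip temperature is already reached."""
    pwm = 0
    for trip_c, step_pwm in sorted(table):
        if temp_c >= trip_c:
            pwm = step_pwm
    return pwm


def apply_initial_pwm(temp_path: Path, pwm_path: Path) -> int:
    """Read a temperature file in millidegrees C and write the matching PWM once.

    Both paths are supplied by the caller; this sketch does not assume
    which sysfs nodes exist on a given L4T release.
    """
    temp_c = int(temp_path.read_text().strip()) / 1000.0
    pwm = pwm_for_temperature(temp_c)
    pwm_path.write_text(f"{pwm}\n")
    return pwm


print(pwm_for_temperature(48.0))  # 77 with the hypothetical table
print(pwm_for_temperature(80.0))  # 255 with the hypothetical table
```

Running such a lookup once at boot and once after every fan mode change would give the fan a starting speed that matches the current temperature, after which the normal trip-point handling could take over.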

I see. Let me check if we have any method.

Thank You! And I apologize for my terrible explaining skills. :(

May I ask which software version that is?

R32 (release), REVISION: 5.0, GCID: 25531747, BOARD: t186ref, EABI: aarch64, DATE: Fri Jan 15 23:21:05 UTC 2021

Hello @crushonkyo and @jhench,

We tried to reproduce this issue locally, but could not reproduce it in either the low temp (~45C) or the high temp (>73) case.

The cur_pwm always gives me the correct value, even after I reboot the device at 45C.

During your test, did you enable tools like jetson_clocks?

$ sudo jetson_clocks --show
was the command I used to check the set PWM

SOC family:tegra194 Machine:Jetson-AGX
Online CPUs: 0-7
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1267200 IdleStates: C1=1 c6=1
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu4: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu5: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu6: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu7: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
GPU MinFreq=318750000 MaxFreq=1377000000 CurrentFreq=318750000
EMC MinFreq=204000000 MaxFreq=2133000000 CurrentFreq=800000000 FreqOverride=0
Fan: PWM=0
NV Power Mode: MAXN
jetson@jetson-desktop:~$ tegrastats
RAM 975/15823MB (lfb 3589x4MB) SWAP 0/7912MB (cached 0MB) CPU [1%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190] EMC_FREQ 0% GR3D_FREQ 0% AO@41.5C GPU@42C Tdiode@45C PMIC@100C AUX@41.5C CPU@42C thermal@41.8C Tboard@43C
RAM 975/15823MB (lfb 3589x4MB) SWAP 0/7912MB (cached 0MB) CPU [0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190] EMC_FREQ 0% GR3D_FREQ 0% AO@42C GPU@42C Tdiode@45C PMIC@100C AUX@41.5C CPU@42C thermal@41.8C Tboard@43C
RAM 975/15823MB (lfb 3589x4MB) SWAP 0/7912MB (cached 0MB) CPU [0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190,0%@1190] EMC_FREQ 0% GR3D_FREQ 0% AO@41.5C GPU@42C Tdiode@45.25C PMIC@100C AUX@41.5C CPU@42C thermal@41.8C Tboard@43C
RAM 974/15823MB (lfb 3589x4MB) SWAP 0/7912MB (cached 0MB) CPU [1%@1190,0%@1190,1%@1190,1%@1190,0%@1190,0%@1190,0%@1190,0%@1190] EMC_FREQ 0% GR3D_FREQ 0% AO@41.5C GPU@42C Tdiode@45C PMIC@100C AUX@41.5C CPU@42C thermal@41.8C Tboard@43C
^C
jetson@jetson-desktop:~$ sudo jetson_clocks --show
SOC family:tegra194 Machine:Jetson-AGX
Online CPUs: 0-7
cpu0: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu1: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu2: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu3: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu4: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu5: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu6: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
cpu7: Online=1 Governor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400 IdleStates: C1=1 c6=1
GPU MinFreq=318750000 MaxFreq=1377000000 CurrentFreq=318750000
EMC MinFreq=204000000 MaxFreq=2133000000 CurrentFreq=800000000 FreqOverride=0
Fan: PWM=0
NV Power Mode: MAXN
jetson@jetson-desktop:~$ sudo /usr/sbin/nvpmodel -q
NV Fan Mode:cool
NV Power Mode: MAXN
0
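
For comparing the reported PWM with the temperatures in output like the above, a small parser can help. This is only a sketch based on the line formats visible in this post (`Fan: PWM=<n>` and `name@<value>C`); the function names are illustrative, and other releases may print these fields differently.

```python
import re
from typing import Dict, Optional


def parse_fan_pwm(jetson_clocks_output: str) -> Optional[int]:
    """Extract the value from a 'Fan: PWM=<n>' line, or None if it is absent."""
    match = re.search(r"^Fan: PWM=(\d+)\s*$", jetson_clocks_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def parse_temperatures(tegrastats_line: str) -> Dict[str, float]:
    """Collect 'name@<value>C' readings; CPU load fields like '1%@1190' are skipped."""
    pairs = re.findall(r"(\w+)@(\d+(?:\.\d+)?)C\b", tegrastats_line)
    return {name: float(value) for name, value in pairs}


# Shortened samples copied from the output in this post.
clocks = "EMC MinFreq=204000000 MaxFreq=2133000000\nFan: PWM=0\nNV Power Mode: MAXN\n"
stats = "CPU [1%@1190,0%@1190] EMC_FREQ 0% AO@41.5C GPU@42C Tdiode@45C thermal@41.8C Tboard@43C"

print(parse_fan_pwm(clocks))                  # 0
print(parse_temperatures(stats)["thermal"])   # 41.8
```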

Hello,

Do you only have one Xavier device to verify this issue? We tried 2 devices but did not see this behavior.

So in your case, when you reboot your device while its temperature is around 45C, the fan PWM is set to 77?
If so, could you tell me which version you are running on both devices?

And yes, currently we only have one AGX Xavier device.

We are using JetPack 4.5.1.