I’m running an AGX Xavier Developer Kit with JetPack 4.3 [L4T 32.3.1] and nvpmodel -m 0.
Let me recap what I (think I) have learned so far:
When /usr/bin/jetson_clocks runs, it sets /sys/devices/pwm-fan/target_pwm to the FAN_SPEED value (a number between 0 and 255), and it sets /sys/devices/pwm-fan/temp_control to 0 (probably to disable automatic adjustment of fan speed based on temperatures?).
When the FAN_SPEED variable in jetson_clocks is 255, the fan does indeed speed up to its maximum.
When I set /sys/devices/pwm-fan/target_pwm to a lower value, I can indeed see the fan go to a lower speed, but it stays at that speed regardless of the temperatures.
I was hoping that resetting /sys/devices/pwm-fan/temp_control back to 1 and /sys/devices/pwm-fan/target_pwm to 0 would bring back automatic fan speeds based on temperatures (DVFS), but unfortunately that does not seem to be the case: the fan now never turns on, even at high temperatures, up to an automatic shutdown of the Xavier.
I also tried to rerun nvpmodel -m 0, with the same result of the fan never turning on anymore.
I also can’t do a jetson_clocks --restore because I did not do a --store before running jetson_clocks.
For now I have a temporary solution: I set /sys/devices/pwm-fan/target_pwm to 102 so the fan runs permanently at 40%, which is enough most of the time, but every now and then I have to set it higher and then lower again.
Is there any way for me to re-enable DVFS so the fan speed automatically adjusts to the temperatures?
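For reference, the 102 above is just 40% of the 0 to 255 PWM range. A minimal sketch of that conversion follows; pct_to_pwm is a hypothetical helper name, not something that ships with JetPack, and the sysfs path in the usage comment is the one discussed in this thread, so it may differ on other releases.

```shell
#!/bin/sh
# Hypothetical helper (not part of jetson_clocks): convert a fan duty cycle
# in percent (0-100) to the 0-255 scale, rounded to the nearest integer.
pct_to_pwm() {
    pct=$1
    case $pct in
        ''|*[!0-9]*)
            echo "percent must be an integer between 0 and 100" >&2
            return 1
            ;;
    esac
    if [ "$pct" -gt 100 ]; then
        echo "percent must be an integer between 0 and 100" >&2
        return 1
    fi
    echo $(( (pct * 255 + 50) / 100 ))
}

# Possible usage on the board (path as used in this thread, needs root):
#   pct_to_pwm 40 | sudo tee /sys/devices/pwm-fan/target_pwm
```

Setting a fixed value this way is still manual control, so it does not bring back temperature-based adjustment.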
It’s true that enabling jetson_clocks sets the fan speed to maximum.
Please refer to the developer guide and check the Fan Mode Control chapter for how to configure the fan modes.
thanks
I don’t think your operation would affect the DVFS table. Would you mind re-flashing your board to see if this issue still occurs?
If so, could you share the tegrastats when the fan automatically starts?
If you look at the actual “jetson_clocks” script, it is human readable bash shell. The “do_fan()” function could be edited, or simply used to create a different script (if you edit don’t forget to save an original too). The fan speed is just an echo into a “/sys” file, where 255 is maxed out (and yes is unrelated to DVFS). The “auto” setting is “0”, the “max” setting is “255”, and anything in between is exactly what it would seem to be. Check this before and after running “jetson_clocks”:
sudo cat /sys/devices/pwm-fan/target_pwm
Auto would probably be fine, but if there is a temperature throttle, then DVFS would still have a momentary reduction until the fan speeds up and cools the system down again. Can you tolerate a momentary throttle? If not, then you want 255 (max), but if you can tolerate the lag between heating up and the fan cooling things back down, then 0 (auto) is good enough.
Thanks for the help. It’s a friendly neighbourhood over here.
I’m really starting to doubt myself.
After reading the documentation and applying the following settings:
nvpmodel -q output:
NV Fan Mode:quiet
NV Power Mode: MAXN
0
/sys/devices/pwm-fan/temp_control is set to 1
/sys/devices/pwm-fan/target_pwm is set to 0
and putting the Xavier under load until the average temperature of the different temperature sensors goes over 50C, now the fan turns on again like it should.
I’m pretty sure I did the same thing before with no effect, but as I said, I’m starting to doubt myself.
So, for the moment my issue is solved. Thanks again for the help.
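The two sysfs writes described above could be wrapped in a small function like the sketch below. It assumes the files live under /sys/devices/pwm-fan, as on this JetPack 4.3 setup. fan_auto and the FAN_DIR override are names made up for illustration. Whether these two writes alone are enough to get temperature-based control back is exactly what this thread is still trying to establish.

```shell
#!/bin/sh
# Assumed location of the fan control files (taken from this thread).
# FAN_DIR can be overridden, e.g. to try the function on a scratch directory.
FAN_DIR="${FAN_DIR:-/sys/devices/pwm-fan}"

# Hypothetical helper: write temp_control=1 and target_pwm=0, then read both back.
fan_auto() {
    for f in temp_control target_pwm; do
        if [ ! -w "$FAN_DIR/$f" ]; then
            echo "cannot write $FAN_DIR/$f (run as root?)" >&2
            return 1
        fi
    done
    echo 1 > "$FAN_DIR/temp_control" || return 1
    echo 0 > "$FAN_DIR/target_pwm" || return 1
    echo "temp_control=$(cat "$FAN_DIR/temp_control") target_pwm=$(cat "$FAN_DIR/target_pwm")"
}

# Possible usage on the board, from a root shell (sudo -s): fan_auto
```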
I can only guess, but being able to control the fan is useful only when some other program is watching the current temperature and deciding what speed the fan should run at. If you were to enable auto fan, but in some way ignore or disable the program which wants to set a fan speed, then it would fail. Perhaps something decided that if the fan was on max, then the temp_control should also run differently. I have not examined the temp_control changes, nor their relation to the jetson_clocks script.
I don’t know what is going on with temperature management on my Xavier.
I left it running overnight with nothing active other than jupyter lab, and no processes really running.
When I wanted to continue work today I saw the Xavier had panicked (CPU1 unresponsive for 21 seconds) and become unresponsive, and it felt very hot from the outside (even though the panic happened sometime during the night).
When I rebooted, the fan came on and the Xavier started to cool down again. It must have failed to turn on the fan, however, otherwise it would not have become so hot, whereas the test I did yesterday showed the fan coming on when I put a heavier load on the Xavier.
It looks like the problem I described in my original post where I noticed CPU1 going to 100% for longer times without showing any processes utilising the CPU (top).
Unless someone has an idea about what could be going wrong, I suppose I will have to reflash and reconfigure from scratch…
and never leave it switched on unattended, because I don’t know what damage these high temperatures cause.
If you first run “nvpmodel -m 0”, but do not run “jetson_clocks” (nor anything for fan adjustment), then what do you see from:
sudo -s
cd /sys/
grep -i '.*' kernel/debug/tegra_fan/* devices/thermal-fan-est/temps
(note that “grep” prints the name of each file it reads, so it is more convenient than “cat” when monitoring several files)
Then run jetson_clocks, followed by the same command. Post the new output under jetson_clocks (wait about 30 seconds after running jetson_clocks before you run the second grep). This should give a bit more detail on what the fan is being told to run.
FYI, if you were to log in via serial console, then you could run this and the final output would be available (visible on the PC with the serial console app) even after the system locks up:
It isn’t a good idea to purposely run a unit till it locks up due to temperature, but if you are debugging and it is going to do this anyway, then having data is a good idea. “grep” plus “watch -n 1” over serial console will make sure the data isn’t lost.
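A rough sketch of that kind of logging follows. It reuses the grep from earlier in this post and the same paths under /sys. dump_fan_state and log_fan_state are made-up helper names, and the optional root argument only exists so the functions can be tried on a scratch directory.

```shell
#!/bin/sh
# Print "file:value" for the fan and thermal files, relative to the given root.
# Paths are the ones from the grep above; reading them needs root on the board.
dump_fan_state() {
    root=${1:-/sys}
    ( cd "$root" && grep -i '.*' kernel/debug/tegra_fan/* devices/thermal-fan-est/temps )
}

# Print n timestamped snapshots, one per second.
log_fan_state() {
    n=$1
    root=${2:-/sys}
    i=0
    while [ "$i" -lt "$n" ]; do
        date '+%H:%M:%S'
        dump_fan_state "$root"
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then
            sleep 1
        fi
    done
}

# Interactive alternative on the serial console, from a root shell:
#   watch -n 1 "grep -i '.*' /sys/kernel/debug/tegra_fan/* /sys/devices/thermal-fan-est/temps"
```

With the loop version the output scrolls instead of being redrawn, so the last values before a lockup stay in the serial console’s scrollback.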
I will now reboot again, start jupyter lab like I did yesterday, and start the watch on a console. I will report back after I have let it run overnight again.
I set temp_control back to 1 and target_pwm to 0.
When I now reboot the Xavier, the fan goes to 40% (target_pwm 102), while all temperatures are around 30C.
Something seems completely off, and starts to look random to me. I have no idea why it now automatically sets the target_pwm to 102 when I have set it to 0 right before the reboot.
[info]                      [Sensor]  [Temp]    [Power/mW]  [Cur]  [Avr]
UpT: 0 days 0:5:46          AO        28.50C    CPU         465    497
FAN [|||||| 40%] Ta= 40%    AUX       28.50C    CV          0      0
Jetson Clocks: inactive     CPU       30.50C    GPU         0      0
NV Power[0]: MAXN           GPU       30.50C    SOC         2483   2482
APE: 150MHz                 PMIC      100.00C   SYS5V       3250   3250
HW engine:                  Tboard    30.00C    VDDRQ       775    775
ENC: NOT RUNNING            Tdiode    31.50C    Total       6973   7004
DEC: NOT RUNNING            thermal   29.70C
I’m glad I’m not the only one.
I also thought the OS would take over again after setting temp_control to 1, but it looks like I’m getting some random behaviour now.
At the moment I’m completely puzzled.
I’m sure however with some help we can get it solved.
I just noticed you are the author of the jtop tool. Nice to meet you, and thank you for the tool. I use it a lot to monitor resource usage.
I have used the fan control before to change the fan speed manually when I noticed in jtop that the temperatures were going up without the fan speeding up automatically. I have not used it since I changed the values in temp_control and target_pwm. I suppose changing the manual speed in jtop does the same as setting the speed in target_pwm.
I also think that I used the ‘a’ option at some point. It looks like it was the same as running jetson_clocks. I’m not sure what the ‘e’ option does, nor what ‘CTRL=Enable’ means.
When you press ‘f’ you can change the fan control mode. When it reads ‘manual’ you set a manual speed, and when your board restarts, the speed you set is applied again.
To do that, the selected mode and the defined speed are written to a file stored at /opt/jetson_stats/fan_config. (There is also a jetson_fan service that reads this file and enables the fan if required.)
If it reads ‘Jc’ after pressing ‘f’, the fan is controlled by jetson_clocks and no other service changes the speed of your fan.
To make sure that this service is not running and that this file does not exist on your board, run:
sudo jtop --restore
The ‘a’ command activates the jetson_performance service, which manages the jetson_clocks script, and the ‘e’ command enables jetson_clocks to run at boot.
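To see whether a stored jtop fan setting is still present, a check along these lines might help. This is only a sketch: the file path is the one mentioned above, check_jtop_fan_config is a made-up helper name, and the format of the file is not assumed, so it is simply printed as-is.

```shell
#!/bin/sh
# Path of the stored fan setting as mentioned above; override to test elsewhere.
JTOP_FAN_CONFIG="${JTOP_FAN_CONFIG:-/opt/jetson_stats/fan_config}"

# Hypothetical helper: report whether the stored fan setting exists.
# Returns 0 and prints the file if it exists, returns 1 otherwise.
check_jtop_fan_config() {
    cfg=${1:-$JTOP_FAN_CONFIG}
    if [ -f "$cfg" ]; then
        echo "found $cfg, a stored fan setting may be reapplied at boot:"
        cat "$cfg"
        return 0
    fi
    echo "no $cfg, no stored fan setting"
    return 1
}

# The service mentioned above could be inspected separately, for example with
#   systemctl status jetson_fan
# (unit name taken from this post; adjust if it is named differently).
```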
Some time ago I tried to enable the temperature control using:
After running jtop --restore the fan turns off and stays off after a reboot.
When putting load on the Xavier after the reboot it turns on when avg. temperatures go over 50C, and the fan turns off again when it cools down.
Thanks already for the jtop --restore tip.
It looks like I’m back to normal behaviour, although I have to do some more testing to see whether I run into any of the other issues I noticed before.
Is it possible that jetson_stats and the OS fan control are fighting each other under certain conditions? Would you rather continue further investigation on your repo?
Sorry for the late reply. Could you share a brief conclusion about what you’ve done here?
I could try to reproduce it with my Xavier. AFAIK, it looks like we need to set nvpmodel to mode 0 and toggle target_pwm and temp_control. When hitting this problem, we should see the fan never working, right?
I don’t have a definitive conclusion yet. After the golden tip from rbonghi to do a sudo jtop --restore, it does indeed look to me like setting nvpmodel to 0, target_pwm to 0 and temp_control to 1 switches back to automatic fan control, based on the different temperature sensors.
I can be completely wrong here, but I’m under the impression, and I hope rbonghi can shed some more light on this, that when you use the fan control options in the jetson_stats jtop tool, it implements also some kind of fan control which maybe can interfere with the OS fan control in some way.
Maybe the thing I did wrong was to use jtop fan control, and also manually set temp_control to 1 which might trigger the OS fan control to become active also, while jtop fan control is still active, leading to some random behaviour because of the 2 fan control systems being active at the same time. I’m purely guessing here. I would have to do some more testing.
As I said before, maybe rbonghi has more insights on what exactly happens when you start using the fan control options in jtop.
IMO, if you set temp_control to 1, the fan should automatically speed up when the device gets hot. Please check whether the fan fails to work under this setting.