Software thermal management

Hey,

In Jetson_TX2_Thermal_Design_Guide_v1.0.pdf, Section 4.3 (TEGRA X2 RECOMMENDED OPERATING TEMPERATURE LIMIT), a software thermal management is mentioned. My question is where this software is implemented. In the Linux kernel? In the module itself? Somewhere else? Or is it something the user should implement?

Thanks for clarifying!

Bye

Hi robert4f5p3, if you download and extract the L4T Documentation, open it in your web browser, and navigate to the “Clock Frequency and Power Management” section, you will find the details in the “Thermal Management” sub-section.

Additionally, you can find a code example of controlling the clocks in the ~/jetson_clocks.sh script. There are various sysfs entries that you can cat/echo to in order to modify the governor behavior at runtime.
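As a minimal sketch of the kind of sysfs access mentioned above, the following lists the active cpufreq governor for each CPU. It assumes the standard Linux cpufreq layout (cpuN/cpufreq/scaling_governor under /sys/devices/system/cpu) and is not taken from jetson_clocks.sh. The function name show_governors and its optional root-directory argument are hypothetical, added only for illustration.

```shell
# Hypothetical helper (not part of L4T): print the active cpufreq governor
# for every CPU. The optional first argument overrides the sysfs root and
# is an assumption added so the function can be tried on a test tree.
show_governors() {
    root="${1:-/sys/devices/system/cpu}"
    for gov_file in "${root}"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob when no CPU exposes cpufreq.
        [ -f "${gov_file}" ] || continue
        # Reduce ".../cpu0/cpufreq/scaling_governor" to "cpu0".
        cpu=${gov_file#"${root}"/}
        cpu=${cpu%%/*}
        echo "${cpu}: $(cat "${gov_file}")"
    done
}
```

Calling show_governors with no arguments on the target reads the real sysfs tree. Changing a governor works the same way in the other direction (echo a governor name into the same file as root), but which governors are available depends on the kernel configuration, so check scaling_available_governors first.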

@dusty-nv’s answer gives full insight into how to deal with this, but if your use case is thermally ‘normal’, the short version is that you won’t have to implement this yourself: L4T comes with a scheme that keeps your TX2 from overheating in most cases.

Hey,

thanks for the links! They contain everything I need to know. One issue:
From Jetson_TX2_Thermal_Design_Guide_v1.0.pdf:

Tegra X2 is rated to operate at a junction temperature not-to-exceed 105 °C. Tegra X2 has hardware shutdown mechanisms that enforce this limit by automatically halting the system when this temperature is exceeded.

In the L4T docs, the “Hardware Thermal Shutdown” section has two similar tables (I guess one is for the TX2i), both listing 101.5°C as the temperature at which hardware shutdown is performed. Which value is correct, 105°C or 101.5°C?

Bye

This script prints the SW thermal shutdown limits (the critical trip points) for every thermal zone.

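# Copy/paste and run this script from the DUT prompt
# to view the SW shutdown limits.

cd /sys/class/thermal || exit 1
for zone in thermal_zone*; do
    cat "${zone}/type"
    if [ ! -f "${zone}/trip_point_0_type" ]; then
        echo "No critical trips found"
        echo "------"
        continue
    fi
    for type_file in "${zone}"/trip_point_*_type; do
        if [ "$(cat "${type_file}")" = "critical" ]; then
            # Take the trip index from the file name so it stays correct
            # even when the glob sorts trip_point_10 before trip_point_2.
            trip=${type_file##*/trip_point_}
            trip=${trip%_type}
            echo "trip# ${trip}"
            # Trip temperature in millidegrees Celsius
            cat "${zone}/trip_point_${trip}_temp"
            echo "------"
        fi
    done
done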
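
The thermal sysfs files report temperatures in millidegrees Celsius, so a trip value of 101000 means 101 °C. As a rough sketch, the helpers below convert such a value and print the current temperature of each zone for comparison with the trip points. The names millic_to_c and show_zone_temps and the optional root-directory argument are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: convert millidegrees Celsius to degrees Celsius,
# truncated to one decimal place, using integer arithmetic only.
millic_to_c() {
    m=$1
    sign=""
    case "${m}" in
        -*) sign="-"; m=${m#-} ;;
    esac
    echo "${sign}$(( m / 1000 )).$(( (m % 1000) / 100 ))"
}

# Hypothetical helper: print "<zone type>: <temperature> C" for each zone.
# The optional first argument overrides the sysfs root (an assumption
# added so the function can be tried on a test tree).
show_zone_temps() {
    root="${1:-/sys/class/thermal}"
    for zone in "${root}"/thermal_zone*; do
        [ -r "${zone}/temp" ] || continue
        # Some zones return a read error; skip those.
        temp=$(cat "${zone}/temp" 2>/dev/null) || continue
        echo "$(cat "${zone}/type"): $(millic_to_c "${temp}") C"
    done
}
```

For example, millic_to_c 101500 prints 101.5. Running show_zone_temps with no arguments on the target lists the live zone temperatures.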

Thanks! But I was talking about the hardware thermal shutdown point (101.5 or 105°C).

robert4f5p3,

The L4T developer guide already points out that 101.5 °C is a SW limitation.

"The final fail-safe the firmware provides is a hardware thermal reset or thermtrip. If the software and hardware throttling are unable to control the heat generation in the system, and the software becomes unresponsive, the Tegra system asserts the reset pin on the PMIC as the hardware shutdown mechanism."

I read this as saying that 101.5°C is the final hardware limit.

To me, the “Hardware Thermal Shutdown” section in the L4T docs seems to be wrong. The table is listed twice and contains almost the same value as the “Software Thermal Shutdown” section (0.5°C difference).

Thanks!

robert4f5p3,

Sorry, I didn’t notice that there are two sections there. I wonder why you need the exact temperature of the HW shutdown. Could you share some details about your purpose? The SW shutdown is set to 101°C, as the device tree indicates.

I’m just trying to understand the power management, and it made me wonder which value is right. It isn’t important.

Thanks!