We are running cold boot tests (i.e., the HW is left powered off in a thermal chamber set to -20degC, and we then verify that it boots once it has reached thermal equilibrium), and we are seeing the following error message:
[0000.701] I> NSDRAM base: 0x80000000, end: 0x82ee70000
[0000.706] I> Task: Thermal check (0x50021d55)
[0000.711] I> max_chip_limit = 105
[0000.714] I> min_chip_limit = -28
[0000.717] I> max temp read = -19
[0000.720] I> min temp read = -31
[0000.724] E> SOC_THERM: Failed to verify temp range.
[0000.728] C> Task 0x0 failed (err: 0x5f5f030b)
[0000.733] E> Top caller module: SOC_THERM, error module: SOC_THERM, reason: 0x0b, aux_info: 0x03
[0000.741] C> Boot Info Table status dump :
We found another post about the same issue, and the solution recommended there was to upgrade to rel-35.3.1 (refer to Jetson Orin 200T module’s minimum operating temperature is only -20℃ - #51 by WayneWWW). We upgraded from 35.2.1 to 35.3.1, but we still see the same error message.
I have a few questions related to this:
First, I see that “min_chip_limit” is set to -28degC. Is this the threshold for “min temp read” or for “max temp read”? From the log, it appears to apply to “min temp read” (-31, which is below the limit), while “max temp read” (-19) is close to the ambient temperature (-20degC).
Second, why is there such a large delta (12degC in the log above) between “min temp read” and “max temp read”?
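To make the first question concrete, here is a minimal sketch of the check I am assuming the bootloader performs. This is only my guess based on the log output: the function name and the comparison logic are hypothetical and are not taken from the actual bootloader source. Only the four numbers come from the log.

```python
# Hypothetical reconstruction of the thermal range check.
# Names mirror the log fields; this is NOT the real bootloader code.
def temp_range_ok(min_read, max_read, min_limit, max_limit):
    """Return True if both readings fall inside [min_limit, max_limit]."""
    return min_limit <= min_read and max_read <= max_limit


# Values taken from the log above.
result = temp_range_ok(min_read=-31, max_read=-19, min_limit=-28, max_limit=105)
print(result)  # False, because -31 is below the -28 limit
```

If this guess is right, the boot fails only because of the coldest sensor reading, even though the warmest reading is well within the limits.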
My understanding is that the operating temperature specification for Jetson AGX Orin (non-industrial version) is -25degC to 80degC (please correct me if I’m wrong), so I expected the unit to pass the cold boot test.