TX1 crash from SD card booting

Ray0420 · December 14, 2017, 4:51pm

Hi,

We run our TX1 from SD card currently.
We are now testing the system in a low-temperature (-15 °C) environment, and the TX1 consistently shuts down suddenly.
When this happens, the TX1 does not restart, and the messages on the debug UART stop immediately; the syslog in /var/log/ stops at the same moment.
Below are the experiments we have done so far.
1. Pulling out the SD card while the TX1 was running → the TX1 kept printing messages on the debug UART, so that is a different case.
2. Monitoring the TX1 power → it looks normal when the issue happens.

Our TX1 is running L4T 24.2.1.
Is there any signal we should monitor, or anything else we should pay attention to?

Thanks for any advice

JimWang · December 15, 2017, 2:46am

Hi,
May we know what kind of SD card you are using?
What are that SD card's low-temperature characteristics?
And what kind of test are you running at -15 °C? Do you just boot to the kernel and put the system in a -15 °C chamber?

Ray0420 · December 15, 2017, 6:43am

Hi JimWang,

1. Here is the link for this 32 GB SD card:
http://www.adata.com/en/feature/203
2. The SD card's operating temperature range is -25 °C to 85 °C.
3. Basically, we just run the Wi-Fi function so that we can monitor the TX1 status wirelessly.

Thanks
Wilsons

JimWang · December 15, 2017, 6:56am

Hi,
Is your carrier board exactly the same as the Jetson one in the SD card section?
Did you add any external pull-ups on SD_data/clk/cmd on the carrier board?
If your issue is related to the SD card's low-temperature behavior, I suggest running some R/W tests at -15 °C to see whether it is related to booting or not.
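A R/W test of this kind could be sketched as below. This is only a minimal illustration, not an NVIDIA tool: the function name rw_verify, the file name rw_test.bin, and the default sizes are all hypothetical. It writes random data, forces it out with fsync, makes a best-effort request to drop the cached pages, then reads the file back and compares SHA-256 digests.

```python
import hashlib
import os
import tempfile
import time


def rw_verify(directory, size_bytes=1 << 20, rounds=3):
    """Write random data into `directory`, read it back, and compare digests.

    Returns a list of (round_index, ok, seconds) tuples.
    The name and defaults are illustrative assumptions.
    """
    results = []
    path = os.path.join(directory, "rw_test.bin")  # hypothetical file name
    for i in range(rounds):
        payload = os.urandom(size_bytes)
        expected = hashlib.sha256(payload).hexdigest()
        start = time.monotonic()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the storage device
            if hasattr(os, "posix_fadvise"):
                # Best effort: advise the kernel to drop cached pages so the
                # read below is more likely to come from the card itself.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        results.append((i, actual == expected, time.monotonic() - start))
    os.remove(path)
    return results


if __name__ == "__main__":
    # Replace the temp directory with a directory on the SD card's filesystem.
    for idx, ok, secs in rw_verify(tempfile.gettempdir()):
        print(f"round {idx}: {'OK' if ok else 'MISMATCH'} in {secs:.3f}s")
```

Running it in a loop inside the chamber, with the output captured over the debug UART, would show whether storage errors appear before the shutdown or not at all.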

Ray0420 · December 18, 2017, 3:43am

Hi JimWang,

Is your carrier board exactly the same as the Jetson one in the SD card section?
Ans: Yes, it exactly matches the Jetson TX1 EVM reference design (Doc: P2597-B04).

Did you add any external pull-ups on SD_data/clk/cmd on the carrier board?
Ans: No, we did not add any pull-ups. It is the same as the EVM design, P2597-B04.

Actually, we also tried another experiment: pulling out the SD card while the TX1 system was running. The debug UART kept printing messages (I copied the repeated section below). In our failure case, however, the system shuts down immediately and the debug messages stop at once, so we think odd SD card behavior may not be the direct cause of this issue.

Thanks

JimWang · December 18, 2017, 4:30am

So your issue is actually some kind of sudden shutdown?
Do you use an AC adapter or a battery? Is it possible that a power drop during your test causes this failure? Otherwise the system should keep printing logs unless there is a thermal or over-current event. Monitor your VDD_IN to see if you can find anything related.

Ray0420 · December 19, 2017, 7:44am

Hi JimWang,

We are still running the tests you advised.
By the way, we monitored the TX1 temperature readings at the same time and recorded them as below.
In this case, we had already kept the TX1 at -15 °C for over 2 hours, so it should have reached thermal equilibrium with the environment.
AO_therm = 4000
CPU_therm = 0
GPU_therm = -2000
PLL_therm = -3000
PMIC_Die = 100000

Here are some questions:
1. What does AO mean, and is its value correct?
2. The operating temperature range is -25 to 85 (JetsonTX1_Module_DataSheet_DS07224010v1.1).
The PLL temperature seems to be outside this range; is that OK?
3. Why is the PMIC temperature so high?
The system was still working normally when these values were recorded; here we just want to confirm what each of these areas means.
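For reference, here is a minimal sketch of how these readings could be interpreted. It assumes the values are in millidegrees Celsius, which is the convention of the Linux thermal sysfs interface, and that the -25 to 85 range from the datasheet is in °C. The function names are hypothetical, and the datasheet range may apply to a different measurement point than these on-die sensors.

```python
# Readings copied from the post above; units assumed to be millidegrees Celsius.
READINGS_MILLI_C = {
    "AO_therm": 4000,
    "CPU_therm": 0,
    "GPU_therm": -2000,
    "PLL_therm": -3000,
    "PMIC_Die": 100000,
}

# Operating range quoted from the module datasheet, assumed to be in Celsius.
RANGE_C = (-25.0, 85.0)


def to_celsius(milli_c):
    """Convert a millidegree-Celsius reading to degrees Celsius."""
    return milli_c / 1000.0


def classify(readings, low, high):
    """Map each zone name to (temperature in Celsius, whether it is within [low, high])."""
    result = {}
    for name, milli_c in readings.items():
        celsius = to_celsius(milli_c)
        result[name] = (celsius, low <= celsius <= high)
    return result


if __name__ == "__main__":
    for name, (celsius, in_range) in classify(READINGS_MILLI_C, *RANGE_C).items():
        status = "within range" if in_range else "outside range"
        print(f"{name}: {celsius:.1f} C ({status})")
```

Under that assumption, PLL_therm = -3000 corresponds to -3 °C, which is inside the quoted range, and only PMIC_Die (100 °C) falls outside it.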

Thanks

JimWang · December 19, 2017, 8:18am

For thermal questions, we have a thermal design guide: http://developer.nvidia.com/embedded/dlc/jetson-tx1-thermal-design-guide
The BSP documentation at https://developer.nvidia.com/embedded/dlc/l4t-documentation-28-1 also describes the thermal zones.
Can you check whether your thermal zone values are correct in normal operation?
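One way to collect the thermal zone values is to read them from the standard Linux thermal sysfs tree. The sketch below assumes the zones are exposed under /sys/class/thermal/thermal_zone*/ with "type" and "temp" files and that "temp" is in millidegrees Celsius; the function names and the log format are hypothetical. Each record is fsync'ed so that the last line is more likely to survive a sudden shutdown.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone type: temperature in Celsius} for every readable zone."""
    zones = {}
    for zone_dir in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone_dir, "temp")) as f:
                # Assumed unit: millidegrees Celsius.
                zones[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return zones


def log_zones(log_path, zones):
    """Append one timestamped record and flush it to storage."""
    fields = " ".join(f"{name}={temp:.1f}" for name, temp in sorted(zones.items()))
    with open(log_path, "a") as f:
        f.write(f"{time.time():.0f} {fields}\n")
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    # Print one sample per second; call log_zones() instead to write to a file.
    while True:
        print(read_thermal_zones())
        time.sleep(1)
```

Comparing the records taken at room temperature with those taken in the chamber should show which zones track the environment and which stay fixed.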

Ray0420 · December 19, 2017, 8:23am

Hi JimWang,

The values recorded under normal conditions are as below.

AO_therm = 38500
CPU_therm = 32500
GPU_therm = 29500
PLL_therm = 29500
PMIC_Die = 100000

Thanks