Jetson Xavier NX boot issue

Hi,

We are using SDK 4.5.1. With the DDR patch we are able to program and use 16GB NX modules,
and everything seems to function properly after programming.
We tested one unit in an industrial stove in order to test the system under various temperature conditions.
The unit worked great for a while; it was exposed to environmental temperatures of 55 deg Celsius.
During the test we run the system quite hard: the GPU is running AI and almost 6 CPUs are at 75% load.
We have pretty good cooling, so during the test we keep monitoring /sys/class/thermal/thermal_zone*/temp
and the most we reach is 90 deg Celsius.
The system ran a decent amount of time in the stove at several temperatures.
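For reference, the monitoring described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard Linux thermal sysfs layout, where each `thermal_zone*/temp` file reports millidegrees Celsius and the sibling `type` file names the zone. The function name `read_thermal_zones` and its `base` parameter are illustrative only and not part of any NVIDIA tool.

```python
import glob
import os


def read_thermal_zones(base="/sys/class/thermal"):
    """Return a list of (zone_dir, zone_type, temp_in_deg_c) tuples."""
    readings = []
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
            with open(os.path.join(zone, "type")) as f:
                zone_type = f.read().strip()
        except (OSError, ValueError):
            # A zone may be unreadable or report a non-numeric value; skip it.
            continue
        # ASSUMPTION: sysfs reports millidegrees Celsius (standard Linux behavior).
        readings.append((os.path.basename(zone), zone_type, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for zone, zone_type, temp_c in read_thermal_zones():
        print(f"{zone} ({zone_type}): {temp_c:.1f} C")
```

Logging this output with a timestamp during the stress test would make it easier to correlate the hang with a specific zone temperature.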
On one occasion the system halted as if it was stuck, and after a reboot it keeps failing to boot. We end up at initrd,
it tries to OTA and fails, and then it reboots again until it stops at initrd. When we mount either rootfs A or B we don't see any issues with the files.
If we mount one of the rootfs A/B partitions in initrd and try to run nvbootctrl we get the following error:
./usr/sbin/nvbootctrl -t rootfs get-number-slots
null input file!
Init SMD partition failed!
null input file!
-5

As if the SMD partition got corrupted somehow

I have attached the boot log of the system.
Boot_Log.txt (171.0 KB)

Thanks
Amir

  1. Is this specific to the NX 16GB, or can the NX 8GB reproduce this issue as well?

  2. It looks like the system is no longer recoverable. Please reflash it with flash.sh/sdkm.

Hi,

We only tested this on one 16GB module; we will try to test with the 8GB version.
But what does this mean? Did the SMD partition get corrupted?

What we really want to know at this moment is how to reproduce this issue.

OK, I will run the same tests on an 8GB module and check whether the issue reproduces.

Hi,

I have run the heat/stress test using an 8GB NX module and the problem didn't happen again.
But this still feels a bit problematic to me if there is even a small chance that the system might lose its boot config and become bricked, so that only a USB flash can recover it.

If there is any chance that this is related to the rootfs A/B implementation I would like to know. I only use it as a backup, so I don't really need this functionality if it has issues.

I did some testing and from my checks it takes about a minute to erase the SPI when I flash the system:

[ 15.1684 ] tegradevflash_v2 --pt flash.xml.bin --create
[ 15.1688 ] Bootloader version 01.00.0000
[ 15.5805 ] Erasing spi: 0 … [Done]
[ 73.5868 ] Writing partition secondary_gpt with gpt_secondary_3_0.bin
[ 73.5871 ] […] 100%

And since this flash uses a 64KB erase block size, it should take about 110 - 120 ms to erase one block.
Our system is powered down a lot (not in an orderly fashion but abruptly), and if there isn't any redundancy mechanism in the rootfs A/B implementation there is a small chance that the SMD will go bad if the power-down happens at the same time that the bootloader decides to write to the SMD partition.
Is this the case? Do you have protection for the SMD when the power is turned off abruptly?
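As a rough cross-check of the numbers above, the sketch below takes the two timestamps around the "Erasing spi" step in the flash log and divides the elapsed time by the number of 64KB blocks. The 32 MiB total flash size is an assumption (it is not stated in the log), and the function names are illustrative only.

```python
def erase_seconds(start_ts, end_ts):
    """Elapsed time between two tegradevflash log timestamps, in seconds."""
    return end_ts - start_ts


def per_block_ms(total_seconds, flash_bytes, block_bytes):
    """Average erase time per block, in milliseconds."""
    blocks = flash_bytes // block_bytes
    return total_seconds * 1000.0 / blocks


if __name__ == "__main__":
    # Timestamps copied from the flash log in this post.
    total = erase_seconds(15.5805, 73.5868)
    # ASSUMPTION: 32 MiB SPI flash; 64 KiB erase blocks as stated in the post.
    ms = per_block_ms(total, 32 * 1024 * 1024, 64 * 1024)
    print(f"erase took {total:.1f} s, about {ms:.0f} ms per block")
```

Under that assumed flash size the result is roughly 113 ms per block, which is in line with the 110 - 120 ms estimate. That is the approximate window in which an abrupt power loss could interrupt a single block erase.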

Thanks
Amir

Hello amir.s,

Rootfs A/B will malfunction.
Since the SMD partition holds information about the status of a slot, rootfs redundancy requires the SMD partitions to function.
