Jetson TX2 fails to boot

Hi,

A few modules being used on custom boards are now failing to boot. Not even the welcome message from MB2 is seen on the console. The only messages seen are:

[0000.040] C> Failed to update boot chain
[0000.043] C> ERROR: Highest Layer Module = 0x54, Lowest Layer Module = 0x54,
Aux Info = 0x1, Reason = 0x2

We are using R28.2.1.

We suspect that this may be caused by power cycling early in the boot sequence.

Thanks,
Cliff

Please check whether the problematic modules are revision D02.

The modules affected are D00 modules.

Can you flash this board with rel-28.4 or rel-32.5.1 first? This is just a test.

The problem is not that the modules cannot be re-flashed. The problem is to do with reliability of units using these modules in the field!

Sorry, I forgot the detail from your first comment.

Have you been able to find the cause of this issue? Do you have any statistics to share, such as how many D00 TX2 modules show this issue and how frequently it happens?

Honestly, I don’t have many suggestions for it right now.
There are some mechanisms that enhance reliability. However, they are all in the rel-32.x releases. Rel-28.2.1 is a fairly old release.

So far, 2 D00 modules have shown this problem. One module experienced a problem during power supply tests - these tests are perhaps a little extreme and this failure mode is perhaps unlikely to be experienced in the field. However, the other unit was on long term testing with a regular power supply. The power supply became damaged, and is possibly the source of the problem. But in any case, we believe there is a higher possibility that this failure mode could be replicated in the field and we are interested in minimising/eliminating this possibility.

Can you also paste the full log?

I don’t have a log. The unit won’t boot so recovery of /var/log messages is not possible.

Then how did you get the log in comment #1? I need the full log from the UART. Is that the full log?

Yes, that console output is the full log with the unit unable to boot. Those messages do repeat periodically though.
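
For anyone saving the UART console output to a file, the following is a minimal sketch for counting how often the failure repeats and pulling out the error fields. The function name `summarize_boot_errors` is hypothetical, and the pattern is built only from the two lines quoted in the first post; other bootloader errors may use a different layout.

```python
import re

# Pattern built only from the console lines quoted in the first post.
# Assumption: the four fields always appear in this order, possibly
# wrapped across lines by the terminal.
ERROR_RE = re.compile(
    r"Highest Layer Module = (0x[0-9a-fA-F]+),\s*"
    r"Lowest Layer Module = (0x[0-9a-fA-F]+),\s*"
    r"Aux Info = (0x[0-9a-fA-F]+),\s*"
    r"Reason = (0x[0-9a-fA-F]+)"
)


def summarize_boot_errors(log_text):
    """Return (boot-chain failure count, list of decoded error field dicts)."""
    chain_failures = log_text.count("Failed to update boot chain")
    errors = []
    for m in ERROR_RE.finditer(log_text):
        errors.append({
            "highest_module": int(m.group(1), 16),
            "lowest_module": int(m.group(2), 16),
            "aux_info": int(m.group(3), 16),
            "reason": int(m.group(4), 16),
        })
    return chain_failures, errors


# Sample text copied from the first post, repeated to mimic the periodic output.
sample = (
    "[0000.040] C> Failed to update boot chain\n"
    "[0000.043] C> ERROR: Highest Layer Module = 0x54, Lowest Layer Module = 0x54,\n"
    "Aux Info = 0x1, Reason = 0x2\n"
)
count, errors = summarize_boot_errors(sample * 3)
print(count, len(errors), errors[0])
```

This only extracts the raw values; it does not interpret what the module, aux info, or reason codes mean.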

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Is it possible to run the same power cycle test on a devkit to see whether you can reproduce this issue?
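
A power cycle test of this kind could be scripted roughly as follows. This is only a sketch under stated assumptions: `set_power` and `boot_ok` are hypothetical callables that the test rig would have to provide (for example a relay driver and a check on the UART output), and the 2 second cut window and 1 second off time are placeholder values, not figures from this thread.

```python
import random
import time


def run_power_cycle_test(set_power, boot_ok, cycles,
                         max_cut_delay_s=2.0, off_time_s=1.0,
                         sleep=time.sleep, seed=0):
    """Cut power at a random point early in boot, then check a full boot.

    set_power(bool) and boot_ok() -> bool are hypothetical callables
    supplied by the test rig. Returns the cycle numbers whose follow-up
    boot failed.
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    failed = []
    for cycle in range(1, cycles + 1):
        set_power(True)
        sleep(rng.uniform(0.0, max_cut_delay_s))  # interrupt early boot
        set_power(False)
        sleep(off_time_s)                         # assumed discharge time
        set_power(True)
        if not boot_ok():                         # did the next boot succeed?
            failed.append(cycle)
        set_power(False)
        sleep(off_time_s)
    return failed


# Dry run with fake hardware: the third follow-up boot is made to fail.
power_calls = []
fake_results = iter([True, True, False, True])
failed_cycles = run_power_cycle_test(
    power_calls.append, lambda: next(fake_results),
    cycles=4, sleep=lambda seconds: None,
)
print(failed_cycles)
```

Passing the hardware access in as callables keeps the loop itself independent of any particular relay or serial setup, so the same script could be pointed at a devkit or a custom board.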