NX boot failed

Hi, Expert

NX is in production in our product. Occasionally, with low probability, an NX fails to boot. The log is attached; please help check what factors can lead to this issue.

Thanks
Peter
nx_error.log (106.4 KB)

hello zhuyonghui,

it seems you’re stuck at the bootloader stage, failing to load the SMD,
for example,

[0000.085] E> LOADER: Failed to verify SMD.
[0000.089] I> Primary SMD copy is invalid, try with secondary copy..
[0000.095] E> LOADER: Failed to verify SMD.
[0000.099] E> LOADER: Failed to verify SMD & SMD_b.
[0000.103] E> Error: SMD: Load failed
[0000.107] E> Load SMD failed
...

may I know what modifications you have made, and which JetPack release you’re working with?
thanks
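
When triaging UART logs from many units, the signature quoted above can be checked for mechanically. The following is only a minimal sketch: the marker strings are copied from the log excerpt in this post, while the function name and the returned labels are hypothetical and not part of any NVIDIA tool.

```python
# Marker strings copied from the MB1 log excerpt quoted above.
PRIMARY_INVALID = "Primary SMD copy is invalid"
BOTH_INVALID = "Failed to verify SMD & SMD_b"


def classify_smd_failure(log_text: str) -> str:
    """Return a coarse, hypothetical label for an MB1 UART log."""
    if BOTH_INVALID in log_text:
        # Neither SMD copy passed verification; the boot cannot continue.
        return "both-smd-copies-invalid"
    if PRIMARY_INVALID in log_text:
        # Only the primary copy was reported invalid in this log.
        return "primary-smd-invalid"
    return "no-smd-verify-error"


sample = """\
[0000.085] E> LOADER: Failed to verify SMD.
[0000.089] I> Primary SMD copy is invalid, try with secondary copy..
[0000.095] E> LOADER: Failed to verify SMD.
[0000.099] E> LOADER: Failed to verify SMD & SMD_b.
"""

print(classify_smd_failure(sample))
```

Running this over a directory of captured boot logs gives a quick count of how many field failures share this exact signature.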

Jerry,

We are using JetPack 4.4.1. We haven’t made any changes to the bootloader code. Have you run into this issue before? What factors may lead to it?

Thanks
Peter

hello zhuyonghui,

no, I haven’t seen this failure before.
what are the exact steps to reproduce this issue? may I also know the failure rate?
thanks

Hi Jerry,

I saw several identical cases on the forum.

But I didn’t see a clear root cause or solution for them.

We shipped NX-based systems to customers. During factory test everything was OK, but after a system was shipped, the customer powered it on and found that it could not boot.

Thanks
Peter

hello zhuyonghui,

it looks like re-flashing the board resolves the issue.
please refer to Multiple jetson nx (core boards on the development kit) cannot be started - #13 by cbstryker.

there’s a known issue when opening the boot device to write the SMD partitions. there are patches that improve the SMD update sequence, which could fix the issue.
however, these changes are included in r32.5.1; are you able to upgrade to the latest release to confirm?
thanks

Hi Jerry,

Reflashing does resolve the issue. We are worried that the same issue may happen again.

In our application, we shouldn’t be writing the SMD partitions after flashing. Do you mean this patch improves SMD partition reliability during first initialization? Could you forward us the known-issue document?

Thanks
Peter

hello zhuyonghui,

sorry, there’s no documentation that covers this.
this issue might happen if you power off the system within 1 minute of powering it on.

here’s the reason…
you may also look into bootloader messages,
for example,

QSPI: erasing sectors from 176 – 176   <== SMD_a
QSPI: erasing sectors from 177 – 177   <== SMD_b

in the r32.4 CBoot flow, we always update the primary SMD first (i.e. SMD_a), so there is a risk if power is lost while the SMDs are being updated.

please upgrade to the latest release (i.e. JetPack 4.5.1).
with this change, it updates the unused SMD partition first (i.e. SMD_b), which makes the update fail-safe.
thanks
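
To make the fail-safe idea concrete, here is a toy model of a redundant two-copy update under power loss. It is not CBoot's code, and the actual r32.4 and r32.5.1 step sequences are not shown in this thread; both orderings below are hypothetical, chosen only to illustrate the general invariant that at least one valid copy must exist at every instant. The model assumes a copy is invalid from the moment it is erased until it has been fully programmed.

```python
def survives_any_power_cut(steps):
    """Check whether at least one copy stays valid for every possible cut point.

    steps: ordered list of (copy_name, action), action in {"erase", "program"}.
    A power cut is modelled as executing only a prefix of the steps.
    """
    for cut in range(len(steps) + 1):
        valid = {"SMD_a": True, "SMD_b": True}  # both copies start out valid
        for copy_name, action in steps[:cut]:
            # An erased copy is invalid until its program step has completed.
            valid[copy_name] = (action == "program")
        if not any(valid.values()):
            return False  # this cut point leaves no valid copy to boot from
    return True


# Hypothetical ordering with a window in which both copies are erased.
erase_both_first = [
    ("SMD_a", "erase"), ("SMD_b", "erase"),
    ("SMD_a", "program"), ("SMD_b", "program"),
]

# Hypothetical ordering that finishes one copy before touching the other.
one_copy_at_a_time = [
    ("SMD_b", "erase"), ("SMD_b", "program"),
    ("SMD_a", "erase"), ("SMD_a", "program"),
]

print(survives_any_power_cut(erase_both_first))    # False
print(survives_any_power_cut(one_copy_at_a_time))  # True
```

In this simplified model, a unit ends up with both copies invalid only if the second copy is erased before the first has been rewritten, which is the kind of window a fail-safe sequence has to close.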

Hi Jerry,

Thank you very much. It doesn’t seem reasonable to erase QSPI during boot-up; it is prone to fail if an abnormal power-off occurs.

We will update to 4.5.1 for testing.

Thanks
Peter

Any update on the root cause of this issue? On our custom carrier board we are experiencing the same failures after a seemingly random number of power cycles. We have seen this on JetPack 4.5.1 with both the production Xavier NX (without the SD card) and the dev kit version. We can recover the modules by flashing with SDK Manager; however, at this point we are more interested in finding the root cause. Below are logs pulled from both Xavier NX models on our custom carrier board.

No sd-card Xavier failure:

[0000.025] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.034] I> MB1 (prd-version: 1.5.1.6-t194-41334769-1740dd39)
[0000.039] I> Boot-mode: Coldboot
[0000.042] I> Chip revision : A02P
[0000.045] I> Bootrom patch version : 15 (correctly patched)
[0000.050] I> ATE fuse revision : 0x200
[0000.054] I> Ram repair fuse : 0x0
[0000.057] I> Ram Code : 0x0
[0000.059] I> rst_source : 0x0
[0000.062] I> rst_level : 0x0
[0000.066] I> Boot-device: QSPI
[0000.068] I> Qspi flash params source = brbct
[0000.072] I> Qspi using bpmp-dma
[0000.075] I> Qspi clock source : pllp
[0000.079] I> QSPI Flash Size = 32 MB
[0000.082] I> Qspi initialized successfully
[0000.086] E> No bootable slot found
[0000.089] E> LOADER: Failed to get slot for boot chain from SMD.
[0000.095] E> LOADER: Failed to get storage info for binary 0 from loader.
[0000.101] E> LOADER: Failed to get info for binary 0 from loader.
[0000.107] C> LOADER: Could not read binary 0.
[0000.111] C> Fail to load mb1-bct bin
[0000.115] E> Task 24 failed (err: 0x1d540102)
[0000.119] E> Top caller module: LOADER, error module: AB_BOOTCTRL, reason: 0x02, aux_info: 0x01
[0000.127] I> MB1(1.5.1.6-t194-41334769-1740dd39) BIT boot status dump :
0000000000011111111110111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[0000.157] I> Reset to recovery mode

Devkit model Xavier Failure:

[0000.024] W> RATCHET: MB1 binary ratchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.033] I> MB1 (prd-version: 1.5.1.6-t194-41334769-1740dd39)
[0000.038] I> Boot-mode: Coldboot
[0000.041] I> Chip revision : A02P
[0000.044] I> Bootrom patch version : 15 (correctly patched)
[0000.049] I> ATE fuse revision : 0x200
[0000.053] I> Ram repair fuse : 0x0
[0000.056] I> Ram Code : 0x0
[0000.058] I> rst_source : 0x0
[0000.061] I> rst_level : 0x0
[0000.065] I> Boot-device: QSPI
[0000.067] I> Qspi flash params source = brbct
[0000.071] I> Qspi using bpmp-dma
[0000.074] I> Qspi clock source : pllp
[0000.078] I> QSPI Flash Size = 32 MB
[0000.081] I> Qspi initialized successfully
[0000.085] E> LOADER: Failed to verify SMD.
[0000.089] I> Primary SMD copy is invalid, try with secondary copy..
[0000.095] E> LOADER: Failed to verify SMD.
[0000.099] E> LOADER: Failed to verify SMD & SMD_b.
[0000.103] E> Error: SMD: Load failed
[0000.107] E> Load SMD failed
[0000.109] E> LOADER: Failed to get slot for boot chain from SMD.
[0000.115] E> LOADER: Failed to get storage info for binary 0 from loader.
[0000.121] E> LOADER: Failed to get info for binary 0 from loader.
[0000.127] C> LOADER: Could not read binary 0.
[0000.131] C> Fail to load mb1-bct bin
[0000.134] E> Task 24 failed (err: 0x1d1d1318)
[0000.138] E> Top caller module: LOADER, error module: LOADER, reason: 0x18, aux_info: 0x13
[0000.146] I> MB1(1.5.1.6-t194-41334769-1740dd39) BIT boot status dump :
0000000000011111111110111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
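
Comparing the two `Task 24 failed` lines with the `Top caller module` lines that follow them suggests how the 32-bit error word is packed. The sketch below decodes it under that assumption. The byte layout and the two module IDs are inferred only from these two logs, not taken from NVIDIA documentation, and the function names are hypothetical.

```python
import re

# Module IDs inferred from the two logs above; not an official table.
MODULE_NAMES = {0x1D: "LOADER", 0x54: "AB_BOOTCTRL"}


def decode_mb1_error(err: int) -> dict:
    """Split an MB1 error word into four bytes (layout is an assumption)."""
    top_caller = (err >> 24) & 0xFF
    error_module = (err >> 16) & 0xFF
    aux_info = (err >> 8) & 0xFF
    reason = err & 0xFF
    return {
        "top_caller": MODULE_NAMES.get(top_caller, hex(top_caller)),
        "error_module": MODULE_NAMES.get(error_module, hex(error_module)),
        "reason": reason,
        "aux_info": aux_info,
    }


def find_error_word(log_text: str):
    """Return the error word from a 'Task N failed (err: 0x...)' line, or None."""
    match = re.search(r"Task \d+ failed \(err: (0x[0-9a-fA-F]+)\)", log_text)
    return int(match.group(1), 16) if match else None


line = "[0000.115] E> Task 24 failed (err: 0x1d540102)"
print(decode_mb1_error(find_error_word(line)))
```

Decoded this way, the two units fail in different places: the first reports `AB_BOOTCTRL` with reason 0x02 after `No bootable slot found`, while the second reports `LOADER` with reason 0x18 after both SMD copies fail verification.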

I have the same issue. I am not sure whether it has something to do with the fact that I am using a custom carrier board. For now it seems that if I leave the software as it is, everything is fine. But when I upgrade some packages to install DeepStream (not the kernel), the issue arises after about three days.

Tristan