Hey Jetson team,
We have a product that uses the Xavier NX on a custom carrier board. It was initially deployed with JetPack 4.x using a custom Yocto-based image, leveraging meta-tegra for JetPack compatibility. We are having some issues with in-field updates, specifically related to boot chain/slot selection.
Unfortunately, it has come to light that we need to update some of the boot firmware components, and we’re having trouble getting that done. By using RCM mode with our private keys, we can confirm that the updated components are installed to the “_b” boot chain, but we can’t seem to switch to that boot chain. MB1 appears to be fully silent (which I suppose is intended in production mode), so we can’t tell why we’re always starting in chain 0 once MB2 starts printing. Both tegra-boot-control and nvbootctrl appear to update the SMD partition in the same way (setting slot 1 to the higher priority), but after a reboot the priorities are always flipped back and slot 1 is still marked unsuccessful.
In the default state we have:
# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: A
magic:0x43424e00, version: 3 features: 1 num_slots: 2
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
After we flip the boot chain we see this before rebooting:
# ./nvbootctrl set-active-boot-slot 1
# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: B
magic:0x43424e00, version: 3 features: 1 num_slots: 2
slot: 0, priority: 14, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 15, suffix: _b, retry_count: 7, boot_successful: 0
And then after rebooting we have:
# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: A
magic:0x43424e00, version: 3 features: 1 num_slots: 2
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 0
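To make the before/after comparison explicit, here is a minimal Python sketch that parses the slot lines from the dumps above and lists the fields that changed across the reboot. The helper names (`parse_slots`, `diff_slots`) are hypothetical and not part of any NVIDIA tool, and the parser assumes the exact text format shown in this post.

```python
import re

# Matches the per-slot lines as printed in the dumps in this post (assumed format).
SLOT_RE = re.compile(
    r"slot:\s*(\d+),\s*priority:\s*(\d+),\s*suffix:\s*(\w+),"
    r"\s*retry_count:\s*(\d+),\s*boot_successful:\s*(\d+)"
)

def parse_slots(dump):
    """Return {slot_number: {field: value}} parsed from dump-slots-info text."""
    slots = {}
    for m in SLOT_RE.finditer(dump):
        num, prio, suffix, retry, ok = m.groups()
        slots[int(num)] = {
            "priority": int(prio),
            "suffix": suffix,
            "retry_count": int(retry),
            "boot_successful": int(ok),
        }
    return slots

def diff_slots(before, after):
    """List (slot, field, old, new) for every field that differs."""
    changes = []
    for num in sorted(before.keys() & after.keys()):
        for field, old in before[num].items():
            new = after[num][field]
            if new != old:
                changes.append((num, field, old, new))
    return changes

# Slot lines copied from the dump taken after set-active-boot-slot 1.
PENDING = """
slot: 0, priority: 14, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 15, suffix: _b, retry_count: 7, boot_successful: 0
"""

# Slot lines copied from the dump taken after rebooting.
REBOOTED = """
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 0
"""

if __name__ == "__main__":
    for change in diff_slots(parse_slots(PENDING), parse_slots(REBOOTED)):
        print(change)
```

On the dumps above, only the two priorities change across the reboot; slot 1 keeps retry_count 7 and boot_successful 0.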
The MB2 output that makes me suspect the SMD info isn’t being read correctly is:
[0000.642] W> No valid slot number is found in scratch register
[0000.642] W> Return default slot: _a
[0000.642] I> Active Boot chain : 0
There are a few things we’ve confirmed when testing this:
- Even if the slot 0 and slot 1 data are identical (as they are at the time of fusing), we cannot switch chains
- We’ve confirmed that on non-fused units the process works as expected (using an unsigned and unencrypted BUP)
- We’ve confirmed that if we (dangerously) update the current slot 0 with the new BUP, the system boots successfully and we can see the changes applied
Is there something extra that needs to be done to switch boot chains on production-fused units, given that the same workflow works fine on non-production units? Is it possible we need to encrypt the SMD partitions with the SDK for them to be accepted?