`nvbootctrl get-current-slot` returning unexpected slot

Hello all,

I’m building an embedded Linux system based on Jetson TX2 and Yocto, and am using Mender as the OTA update solution. Units occasionally fail updates, and I have traced the issue back to nvbootctrl get-current-slot returning what seems to be the wrong slot.

On correctly functioning units, the partition layout is as follows:
Slot 0: /dev/mmcblk0p1
Slot 1: /dev/mmcblk0p33

Every now and then, nvbootctrl get-current-slot seems to get this mapping exactly the other way round. On units which run on /dev/mmcblk0p1 (as reported by findmnt /) it reports “Slot 1” and vice versa.

As this is different to what the Mender update client expects, Mender rolls back the update.
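To spot affected units in the field, the comparison described above can be scripted. The following is only a minimal sketch: the partition-to-slot table is copied from this post, while the function names and the assumption that the slot number is the last integer printed by nvbootctrl get-current-slot are mine and should be checked against the actual output on your L4T release.

```python
import re
import subprocess

# Mapping taken from this post: slot 0 is /dev/mmcblk0p1, slot 1 is /dev/mmcblk0p33.
EXPECTED_SLOT_BY_ROOT = {
    "/dev/mmcblk0p1": 0,
    "/dev/mmcblk0p33": 1,
}


def parse_slot(output: str) -> int:
    """Extract the slot number from nvbootctrl output.

    Assumption: the slot is the last integer in the output, which covers
    both a bare "0" and a form such as "Slot 1".
    """
    numbers = re.findall(r"\d+", output)
    if not numbers:
        raise ValueError(f"no slot number found in {output!r}")
    return int(numbers[-1])


def slot_matches_root(root_device: str, reported_slot: int) -> bool:
    """Return True if the reported slot agrees with the mounted root partition."""
    return EXPECTED_SLOT_BY_ROOT[root_device] == reported_slot


def check_live_system() -> bool:
    """Run both commands on the target and compare their answers."""
    root = subprocess.run(
        ["findmnt", "-n", "-o", "SOURCE", "/"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    slot_output = subprocess.run(
        ["nvbootctrl", "get-current-slot"],
        check=True, capture_output=True, text=True,
    ).stdout
    return slot_matches_root(root, parse_slot(slot_output))
```

Calling check_live_system() on the device (as root, since nvbootctrl usually needs it) returns False on a unit showing the swapped mapping; the two pure functions can be exercised on a development machine without the hardware.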

  • How does nvbootctrl determine the current slot?
  • Where is the mapping between the partitions and the boot slots defined? I read the relevant parts in the documentation, but it seems quite sophisticated and doesn’t mention any partition numbers. I didn’t yet grok how this all works together.

I’m using L4T 32.3.1.

Thanks in advance,
Manuel

The Mender integration is a bit complicated on the TX2 when U-Boot is in the boot chain. Mender uses U-Boot variables to switch between the two rootfs partitions, and that’s a separate mechanism from the A/B slot mechanism in the NVIDIA bootloader chain. The tegra-specific Mender state scripts try to keep the two in sync and automatically re-sync them when possible, but there may still be some cases they don’t handle.

You might want to start a thread over on the Mender forum about this.

Thanks, will do that.

Can you recommend not using U-Boot on the Tegra? Would there be any downsides?

Due to the following comment on meta-mender-community I figure the alternative would be to use a cboot-only solution:

# We have a different install script for U-Boot vs. cboot, since
# the mechanism for determining boot partitions is different between
# the two, and with cboot there is no U-Boot environment for copying
# the machine-id.

However, if I understood the NVIDIA documentation correctly, cboot is still used underneath somehow, even when U-Boot is in the boot chain.

> Can you recommend not using U-Boot on the Tegra? Would there be any downsides?

As mentioned in that comment you quoted, propagating the machine ID across updates can be a bit of a problem in that case, but it can be made to work. I have dropped U-Boot on a couple of TX2-based projects.

> even when using U-Boot, cboot is getting used underneath somehow.

That’s correct. cboot is always there. It just hasn’t been very customizable on the TX2, since NVIDIA hasn’t made the sources available on a regular basis.

Hello all,

It seems that the failing updates are somewhat predictable.

Simple reboots affect the output of nvbootctrl get-current-slot and nvbootctrl dump-slots-info. The current slot and the priorities of both the slots change and follow a cyclic pattern:

Boot 1: nvbootctrl get-current-slot: 0; Priority of slot 0: 14; Priority of slot 1: 14;
Boot 2: nvbootctrl get-current-slot: 0; Priority of slot 0: 14; Priority of slot 1: 15;
Boot 3: nvbootctrl get-current-slot: 1; Priority of slot 0: 13; Priority of slot 1: 15;
Boot 4: nvbootctrl get-current-slot: 1; Priority of slot 0: 15; Priority of slot 1: 14;

Note that in every one of these reboots:

  • the machine booted from slot 0 (both fw_printenv mender_boot_part and findmnt / returned the partition assigned to slot 0)
  • retry_count as reported by nvbootctrl dump-slots-info was 7 for both slots
  • boot_successful as reported by nvbootctrl dump-slots-info was 1 for both slots

What seems most interesting to me is the change in the slot priorities. As I understand it, the priorities should only change during an update process. We didn’t update; we merely rebooted the device. Also, if I understand the NVIDIA documentation correctly, with a 2-slot layout the priorities should generally only take the values 14 or 15. I don’t understand how a priority of 13 can be reached. Can someone enlighten me on this?
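To make the pattern easier to reason about, here is a small sketch that encodes the four boots listed above and flags what deviates from my expectations. The observations are the values from this post; the two rules (the reported slot should be the slot actually booted, and priorities should only be 14 or 15) are my reading of the documentation, not a confirmed specification, and the class and function names are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootObservation:
    boot: int            # sequence number of the reboot
    reported_slot: int   # output of nvbootctrl get-current-slot
    priority_slot0: int  # priority of slot 0 from nvbootctrl dump-slots-info
    priority_slot1: int  # priority of slot 1 from nvbootctrl dump-slots-info


# Values copied from the four boots listed in this post.
OBSERVATIONS = [
    BootObservation(1, 0, 14, 14),
    BootObservation(2, 0, 14, 15),
    BootObservation(3, 1, 13, 15),
    BootObservation(4, 1, 15, 14),
]

# The rootfs was on the slot 0 partition in every one of these boots.
ACTUAL_SLOT = 0

# Assumption: with two slots, only these priority values should occur.
EXPECTED_PRIORITIES = {14, 15}


def find_anomalies(obs: BootObservation, actual_slot: int = ACTUAL_SLOT) -> list[str]:
    """Return a description of everything in one boot that breaks the rules above."""
    problems = []
    if obs.reported_slot != actual_slot:
        problems.append(
            f"reported slot {obs.reported_slot}, booted from slot {actual_slot}"
        )
    for slot, priority in enumerate((obs.priority_slot0, obs.priority_slot1)):
        if priority not in EXPECTED_PRIORITIES:
            problems.append(f"slot {slot} has unexpected priority {priority}")
    return problems


if __name__ == "__main__":
    for obs in OBSERVATIONS:
        for problem in find_anomalies(obs):
            print(f"Boot {obs.boot}: {problem}")
```

With the data above, boots 1 and 2 pass both rules, boot 3 is flagged for both the wrong slot and the priority of 13, and boot 4 is flagged for the wrong slot only.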

The relevant part of the documentation can be found here.

As discussed, I’ll start a thread on the mender forum on this.

See https://github.com/OE4T/meta-mender-community/issues/7 for the issue, and https://github.com/BoulderAI/meta-mender-community/pull/1 for a description and work-in-progress changes to fix it.