Wrong A/B boot

Hello, JerryChang.

Sorry for the late reply. The previous topic was automatically closed, so I created a new one.

We found a bug in our driver: a spinlock recursion. Because of this bug, the system hung and was rebooted by the watchdog.
However, after the reboot the system booted from the OTHER slot.
This happens on JetPack 4.5 (L4T 32.5.1) and JetPack 4.6 (L4T 32.6.1).
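Spinlock recursion means a code path tries to take a spinlock that the same CPU (or thread) already holds. A non-recursive spinlock then spins forever, which is why the system hangs until the watchdog resets it. The attached spinlock_recursion_test.c is not reproduced here. The sketch below is only a user-space analogue using POSIX spinlocks, with a hypothetical helper try_recursive_acquire(). It uses pthread_spin_trylock() for the second attempt so the condition can be observed without actually hanging.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: user-space analogue of the driver bug, not the
 * attached kernel module. Returns the result of a second acquisition
 * attempt on a spinlock the calling thread already holds. A blocking
 * pthread_spin_lock() here would spin forever (POSIX leaves it undefined),
 * so trylock is used to observe the condition without hanging. */
int try_recursive_acquire(void)
{
    pthread_spinlock_t lock;
    int rc = pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);

    pthread_spin_lock(&lock);          /* first acquisition succeeds */
    rc = pthread_spin_trylock(&lock);  /* same thread, lock already held */
    if (rc == 0)
        pthread_spin_unlock(&lock);    /* not expected; keep lock state balanced */
    pthread_spin_unlock(&lock);
    pthread_spin_destroy(&lock);
    return rc;                         /* EBUSY: the lock is already held */
}
```

On glibc older than 2.34, link with -pthread.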

Below are the steps to reproduce the wrong A/B boot:
smd.cfg (2.6 KB)
spinlock_recursion_test.c (416 Bytes)

  • Flash the Xavier with A/B support enabled, using the attached smd.cfg. The tegra_wdt:watchdog@30c0000 watchdog must also be enabled.
  • Build spinlock_recursion_test.ko, check the current slot state, and load the module:
$ nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: A
magic:0x43424e00,             version: 5             features: 0             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 6,             boot_successful: 0
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1
$ insmod spinlock_recursion_test.ko
  • The system will hang and is rebooted after 120 s by TEGRA_BCCPLEX_WATCHDOG. After the reboot, check that the boot slot has changed and that the retry_count of the previous slot is zero:
$ nvbootctrl dump-slots-info
Current bootloader slot: B
Active bootloader slot: B
magic:0x43424e00,             version: 5             features: 0             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 0,             boot_successful: 0
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1
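To check the post-reboot condition from a test script instead of by eye, the dump-slots-info lines can be parsed. This is a minimal sketch, assuming the output format shown above stays stable; parse_slot_line(), parse_current_slot() and is_reported_symptom() are hypothetical helpers, not part of nvbootctrl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "slot: ..." line from `nvbootctrl dump-slots-info` (format as shown above). */
struct slot_info {
    int slot;
    int priority;
    char suffix[8];
    int retry_count;
    int boot_successful;
};

/* Hypothetical helper: parses one slot line; returns 1 on success, 0 otherwise.
 * A space in the sscanf format matches any run of whitespace, so the wide
 * column padding in the real output is accepted. */
int parse_slot_line(const char *line, struct slot_info *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line,
                  " slot: %d, priority: %d, suffix: %7[^,], "
                  "retry_count: %d, boot_successful: %d",
                  &out->slot, &out->priority, out->suffix,
                  &out->retry_count, &out->boot_successful) == 5;
}

/* Hypothetical helper: parses "Current bootloader slot: X";
 * returns the slot letter, or 0 if the line does not match. */
char parse_current_slot(const char *line)
{
    char slot = 0;
    if (sscanf(line, " Current bootloader slot: %c", &slot) != 1)
        return 0;
    return slot;
}

/* Hypothetical check for the reported symptom: the current slot changed
 * after one watchdog reset and the previous slot's retry_count is now 0. */
int is_reported_symptom(char slot_before, char slot_after,
                        const struct slot_info *prev_after)
{
    return slot_before != slot_after && prev_after->retry_count == 0;
}
```

Fed the slot_0 line and the "Current bootloader slot" line from the second dump, with 'A' as the slot before the test, is_reported_symptom() returns 1.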

Hello nazaraa,

I'm a little confused. Could you clarify what the actual test case is here?
You've marked slot 0 as boot_successful: 0, which means that A/B redundancy is disabled.
Also, could you please set up a serial console and capture the complete boot messages, from the crash through boot-up, for reference?
Thanks

Hello JerryChang,

We expect that after the watchdog fires, the system retries booting from the current slot, and that the retry_count of that slot is decremented to 5. Only after 5 such reboots should the system switch to the other slot.
According to this diagram, on each boot of a slot with boot_successful = 0, retry_count must be decremented. When boot_successful = 0 and retry_count = 0, the slot is marked as invalid and the system boots from another valid slot.
This works as expected when the system is rebooted with the sudo reboot command. However, when the reboot is triggered by the watchdog, the retry_count of the current slot drops straight to 0 and the system switches to the other slot after a single reset.

BTW, there is an easier way to reproduce the issue: a watchdog reboot can be triggered with the sudo touch /dev/watchdog command.
Please find the requested log attached.
log.txt (40.8 KB)
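The sudo touch /dev/watchdog trick works because touch opens the device file. Under the Linux watchdog device API, opening /dev/watchdog starts the timer. If the driver supports "magic close" (or the watchdog is configured with nowayout), closing it without first writing the character 'V' leaves the timer running, so the board resets when it expires. The sketch below does the same open-and-close from C. The function name arm_watchdog_and_abandon() is hypothetical, and whether tegra_wdt behaves exactly this way is an assumption based on the reported touch behaviour. Do not point it at the real device on a board you are not prepared to reset.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens the watchdog device and closes it without
 * writing the magic 'V' character, roughly what `touch /dev/watchdog` does.
 * Assumption: the driver supports magic close or nowayout is set, so the
 * timer stays armed after close and the board resets on timeout.
 * Returns 0 on success, -1 if the device could not be opened or closed. */
int arm_watchdog_and_abandon(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

Calling arm_watchdog_and_abandon("/dev/watchdog") as root should then behave like the touch command. For a harmless dry run, any writable path such as /dev/null can be passed instead.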

Hello JerryChang,

Have you confirmed the bug? Do you have any patches to fix it?

Hello nazaraa,

FYI,
we have reproduced the issue and have arranged resources for an internal investigation.
We will share the results once we have a solution.
Thanks