rootfs A/B auto switch mechanism not working - stuck on kernel panic

Hi,
I’m using an Orin Nano devkit hosting an Orin NX SoM with JP5.1.1.
I flashed rootfs A/B according to the dev guide with ROOTFS_RETRY_COUNT_MAX=1.
I verified that I can manually switch between active slots.
The problem is that after executing rm -rf /* on the slot A rootfs, the system hangs on a kernel panic after a system reset.

I read the following post regarding the same problem on another Jetson platform.

I verified that my watchdog is not disabled:

cat /proc/device-tree/watchdog@30c0000/status

(in my case it's watchdog@2190000)
In addition, this is the relevant part of the .dts:

watchdog@2190000 {
compatible = "nvidia,tegra-wdt-t234";
reg = <0x00 0x2190000 0x00 0x10000 0x00 0x2090000 0x00 0x10000 0x00 0x2080000 0x00 0x10000>;
interrupts = <0x00 0x07 0x04 0x00 0x08 0x04>;
nvidia,watchdog-index = <0x00>;
nvidia,timer-index = <0x07>;
nvidia,enable-on-init;
nvidia,extend-watchdog-suspend;
timeout-sec = <0x78>;
nvidia,disable-debug-reset;
status = "okay";
phandle = <0x462>;
};
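As a side note, timeout-sec = <0x78> in the decompiled node is simply 120 in hex. Below is a minimal Python sketch for reading the value on the running system. It assumes the node path /proc/device-tree/watchdog@2190000 from this post and relies on device tree cells being stored as 32-bit big-endian integers; the helper name decode_cell is made up for illustration.

```python
import struct
from pathlib import Path

# Assumed path: node name taken from this thread; adjust for other platforms.
TIMEOUT_PROP = Path("/proc/device-tree/watchdog@2190000/timeout-sec")


def decode_cell(raw: bytes) -> int:
    """Decode a single device tree cell (32-bit, big-endian) into an int."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    return struct.unpack(">I", raw)[0]


if __name__ == "__main__":
    if TIMEOUT_PROP.exists():
        # The property file holds the raw cell bytes, not text.
        print(f"watchdog timeout: {decode_cell(TIMEOUT_PROP.read_bytes())} sec")
    else:
        print(f"{TIMEOUT_PROP} not found")
```

For example, decode_cell(b"\x00\x00\x00\x78") gives 120, which matches the value in the node above.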

Here is the boot.log:
boot.log (116.0 KB)

Hi BSP_User,

[15:58:24:650] [   10.975806] Run /init as init process
[15:58:24:668] [   10.992442] Root device found: PARTUUID=e671dd75-87ca-4529-87c1-6d8bea4bc845
[15:58:25:107] [   11.428760] mmc1: SDHCI controller on 3400000.sdhci [3400000.sdhci] using ADMA 64-bit
[15:58:25:143] [   11.470781] random: crng init done
[15:58:35:167] [   21.487608] ERROR: PARTUUID=e671dd75-87ca-4529-87c1-6d8bea4bc845 mount fail...

From your boot.log, it does not look like a kernel panic at the end.
It seems everything in the rootfs has been removed.

Could you verify with the following command instead of removing everything in the rootfs?

$ sudo rm -rf /lib

Thank you for your help.

  1. I apologise, I posted the wrong log file.

  2. I can confirm that the mechanism works. The problem was that I didn’t wait 120 sec for the watchdog. (Maybe it's a good idea to mention this wait time in the A/B section of the dev guide; I found it only in other posts.)

  3. Can I reduce the 120 sec period to a shorter time (maybe 30-40 sec)?
    (I’m not asking how to do it, but whether it can interfere with something else.)

Thanks

You can find the watchdog timeout in the serial console log or dmesg, as follows.

[15:58:17:320] [    1.628633] tegra_wdt_t18x 2190000.watchdog: Tegra WDT init timeout = 120 sec
[15:58:17:329] [    1.635799] tegra_wdt_t18x 2190000.watchdog: Registered successfully

You can also configure this timeout through timeout-sec in the watchdog@2190000 node, as follows.

tegra_wdt: watchdog@2190000 {
		compatible = "nvidia,tegra-wdt-t234";
		reg = <0x0 0x02190000 0x0 0x10000>, /* WDT0 */
		      <0x0 0x02090000 0x0 0x10000>, /* TMR0 */
		      <0x0 0x02080000 0x0 0x10000>; /* TKE */
		interrupts = <0 7 0x4 0 8 0x4>; /* TKE shared int */
		nvidia,watchdog-index = <0>;
		nvidia,timer-index = <7>;
		nvidia,enable-on-init;
		nvidia,extend-watchdog-suspend;
		timeout-sec = <120>;
		nvidia,disable-debug-reset;
		status = "okay";
};
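After changing timeout-sec and reflashing, one way to confirm the new value took effect is to look for the "Tegra WDT init timeout" line in dmesg again. Here is a small Python sketch that extracts the number from log text. The regular expression is based only on the log line quoted in this thread, so the exact wording on other releases is an assumption, and the function name wdt_timeout_from_log is hypothetical.

```python
import re

# Pattern derived from the log line quoted in this thread (assumed stable wording).
WDT_INIT_RE = re.compile(r"Tegra WDT init timeout = (\d+) sec")


def wdt_timeout_from_log(log_text: str):
    """Return the watchdog init timeout in seconds, or None if no match is found."""
    match = WDT_INIT_RE.search(log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "[    1.628633] tegra_wdt_t18x 2190000.watchdog: "
        "Tegra WDT init timeout = 120 sec"
    )
    print(wdt_timeout_from_log(sample))
```

The same function can be fed the saved serial console log or the captured output of dmesg.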

Thank you for your answer. I just wanted to ask whether there is any known lower limit for the WD timeout from your side.

I mean a threshold below which problems will occur, so that I would have to keep the timeout at or above a minimum value for valid system execution.

As I understand it, you can configure timeout-sec for your use case, and there’s no lower limit.
Sometimes the system gets stuck for a few seconds; if you set this timeout too short, it may trigger watchdog reboots easily.

Thank you for your help.