A/B redundancy boot from external SD card using C-Boot

Hi there!
We want to employ full A/B redundancy using a production-use Jetson Xavier NX with external eMMC storage as fallback and internal eMMC storage as the main system. Our use case doesn’t allow physical access to the Jetson after deployment, so the backup system is meant to give us a way to fix the main image in case of data corruption.

Booting from the internal eMMC is no issue, and the system can see and mount our storage on the sdmmc3 bus (for now this is an SD card on a carrier board). We can even use the external rootfs as our system, but this is not enough, as the kernel and device tree files would still be vulnerable to damage at this stage.
Next I tried to flash our kernel and dtb files onto the SD card just like I’d do with internal storage:

sudo ./flash.sh -r -k kernel-dtb jetson-xavier-nx-devkit-emmc external

(I’ve previously put the UUID of the SD card partition /dev/mmcblk1p1 inside of bootloader/l4t-rootfs-uuid.txt to be sure the right device gets flashed)

This runs without any trouble! But after a reboot I see this on the debug console:

I> found decompressor handler: lz4-legacy
I> decompressing BMP blob ...
I> Kernel type = Normal
I> Loading kernel-bootctrl from partition
I> Loading partition kernel-bootctrl at 0xa42b0000 from device(0x1)
W> tegrabl_get_kernel_bootctrl: magic number(0x00000000) is invalid
W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
I> ########## SD boot ##########
W> Error: failed to get sd-card params
I> -0 params source = 
E> Failed to initialize device 6-0
E> SD boot failed, err: 252641293

So I guess C-Boot can’t access the SD card by itself? That would make sense: as far as I understand it, the device tree file which enables the sdmmc3 bus is loaded at a later stage.

Do I need to reflash and reconfigure the bootloader itself?

I’ve attached debug UART output as well as dmesg output from the internal eMMC system. Please let me know if you need further information.

debug-dmesg.txt (63.9 KB)
debug-uart-bootlog.log (41.2 KB)
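
For anyone reading along, here is a rough sketch of how the SD boot part of a UART log like the one attached could be isolated. It assumes the section banners look like the "########## SD boot ##########" lines quoted above and that the next banner ends the section; the function name sd_boot_problems is made up for this example.

```shell
# Print only the warning (W>) and error (E>) lines that CBoot logs
# between the "SD boot" banner and the next "##########" banner.
# Works with or without the leading [timestamp] column.
sd_boot_problems() {
    awk '
        /#+ SD boot #+/          { in_sd = 1; next }  # entering the SD boot section
        /#+ [^#]+ #+/            { in_sd = 0 }        # any other banner ends it
        in_sd && /(^|[ ])[EW]> / { print }            # keep warnings and errors only
    ' "$1"
}
```

Usage would be something like `sd_boot_problems debug-uart-bootlog.log`, which makes it easier to compare the SD boot messages from one flash attempt to the next.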

  1. flash.sh has no capability to flash anything to external devices. Thus, the command below still flashes to the internal boot device.

sudo ./flash.sh -r -k kernel-dtb jetson-xavier-nx-devkit-emmc external

  2. Did you ever do a full flash after you modified the dtb to enable the extra SD card slot? The bootloader and the kernel use different dtb partitions. Thus, if you only update “kernel-dtb”, CBoot won’t know that the extra SD card slot exists.

Thanks Wayne,
I remembered the external flag to work on the Jetson AGX, good to know it’s not supported on the NX! To make sure the SD card image is correctly flashed I’ve now put it into a development Jetson with an SD card slot and flashed it there, then put it back into our carrier board with the production-use Jetson. That’s quite tedious, is there a better way to do this?

I also made sure to do a full flash of both devices. Now the error message is slightly different!

[0004.239] W> tegrabl_get_kernel_bootctrl: magic number(0x00000000) is invalid
[0004.239] W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
[0004.240] I> ########## SD boot ##########
[0004.242] E> no regulator info present for vmmc-supply
[0004.247] W> Error: failed to get sd-card params
[0004.252] I> -0 params source = 
[0004.255] E> Failed to initialize device 6-0
[0004.259] E> SD boot failed, err: 252641308
[0004.263] I> ########## USB boot ##########
[0004.272] W> No valid slot number is found in scratch register

So something about the vmmc-supply pin I guess? I’ll show you our .dtsi-config, maybe that helps:

	sdmmc3: sdhci@3440000 {						//added for sd card reader

		status = "okay";						
		compatible = "nvidia,tegra194-sdhci";
		reg = <0x0 0x3440000 0x0 0x00020000>;
		interrupts = < 0 TEGRA194_IRQ_SDMMC3 0x04>;
		iommus = <&smmu TEGRA_SID_SDMMC3A>;
		dma-coherent;
		max-clk-limit = <208000000>;
		cd-gpios = <&tegra_main_gpio TEGRA194_MAIN_GPIO(Q, 2) 0>;				
		bus-width = <4>;
		cap-mmc-highspeed;
		cap-sd-highspeed;
		sd-uhs-sdr104;
		sd-uhs-sdr50;
		sd-uhs-sdr25;
		sd-uhs-sdr12;
		mmc-ddr-1_8v;
		mmc-hs200-1_8v;
		cd-inverted;
		nvidia,min-tap-delay = <96>;
		nvidia,max-tap-delay = <139>;
		nvidia,vqmmc-always-on;
		pwrdet-support;
		pinctrl-names = "sdmmc_e_33v_enable", "sdmmc_e_33v_disable";
		pinctrl-0 = <&sdmmc3_e_33V_enable>;
		pinctrl-1 = <&sdmmc3_e_33V_disable>;
		ignore-pm-notify;
		resets = <&bpmp_resets TEGRA194_RESET_SDMMC3>;
		reset-names = "sdhci";
		pll_source = "pll_p", "pll_c4_muxed";
		nvidia,set-parent-clk;
		nvidia,parent_clk_list = "pll_p", "pll_p", "pll_p", "pll_p", "pll_p", "pll_c4_muxed", "pll_c4_muxed", "pll_c4_muxed", "pll_c4_muxed", "pll_c4_muxed", "NULL";
		clocks = <&bpmp_clks TEGRA194_CLK_SDMMC3>,
			<&bpmp_clks TEGRA194_CLK_PLLP_OUT0>,
			<&bpmp_clks TEGRA194_CLK_PLLC4_MUXED>,
			<&bpmp_clks TEGRA194_CLK_SDMMC_LEGACY_TM>;
		clock-names = "sdmmc", "pll_p", "pll_c4_muxed", "sdmmc_legacy_tm";
		uhs-mask = <0x08>;
		nvidia,en-periodic-calib;
	 };
[...]
		sdmmc3_e_33V_enable: sdmmc3_e_33V_enable {
			sdmmc3 {
				pins = "sdmmc3-hv";
				nvidia,power-source-voltage = <TEGRA_IO_PAD_VOLTAGE_3300000UV>;
			};
		};

		sdmmc3_e_33V_disable: sdmmc3_e_33V_disable {
			sdmmc3 {
				pins = "sdmmc3-hv";
				nvidia,power-source-voltage = <TEGRA_IO_PAD_VOLTAGE_1800000UV>;
			};
		};

As said before, I can boot with the kernel on internal emmc and rootfs on external SD card via these two extlinux.conf entries, so I don’t think the configuration is broken per se.

LABEL primary
      MENU LABEL primary kernel (root on internal emmc)
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4  console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

LABEL backup
      MENU LABEL backup kernel (root on sd card)
      LINUX /boot/Image2
      INITRD /boot/initrd2
      APPEND ${cbootargs} quiet root=/dev/mmcblk1p1 rw rootwait rootfstype=ext4  console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0
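
To double-check which kernel image and root device each entry really uses, a small helper like the one below can summarize an extlinux.conf. This is only a sketch: it assumes the simple LABEL / LINUX / APPEND layout shown above, and the function name list_extlinux_entries is made up for this example.

```shell
# Print "<label> <kernel image> <root device>" for every entry
# in an extlinux.conf that uses LABEL / LINUX / APPEND lines.
list_extlinux_entries() {
    awk '
        $1 == "LABEL"  { label = $2 }
        $1 == "LINUX"  { kernel = $2 }
        $1 == "APPEND" {
            root = "?"                       # shown if no root= argument is present
            for (i = 2; i <= NF; i++)
                if ($i ~ /^root=/) root = substr($i, 6)
            print label, kernel, root
        }
    ' "$1"
}
```

For the two entries above, this would list primary with /boot/Image on /dev/mmcblk0p1 and backup with /boot/Image2 on /dev/mmcblk1p1.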

No, sorry, I think my point didn’t come across.

I remembered the external flag to work on the Jetson AGX, good to know it’s not supported on the NX!

This comment is wrong. Both AGX and NX share the same behavior here. flash.sh is not able to “flash” anything to your external drive. If you want to validate this, format the external device and connect it to your NX or AGX, then use the flash command with “external”. After that, your external device will still be empty. Nothing is installed on it.
flash.sh with “external” just tells the kernel “we have a file system on the external drive with UUID xxxxx, go and mount the file system from there”.

[0004.242] E> no regulator info present for vmmc-supply

Could you add vmmc-supply back to your dts and assign an always-on 3v3 power regulator to it?

You can refer to this thread

flash.sh with “external” just tells the kernel “we have a file system on the external drive with UUID xxxxx, go and mount the file system from there”.

Ahh, that makes a lot of sense. My bad, thanks for pointing that out!

Having the regulator specified as in your example gives me a slightly different error:

[0004.271] W> tegrabl_get_kernel_bootctrl: magic number(0x00000000) is invalid
[0004.272] W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
[0004.272] I> ########## SD boot ##########
[0004.276] I> Found sdcard
[0004.280] I> enabling 'vdd-sdmmc3-sw' regulator
[0004.286] I> regulator 'vdd-sdmmc3-sw' already enabled
[0004.524] I> sdmmc SDR mode
[0004.538] I> -0 params source = 
[0004.539] E> Blockdev open: exit error
[0004.539] E> SD boot failed, err: 724238353
[0004.539] I> ########## USB boot ##########
[0004.544] W> No valid slot number is found in scratch register

Using a scope probe I made the following observations:

  • Vdd is always supplied
  • The clock is running at 500 kHz (3.3 Vpp) during C-Boot, stops briefly and then starts running again as the Linux kernel boots (but at 1.8 Vpp)
  • MISO and MOSI are high for the entire C-Boot period; there’s no communication happening at all until the Linux kernel kicks in
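
As a side note, the three "SD boot failed, err:" values posted so far are easier to compare in hex. The sketch below only converts the numbers; how CBoot packs fields into this value is not stated in this thread, so any reading of the individual bytes is an assumption that would have to be checked against the CBoot sources. The function name err_to_hex is made up for this example.

```shell
# Print a decimal CBoot error code as a 32-bit hex value.
err_to_hex() {
    printf '0x%08X\n' "$1"
}

# The three codes seen in the logs of this thread.
for code in 252641293 252641308 724238353; do
    printf '%s -> %s\n' "$code" "$(err_to_hex "$code")"
done
# 252641293 -> 0x0F0F000D
# 252641308 -> 0x0F0F001C
# 724238353 -> 0x2B2B0011
```

The upper bytes change from 0x0F0F to 0x2B2B once the regulator is found, which at least suggests that the failure is now reported from a different place than before.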

Hi,

If Vdd is always-on, what is your vmmc-supply in the device tree? Also, vmmc-always-on should be added to the device tree.

tegra194-p3668-common.dtsi:

// Jan hack
vmmc-supply = <&p3668_vdd_sdmmc3_sw>;
nvidia,vmmc-always-on;

tegra194-fixed-regulator-p3668.dtsi:

p3668_vdd_sdmmc3_sw: regulator@106 {
    compatible = "regulator-fixed";
    reg = <106>;
    regulator-name = "vdd-sdmmc3-sw";
    regulator-min-microvolt = <3300000>;
    regulator-max-microvolt = <3300000>;
    enable-active-high;
};

This is the additional configuration I used compared to my first post. Do you think power management might be the issue here? But then, why would it work perfectly with the Linux kernel but not with CBoot?

Actually, the kernel and CBoot driver code are totally different, so it is entirely possible for something to work fine in the kernel but not in CBoot.

If you have made sure that you’ve done a full flash and that there is no difference between kernel-dtb and bl-dtb, then you can try adding some debug prints to the CBoot driver to see why the Blockdev open fails.

The driver code path is bootloader/partner/common/drivers/sdmmc/tegrabl_sd_bdev.c, in the function “sd_bdev_open”. One of the cases in that function returns the error.
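
For the first check (no difference between kernel-dtb and bl-dtb), a byte-for-byte comparison of the two blobs that get flashed is a quick start. This is only a sketch: the function name compare_dtbs is made up, and the two file paths are whatever dtb files your flash configuration actually uses for the kernel and for the bootloader.

```shell
# Report whether two device tree blobs are byte-identical.
# $1 and $2 are paths supplied by the caller; nothing here is L4T-specific.
compare_dtbs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

Usage would be something like `compare_dtbs kernel.dtb bl.dtb`, where both file names are placeholders. If the blobs differ, decompiling each with `dtc -I dtb -O dts` and diffing the text output shows which nodes or properties are not the same.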

Any progress on this issue?
I have the same issue as @janmwolf: after patching in vmmc-supply, the device is found but the open fails.
My board is OK; sdmmc3 works fine after booting into the kernel and Ubuntu, it only fails in the bootloader.
BTW, we don’t have the source code for bootloader/partner/common/drivers/sdmmc/tegrabl_sd_bdev.c