TX2 NX Kernel does not change after replacing /boot/Image with AB rootfs enabled R32.7.1

I have a custom kernel image that I would like to deploy remotely to a TX2 NX module running R32.7.1. Based on the documentation and several threads on this forum, it seems this should be possible by simply replacing the file /boot/Image with the custom-built image file. However, when I replace the file in the boot directory, the system still loads the previous kernel from the kernel partition. I have specified /boot/Image in /boot/extlinux/extlinux.conf, but this seems to be ignored entirely and the system boots from the partition no matter what.

To be clear, if I flash the device from a host using the proper flash.sh command the kernel is properly updated. It seems to exclusively load from partition without checking the rootfs for a kernel image at all. Is this intended? If not, how can I get the system to boot from the desired Image file?

hello jacob46,

could you please set up a serial console and gather the complete logs (bootloader and kernel) for reference,
thanks

Here are the full UART logs and the extlinux.conf file used. This is from a fresh flash with no changes to either A/B slot.

extlinux.conf (889 Bytes)
log1 (70.3 KB)

hello jacob46,

it's U-Boot that reads the extlinux.conf file and loads the kernel image from the file system.
so, did you set the environment variable USE_UBOOT to zero to disable U-Boot?

I do not have the USE_UBOOT environment variable set when flashing the device. I have decided to stick with not using U-Boot and instead push updates by writing directly to the proper kernel partition, as this is actually more convenient for my use case. However, when testing whether the device switches the active kernel partition after a failed boot, it never seems to switch the partition it boots from. I simulate a “bad” kernel update by setting slot B as unbootable, overwriting its kernel partition with zeros using dd, then setting B as the active boot slot. After rebooting, the device boots 7 times from the slot B bootloader, loading the kernel/dtb from kernel_b. After this it switches to the slot A bootloader but keeps using the slot B kernel/dtb. It tries 7 times before reaching an unbootable state (I presume both slots are now marked as unbootable) and fails to boot into anything on further resets.

I have attached the UART logs. Does the redundancy behavior only work when running nv-update-engine, or is there something else I am missing for triggering fallback to a successfully booted slot?
log2 (98.8 KB)

hello jacob46,

could you please check whether you've included these U-Boot fixes?
they revise the offsets for TX2-NX to allow a disk-based DTB (via FDT in extlinux.conf) to be used with the current kernel image.

To be clear, U-Boot is intentionally disabled and I never see U-Boot start when reviewing the UART logs. Is there an analogous config file that should be changed in CBoot if I am using that for loading the kernel from partition?

hello jacob46,

please include these cboot fixes, and also re-flash the cboot partition to confirm.

The fixes provided are for a different version of CBoot as well as for a different Jetson board. The code you provided was not sufficient to resolve the issue I was having, as it did not work properly with the existing CBoot code for 32.7.1. I attempted my own fix by modifying CBoot and I believe I have resolved the issue. For those finding this thread with similar problems: CBoot requires modification so that it properly boots to the recovery kernel after failing to boot from both redundant slots.

Unmodified, if CBoot detects an invalid kernel before ever trying to load it, it reboots the system without updating the retry counter. This leads to the system rebooting without ever switching to the redundant kernel slot (e.g. if it is trying to boot using kernel_b and finds it is invalid, it will keep booting to kernel_b without ever trying to return to kernel (slot A)). Eventually it gives up on booting and hangs without ever starting the bootloader.
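
To make the failure mode described above concrete, here is a toy model in C. It is only a sketch of the idea, not CBoot code: struct slot, pick_slot, simulate_boot and MAX_RETRIES are all made-up names, and the budget of 7 is taken from the number of attempts seen in the logs. It also does not try to reproduce the separate bootloader/kernel slot tracking seen on the real board, or the final hang. It only shows that if an early validation failure resets before the retry count is touched, the fallback slot is never reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names come from CBoot. */
#define NUM_SLOTS   2
#define MAX_RETRIES 7   /* assumption: the 7 attempts seen in the UART log */

struct slot {
    bool kernel_valid;  /* would the kernel partition pass validation? */
    int  retries_left;  /* remaining boot attempts for this slot */
};

/* Return the first slot, starting at `active`, that still has retries. */
static int pick_slot(const struct slot s[NUM_SLOTS], int active)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        int idx = (active + i) % NUM_SLOTS;
        if (s[idx].retries_left > 0)
            return idx;
    }
    return -1; /* every slot is exhausted */
}

/*
 * Simulate at most `max_resets` boot attempts, starting from slot `active`.
 * count_early_failures == false models the reported behaviour: an invalid
 * kernel triggers a reset before the retry counter is decremented.
 * Returns the index of the slot that booted, or -1 if nothing booted.
 */
int simulate_boot(struct slot s[NUM_SLOTS], int active,
                  bool count_early_failures, int max_resets)
{
    assert(active >= 0 && active < NUM_SLOTS);
    for (int attempt = 0; attempt < max_resets; attempt++) {
        int cur = pick_slot(s, active);
        if (cur < 0)
            return -1;              /* nothing left to try */
        if (s[cur].kernel_valid)
            return cur;             /* boot succeeds */
        if (count_early_failures)
            s[cur].retries_left--;  /* record the failed attempt */
        /* otherwise: reset with the slot state unchanged */
    }
    return -1;
}
```

With slot A valid, slot B zeroed and B set active, simulate_boot(slots, 1, false, 100) never leaves slot B and returns -1, while simulate_boot(slots, 1, true, 100) uses up the retries for slot B and then returns 0 (slot A).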

Is this considered a bug? It is much more desirable to me to have the kernel be loaded from partition so I would rather skip using uboot and extlinux entirely and only use cboot.

hello jacob46,

we've included the fixes in the new release; is it possible for you to move to r32.7.2 and check the status?
BTW, you may also share the code snippets that address the issue, as it may help users who are based on r32.7.1.

I have included the minor changes I made below. The issue lies in how lower-level modules reset the processor without updating the information about A/B boot success. These are surface-level changes and likely do not cover all cases, but at least it's a start.

bootloader/partner/common/lib/a_b_boot/tegrabl_a_b_boot_control.c :

tegrabl_error_t tegrabl_a_b_get_current_rootfs_id(void *smd, uint8_t *rootfs_id)
{
	struct slot_meta_data_v2 *smd_v2;
	uint8_t rootfs_select;
	uint16_t version;
	uint32_t bl_slot;
	tegrabl_error_t error = TEGRABL_NO_ERROR;
	if (rootfs_id == NULL) {
		return TEGRABL_ERROR(TEGRABL_ERR_INVALID, 0);
	}
	if (smd == NULL) {
		error = tegrabl_a_b_get_smd((void **)&smd);
		if (error != TEGRABL_NO_ERROR) {
			TEGRABL_SET_HIGHEST_MODULE(error);
			return error;
		}
	}
	version = tegrabl_a_b_get_version(smd);
	if (BOOTCTRL_SUPPORT_ROOTFS_AB(version) == 0U) {
		return TEGRABL_ERROR(TEGRABL_ERR_INVALID, 0);
	}
	/*
	 * "Unified bl&rf a/b" is supported from version
	 * BOOT_CHAIN_VERSION_UNIFY_RF_BL_AB. If supported and enabled,
	 * use bootloader active slot for rootfs.
	 */
	if (BOOTCTRL_IS_UNIFIED_AB_ENABLED(version)) {
		error = tegrabl_a_b_get_active_slot(smd, &bl_slot);
		if (error != TEGRABL_NO_ERROR) {
-			return error;
+			//no valid rootfs slot found
+			return TEGRABL_ERR_NOT_FOUND;
		}

		*rootfs_id = (uint8_t)bl_slot;
		return error;
	}
	smd_v2 = (struct slot_meta_data_v2 *)smd;
	rootfs_select = ROOTFS_SELECT(smd_v2);
	*rootfs_id = GET_ROOTFS_ACTIVE(rootfs_select);
	return TEGRABL_NO_ERROR;
}
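
One way to read the change above is that a distinct "not found" result lets the caller tell "no bootable slot left" apart from other errors and branch to the recovery kernel. A rough sketch of that idea follows. All names here (sketch_err_t, boot_target_t, get_active_slot, choose_boot_target) are made up for illustration and are not the CBoot API; the real caller in CBoot may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real CBoot error type and helpers differ. */
typedef enum {
    SKETCH_ERR_NONE = 0,
    SKETCH_ERR_INVALID,
    SKETCH_ERR_NOT_FOUND
} sketch_err_t;

typedef enum {
    BOOT_TARGET_SLOT_A,
    BOOT_TARGET_SLOT_B,
    BOOT_TARGET_RECOVERY,
    BOOT_TARGET_NONE
} boot_target_t;

/* bootable[i] is nonzero while slot i may still be tried. */
sketch_err_t get_active_slot(const uint8_t bootable[2], uint8_t preferred,
                             uint8_t *slot_out)
{
    if (bootable == NULL || slot_out == NULL || preferred > 1U)
        return SKETCH_ERR_INVALID;

    uint8_t other = (uint8_t)(preferred ^ 1U);
    if (bootable[preferred]) {
        *slot_out = preferred;
        return SKETCH_ERR_NONE;
    }
    if (bootable[other]) {
        *slot_out = other;
        return SKETCH_ERR_NONE;
    }
    return SKETCH_ERR_NOT_FOUND; /* both slots are unbootable */
}

/* Map the lookup result to a boot decision instead of resetting blindly. */
boot_target_t choose_boot_target(const uint8_t bootable[2], uint8_t preferred)
{
    uint8_t slot = 0;

    assert(bootable != NULL);
    switch (get_active_slot(bootable, preferred, &slot)) {
    case SKETCH_ERR_NONE:
        return slot == 0U ? BOOT_TARGET_SLOT_A : BOOT_TARGET_SLOT_B;
    case SKETCH_ERR_NOT_FOUND:
        return BOOT_TARGET_RECOVERY; /* no slot left: try recovery kernel */
    default:
        return BOOT_TARGET_NONE;     /* bad arguments or other failure */
    }
}
```

For example, with bootable = {1, 0} and slot B preferred the sketch picks slot A, and with bootable = {0, 0} it picks the recovery target.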

bootloader/partner/common/lib/linuxboot/fixed_boot.c :


	/* Load normal kernel and kernel-dtb */
	err = tegrabl_load_from_partition(kernel, boot_img_load_addr,
					  dtb_load_addr, kernel_dtbo,
					  data, data_size,
					  false);
	if (err != TEGRABL_NO_ERROR) {
		pr_error("Storage boot failed, err: %u\n", err);
#if defined(CONFIG_ENABLE_A_B_SLOT)
		pr_error("A/B loader failure\n");
		pr_trace("Trigger SoC reset\n");
-		tegrabl_reset();
+		//tegrabl_reset();
#endif
	}

fail:
#if defined(CONFIG_ENABLE_EXTLINUX_BOOT)
	if (fm_handle != NULL) {
		tegrabl_fm_close(fm_handle);
	}
#endif
	return err;
}

bootloader/partner/common/lib/linuxboot/linux_load.c :

fail:
#if defined(CONFIG_ENABLE_SECURE_BOOT)
	pr_debug("%s: completing auth ...\n", __func__);
-	err = tegrabl_auth_complete();
+	tegrabl_auth_complete();
#endif
	tegrabl_free(kernel_dtbo);
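
A plausible reading of the linux_load.c change is that assigning the cleanup result to err in the fail path can overwrite an earlier failure, so the caller sees success. The standalone sketch below shows that general C pitfall. The names (status_t, load_image, finish_auth, load_masking, load_preserving) are made up and the cleanup step is assumed to always succeed; this is not the CBoot code itself.

```c
#include <stdio.h>

typedef int status_t;   /* hypothetical: 0 means success */

/* Hypothetical cleanup step; assumed to succeed on its own. */
static status_t finish_auth(void)
{
    return 0;
}

/* Hypothetical load step that fails when `corrupt` is nonzero. */
static status_t load_image(int corrupt)
{
    return corrupt ? -1 : 0;
}

/* The cleanup result replaces the earlier failure: the error is masked. */
status_t load_masking(int corrupt)
{
    status_t err = load_image(corrupt);
    if (err != 0)
        goto fail;
    printf("image loaded\n");
fail:
    err = finish_auth();   /* overwrites the load error */
    return err;
}

/* Cleanup still runs, but the first failure is what the caller sees. */
status_t load_preserving(int corrupt)
{
    status_t err = load_image(corrupt);
    if (err != 0)
        goto fail;
    printf("image loaded\n");
fail:
    (void)finish_auth();   /* result intentionally not stored in err */
    return err;
}
```

With a corrupt image, load_masking(1) returns 0 while load_preserving(1) returns the load error. A fuller version would probably keep the cleanup error when no earlier error exists, which the change above does not do.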

bootloader/partner/t18x/common/lib/partitionloader/tegrabl_partition_loader.c :

tegrabl_error_t tegrabl_load_binary(
		tegrabl_binary_type_t bin_type, void **load_address,
		uint32_t *binary_length)
{
#if defined(CONFIG_ENABLE_A_B_SLOT)
	tegrabl_error_t err;
	tegrabl_binary_copy_t bin_copy = TEGRABL_BINARY_COPY_PRIMARY;
	/* Do A/B selection and set bin_copy accordingly */
	err = a_b_get_bin_copy(bin_type, &bin_copy);
	if (err != TEGRABL_NO_ERROR) {
		pr_error("A/B select failed\n");
		goto done;
	}
	err = tegrabl_load_binary_copy(bin_type, load_address, binary_length,
			bin_copy);
	if (err == TEGRABL_NO_ERROR) {
		goto done;
	}
	/*
	 * TODO: Add error handling such as fallover to other good slot or
	 * enter fastboot if no good slot is found
	 */
	pr_error("A/B loader failure\n");
	TEGRABL_ERROR_PRINT(err);
	pr_debug("Trigger soc reset\n");
-	tegrabl_reset();
+	//tegrabl_reset();
#else	/* !defined(CONFIG_ENABLE_A_B_SLOT) */

