Safe device tree updates for a released product

Is there a standard way to safely update the device tree of a released product? I can safely upgrade the main application partition by toggling between two copies and validating whether the upgrade was successful. But the device tree is always loaded from the kernel-dtb partition. If that partition gets corrupted, for example by a power loss during the write, every partition that uses this device tree is dead.

I’ve tried using an FDT line in extlinux.conf to load a different device tree for each application, but this fails, and apparently it is no longer supported as of 28.1? Looking into it, it appears that cboot modifies the device tree before passing it to U-Boot. Could we get the source for the cboot device tree modifications so that we can also apply them in U-Boot after we swap in our custom device tree?

The only other approach I can see is using the new “b” partitions in 28.2. While I can see that many “b” side partitions are created, I can’t find any documentation on how to use them. Are they currently supported, and is this considered the “safe” way to update a device tree?

FDT is not used in the newer releases, where both TX1 and TX2 use the same rootfs. The only way I know of to update the DTB that stays safe even if power is lost during the update is the actual flash.sh tool, running on an x86_64 Linux PC and communicating with a Jetson in recovery mode over its micro-B USB connector.

Unfortunately, flash.sh is not an option for our product. There is no micro-B USB, and once the product is in customers’ hands, physical access is not always possible.

We can rewrite the DTB partition, and that works, but it isn’t safe without a backup: if the rewrite fails, we can no longer boot into the application that performs the rewrite.

FDT would work if cboot weren’t mucking with the device tree. We have reverse engineered the modifications cboot applies, but we are not sure what some of them do or whether they will stay consistent on new SOMs going forward.

Unless you can alter the eMMC without a running o/s, no method will be safe. When in recovery mode, the Jetson downloads the fastboot.bin file from the PC host for temporary use in RAM. It is fastboot that understands the serial USB protocol and accepts the commands to flash a partition.

This is just a contrived example (getting it working would take a serious amount of effort), but consider this: if you were to place the fastboot.bin used during flash as a partition on the Jetson itself, and if fastboot were modified to automatically stream from something other than serial USB, you might stand a chance.

Imagine you are able to do this, that fastboot always loads first, and that it chain loads U-Boot as long as some magic byte sequence is present. When you go to flash the DTB, you would overwrite those magic bytes so that fastboot would instead default to streaming a DTB from some other source. When the DTB flash is complete, those magic bytes would be restored, causing fastboot to chain load U-Boot instead of writing the DTB again.
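
To make the ordering argument concrete, here is a toy Python simulation of that sequence. It is not fastboot or cboot code: `MAGIC`, the `staging` partition standing in for “some other source”, and all function names are hypothetical. What it shows is that the magic bytes are cleared only after the new DTB is fully staged and restored only after the write finishes, so a power loss at any step either boots the old DTB or simply repeats the (idempotent) flash on the next power-up.

```python
# Toy simulation of the magic-byte scheme; NOT real fastboot/cboot code.
# MAGIC, the "staging" partition and all function names are hypothetical.

MAGIC = b"BOOT-OK!"  # hypothetical marker meaning "no DTB update pending"


class Emmc:
    """In-memory stand-in for the eMMC partitions involved."""

    def __init__(self) -> None:
        self.parts = {"flags": MAGIC, "kernel-dtb": b"old-dtb", "staging": b""}


def request_dtb_update(emmc: Emmc, new_dtb: bytes) -> None:
    """Run from Linux. Stage first, arm second."""
    emmc.parts["staging"] = new_dtb  # power loss here: MAGIC intact, old DTB boots
    emmc.parts["flags"] = bytes(len(MAGIC))  # clearing the magic arms the update


def first_stage_boot(emmc: Emmc, power_fails_mid_write: bool = False) -> str:
    """Run on every power-up by the (hypothetical) modified fastboot."""
    if emmc.parts["flags"] == MAGIC:
        return "chain-load U-Boot"
    payload = emmc.parts["staging"]
    if power_fails_mid_write:
        emmc.parts["kernel-dtb"] = payload[: len(payload) // 2]  # torn write
        return "power lost"
    emmc.parts["kernel-dtb"] = payload  # re-running this step is harmless
    emmc.parts["flags"] = MAGIC  # restore the magic only after the write completes
    return "chain-load U-Boot"


emmc = Emmc()
request_dtb_update(emmc, b"new-dtb-v2")
print(first_stage_boot(emmc, power_fails_mid_write=True))  # power lost
print(first_stage_boot(emmc))                              # chain-load U-Boot
print(emmc.parts["kernel-dtb"])                            # b'new-dtb-v2'
```

Note that the torn DTB left by the first boot is never handed to U-Boot: the magic is still cleared, so the next power-up rewrites it before chain loading.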

You could be creative about which source fastboot streams its DTB bytes from, e.g., the SD card, but beware that whatever you choose must have a driver available in recovery mode; neither U-Boot nor Linux would be providing drivers at that stage.

I would be tempted to reserve an entire partition for updates and have the leading bytes of that partition be the magic number. No external device would be required, and only the eMMC driver would be needed (presumably the fastboot used during flash already has this capability). The downside, of course, is that this could require significant space if you want to update anything other than something tiny (you couldn’t do this with the rootfs, because the rootfs is already nearly the size of the entire eMMC). A 4MB partition (or even 64MB) would probably not be an issue.

I could imagine this partition being the size of all the other smaller partitions combined, plus room for the magic bytes. Should the magic bytes not be valid, it would flash based on what is in the partition until it either reaches the end of the partition or sees the magic bytes. If you wanted to force a flash, all you would need to do is overwrite those magic bytes (it’s like a journal).
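
A minimal sketch of how such an update partition might be laid out and parsed, written in Python for readability. The record format (a 16-byte zero-padded partition name followed by a little-endian 32-bit payload length), `MAGIC`, and both function names are hypothetical; the idea above only specifies leading magic bytes and “flash until the end of the partition or the magic bytes”.

```python
import struct

# Hypothetical layout of the reserved update partition (not an NVIDIA format):
#   [8-byte magic slot][record][record]...[MAGIC end marker][unused space]
# Each record = 16-byte zero-padded partition name + little-endian u32 length
# + payload. Names longer than 16 bytes would be truncated by struct.
MAGIC = b"BOOT-OK!"
RECORD_HEADER = struct.Struct("<16sI")


def build_update(records: list[tuple[str, bytes]]) -> bytes:
    """Stage an update: zeroed magic slot (= armed), records, end marker."""
    body = b"".join(
        RECORD_HEADER.pack(name.encode("ascii"), len(payload)) + payload
        for name, payload in records
    )
    return bytes(len(MAGIC)) + body + MAGIC


def pending_writes(update_part: bytes) -> list[tuple[str, bytes]]:
    """Return the (partition, payload) pairs a loader would flash."""
    if update_part[: len(MAGIC)] == MAGIC:
        return []  # leading magic valid: nothing pending
    writes = []
    off = len(MAGIC)  # skip the (currently invalid) magic slot
    while off + RECORD_HEADER.size <= len(update_part):
        if update_part[off : off + len(MAGIC)] == MAGIC:
            break  # end marker reached
        raw_name, length = RECORD_HEADER.unpack_from(update_part, off)
        off += RECORD_HEADER.size
        if length == 0 or off + length > len(update_part):
            break  # erased/zeroed space or a truncated record: stop
        name = raw_name.rstrip(b"\x00").decode("ascii")
        writes.append((name, update_part[off : off + length]))
        off += length
    return writes
```

After the loader has flashed every record, it would write `MAGIC` back into the first 8 bytes, which turns the whole partition back into a no-op. A real version would presumably also want a checksum per record so that a partially staged update is never flashed; that is left out here.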

We are altering the eMMC without a running o/s. That is how we are making the rootfs partition upgrade safe.

Imagine three partitions: A, B, and Flags. A custom u-boot looks at Flags and decides whether to boot into A or B. If booted into A, the system can update partition B and set Flags, telling u-boot to switch to B and check for a successful boot. Only after B boots fully does it update Flags to record the success; if that isn’t set, u-boot defaults back to the known-good A partition.
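
The Flags logic can be modelled as a small state machine. This Python sketch is an assumption about one reasonable encoding (an `active` slot, a pending `trial` slot, and a `trial_started` bit), not the poster’s actual u-boot code; the field and function names are hypothetical. The key detail is that the bootloader records the trial attempt before jumping, so a B that hangs or resets before confirming falls back to A on the next power-up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flags:
    """Hypothetical contents of the Flags partition."""

    active: str = "A"            # last slot confirmed to boot fully
    trial: Optional[str] = None  # slot to try once, if an upgrade is pending
    trial_started: bool = False  # written by the bootloader before the trial boot


def select_slot(flags: Flags) -> str:
    """Bootloader side: decide which slot to boot on this power-up."""
    if flags.trial is not None:
        if not flags.trial_started:
            flags.trial_started = True  # must be persisted BEFORE jumping to the slot
            return flags.trial
        # The trial was already attempted and never confirmed: fall back.
        flags.trial = None
        flags.trial_started = False
    return flags.active


def request_switch(flags: Flags, slot: str) -> None:
    """OS side, running from the active slot after writing the other one."""
    flags.trial = slot
    flags.trial_started = False


def mark_boot_successful(flags: Flags, booted: str) -> None:
    """OS side, called by the new slot only once it has fully booted."""
    if flags.trial == booted:
        flags.active = booted
        flags.trial = None
        flags.trial_started = False
```

For example, after `request_switch(flags, "B")`, a B that hangs is booted exactly once and the next `select_slot` returns "A" again; a B that calls `mark_boot_successful` becomes the new `active` slot.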

I feel like this is a fairly normal A/B partition upgrade method, but the catch is that as of 28.1 we are no longer allowed to select which partition holds the DTB, which leaves the DTB unsafe to upgrade.

I suspect Nvidia recognized this flaw, because in 28.2 they added B partitions for the following partitions:
mb1_b
MB1_BCT_b
spe-fw_b
mb2_b
mts-preboot_b
SMD_b
mts_bootpack_b
cpu-bootloader_b
bootloader-dtb_b
secure-os_b
bpmp-fw_b
bpmp-fw-dtb_b
sce-fw_b
sc7_b
BMP_b
SOS_b
kernel_b
kernel-dtb_b

While these now exist, I can’t find any documentation saying how to boot using the B partitions. Booting from them would solve the problem of having only a single kernel-dtb partition that gets used for every boot.

I have sometimes wondered whether the “a” and “b” were related to differences between a TX1 and TX2 rather than being any sort of backup system. It would be interesting to know more about what is behind the “a”/“b” scheme. And of course there is a second complication: in R28.2, device trees may be preceded by a signature. If your signature is not valid, it might not work, and since I don’t know the rule for signatures, I couldn’t say whether there is a difference in how this works for “a”/“b” partitions.

The only reference I did find to these B partitions was in the 28.2 release notes: issue #20037708 references a problem in the MB1 A/B slot redundancy. So I think they are used for redundancy, but how is still the question. They also showed up at the same time as the TX2i, so maybe the TX2i uses them as a backup if the checksum fails on the first copy. Out of curiosity, I may try flashing my TX2i without images in the B partitions to see if it still runs.

Make sure nothing from a previous flash is left there; you might want to dd NULL bytes over those partitions first.
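
For reference, the “dd NULL bytes” step is roughly equivalent to the Python below, which overwrites a file or block device with zeros and then reads it back to check. `zero_fill` and `is_zeroed` are hypothetical helper names; try it on an image file first, since pointing it at the wrong `/dev/mmcblk0p*` node destroys that partition.

```python
import os


def zero_fill(path: str, chunk: int = 1 << 20) -> int:
    """Overwrite every byte of `path` with zeros (like dd if=/dev/zero).

    Works on regular image files; on Linux it should also work on a block
    device node when run as root, since seeking to SEEK_END reports the
    device size. Returns the number of bytes zeroed.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        f.seek(0)
        zeros = bytes(chunk)
        remaining = size
        while remaining:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros actually reach storage
    return size


def is_zeroed(path: str, chunk: int = 1 << 20) -> bool:
    """Return True if every byte of `path` reads back as zero."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                return True
            if any(block):
                return False
```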