Working around CBoot file system issues

Before I get into the bulk of this post, let me first say that I’m very pleased to see CBoot move towards support for a standard extlinux setup, as well as A/B upgrade/failover support. These are both really big steps in the right direction.

Having said that, what shipped with R32.6.1 is not production quality. While working to set up a proper A/B with encrypted root filesystem, here are some of the issues I have encountered to date:

  • ext2 support is outright broken by default, as blocknum_t is typedef’d as 64 bits wide, whereas the on-disk format is 32 bits wide. This results in “entertaining” errors for larger files, whereby parts of the file contents simply get skipped, and for sufficiently large files CBoot dies from running over the end of the block array and using garbage data.

  • Even after patching that issue, it turns out that the performance is absolutely woeful - 53 seconds to read the kernel Image from eMMC into RAM. Patching the source further to bypass the “block cache” for file data blocks (as opposed to metadata blocks) brought it down to 48 seconds, still unusably slow. I don’t know the underlying cause here. Maybe there are more issues with the block cache, or maybe there’s too much overhead initiating eMMC access, or maybe something altogether different. Whatever it is, it gets avoided when using ext4 extents.

  • When using a block size of 1k, which is the default for a small 100MB file system suitable for the boot partitions, CBoot bugs out when reading a file large enough to span multiple extent blocks, such as the kernel Image file. To get past this I have to force 4k block size and hope I never need a kernel large enough to trigger the issue even with 4k blocks.

  • When composing the boot filesystem as part of our build pipeline, we were using the “e2mkdir” and “e2cp” tools to set up the necessary directory structure in the file system image. For an ext2 file system CBoot has no issues with it (other than the aforementioned performance). If created as an ext4 file system, on the other hand, CBoot fails on it with an “invalid extents magic” error. As it turns out, CBoot’s ext4 support blindly assumes every file is using extents, which is not a safe assumption. Specifically, the e2cp tool uses the regular block allocation method instead of extents, leading to the above error when trying to load those files.

  • The alternative to using e2cp in the build pipeline is to prepare the directory structure for the filesystem ahead of time, and let mke2fs populate the image during its creation. This however also led to an unbootable kernel. In this case it’s because the resulting Image file is a sparse file, and again, CBoot lacks support for this and fails to zero the output buffer where the holes in the file are. This error is completely silent, which did not make it easier to track down, in case you were wondering.
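
To illustrate the first issue in the list above, here is a minimal sketch of how a width mismatch of that kind can go wrong. It is not CBoot's actual code: wide_blocknum_t, entry_as_u32(), entry_as_u64() and fill_indirect_block() are hypothetical names made up for this example, and the code assumes a little-endian host so that in-memory byte order matches the on-disk byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit in-memory block number type. */
typedef uint64_t wide_blocknum_t;

/* Reads entry `index` of a block-pointer array using the 32-bit on-disk width.
 * Assumes a little-endian host (ext2 stores these fields little-endian). */
static uint32_t entry_as_u32(const uint8_t *block, size_t index)
{
    uint32_t v;
    memcpy(&v, block + index * sizeof v, sizeof v);
    return v;
}

/* Reads entry `index` as if the entries were 64 bits wide.
 * This is the mismatch: the stride is 8 bytes instead of 4. */
static wide_blocknum_t entry_as_u64(const uint8_t *block, size_t index)
{
    wide_blocknum_t v;
    memcpy(&v, block + index * sizeof v, sizeof v);
    return v;
}

/* Fills `block` with `n` consecutive 32-bit block numbers starting at `first`,
 * mimicking what an on-disk block-pointer array could look like. */
static void fill_indirect_block(uint8_t *block, size_t n, uint32_t first)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = first + (uint32_t)i;
        memcpy(block + i * sizeof v, &v, sizeof v);
    }
}
```

With eight entries numbered 100 to 107, the 32-bit reads return every entry in order, while the low halves of the 64-bit reads return only 100, 102, 104 and 106, and any index of 4 or above reads past the end of the buffer. Whether this is exactly what happens inside CBoot is an assumption, but it is consistent with the skipped contents and the overrun described above.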

So, for anyone else who is trying to get a small boot file system going with this version of CBoot, to save you from having to learn it all the hard way, you will need to:

  • use ext4 format
  • explicitly set the block size in your call to mke2fs
  • use mke2fs to pre-populate the image file
  • patch bootloader/partner/common/lib/fs/ext4/ext4.c to zero fill holes inside the loop of ext4_read_extent()
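
For the last item, the general idea of zero filling holes can be sketched as follows. This is not the patch referred to above and not CBoot's ext4_read_extent(); struct simple_extent, read_blocks_fn, fake_read() and read_with_holes() are hypothetical, simplified stand-ins, and the sketch assumes the extents arrive sorted by logical block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified extent record; not the real ext4 on-disk layout. */
struct simple_extent {
    uint32_t logical;   /* first file block covered by this extent */
    uint32_t len;       /* number of blocks in this extent */
    uint64_t physical;  /* first block on the device */
};

/* Hypothetical device read callback: copies `count` blocks starting at
 * `physical` into `dst` and returns 0 on success. */
typedef int (*read_blocks_fn)(uint64_t physical, uint32_t count,
                              uint8_t *dst, void *ctx);

/* Fake device for demonstration only: fills the destination with the low byte
 * of the physical block number. `ctx` points to the block size (size_t). */
static int fake_read(uint64_t physical, uint32_t count, uint8_t *dst, void *ctx)
{
    size_t block_size = *(const size_t *)ctx;
    memset(dst, (int)(physical & 0xFF), (size_t)count * block_size);
    return 0;
}

/* Reads a file of `total_blocks` blocks into `out`, zero filling every range
 * that no extent covers, including a hole at the end of the file. */
static int read_with_holes(const struct simple_extent *ext, size_t n_ext,
                           uint32_t total_blocks, size_t block_size,
                           uint8_t *out, read_blocks_fn read_blocks, void *ctx)
{
    uint32_t next = 0;  /* next file block not yet written to `out` */

    for (size_t i = 0; i < n_ext; i++) {
        /* Reject unsorted, overlapping or out-of-range extents. */
        if (ext[i].logical < next || ext[i].logical > total_blocks ||
            ext[i].len > total_blocks - ext[i].logical)
            return -1;

        /* Hole between the previous extent and this one: zero it. */
        memset(out + (size_t)next * block_size, 0,
               (size_t)(ext[i].logical - next) * block_size);

        if (read_blocks(ext[i].physical, ext[i].len,
                        out + (size_t)ext[i].logical * block_size, ctx) != 0)
            return -1;

        next = ext[i].logical + ext[i].len;
    }

    /* Hole after the last extent: zero the remainder of the buffer. */
    memset(out + (size_t)next * block_size, 0,
           (size_t)(total_blocks - next) * block_size);
    return 0;
}
```

Tracking the next expected logical block is what makes the gaps visible: a reader that only copies the blocks each extent describes leaves whatever was already in the output buffer in place of the holes, which would explain why the failure is silent.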

Hi,
Thanks for sharing the experience and suggestion. This can help other users who need to customize cboot.

If you have seen an issue booting the Xavier NX developer kit and have a patch that fixes it, please share it with us so that we can reproduce the issue, verify that the patch works, and pass it to our teams for merging into the default release.

Hi, thanks for the response.

The reason I didn’t include an actual patch in my post is that what I have is a bandaid which I know doesn’t cover certain edge cases such as holes at the end of a file. While it covers my needs, it would not be suitable as part of a proper fix.

As for reproducing the issue, simply generate the file system image with mke2fs -t ext4 -F -d /path/to/your/boot/files/structure bootfs.img 100m and use that. Play around with -t ext2 or -b 2048 or whatever parameters you want to explore. Presumably your engineers already have a way of testing the file system code locally without having to flash a device first. If for some reason they don’t I guess I could share the little test-harness I quickly hacked together while debugging this, but it’s not particularly pretty.
