Device Tree customization for AGX

The above means the system was shut down improperly, and perhaps there is now missing file system content. ext4 is a journaling file system: after an unclean shutdown it replays the journal to bring the file system back to a consistent state, and any writes which were never committed are discarded. This prevents corruption of the file system itself, but it does not mean the content was saved. This might or might not be related to your issues; it is hard to say (most of the time it won’t be an issue if the journal does the recovery, but it could be). This is just one of those wild cards you can’t be certain about if shutdown is not correctly performed, e.g., if power is cut or if there is some sort of lock-up.

The sample rootfs is purely Ubuntu (18.04 for most releases; the latest JetPack 5 developer preview is for Ubuntu 20.04), and thus licensing is unmodified for distributing this. The end user is the one who runs “apply_binaries.sh” (automatically from JetPack/SDK Manager, or manually if manually installing), and as you say, this installs NVIDIA-specific drivers and software (basically direct hardware access content). This only needs to be done once. This won’t be changing anything related to what you are doing.

What will have an effect is how the rootfs image is generated. Mostly that image is a copy of the “Linux_for_Tegra/rootfs/” content, but some files in “rootfs/boot/” will change depending on arguments passed to the flash software.

It is easiest to explain based on command line flash with “flash.sh”, but the process is the same when run through the GUI. An example command line flash might be:
sudo ./flash.sh jetson-xavier mmcblk0p1

The “jetson-xavier” refers to the config file “jetson-xavier.conf”. This in turn references other config files based on the particular carrier board. Based on this being an AGX, and based on the particular carrier board, the kernel Image file and device tree may be changed in “rootfs/boot/” prior to creating the rootfs image. Depending on boot options, the "rootfs/boot/extlinux/extlinux.conf" may also be changed. Once those are in place the partition image is created as “Linux_for_Tegra/bootloader/system.img.raw” (and the sparse version, “bootloader/system.img”). This contains those updated kernel, device tree, and extlinux.conf files.
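
One way to see whether a file in “rootfs/boot/” was actually replaced is to compare it byte-for-byte against the reference copy. This is only a rough sketch: the helper name compare_copy is made up for illustration, and the example paths in the usage comment are assumptions about which reference file goes with which copy on your release.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA tools): report whether a file
# under rootfs/ is byte-identical to the reference copy it should come from.
# A missing or unreadable file is also reported as DIFFERENT.
compare_copy() {
    ref="$1"
    copy="$2"
    if cmp -s "$ref" "$copy"; then
        echo "same: $copy matches $ref"
    else
        echo "DIFFERENT: $copy does not match $ref"
    fi
}

# Usage, run from Linux_for_Tegra/ after a flash (paths are assumptions):
#   compare_copy kernel/Image rootfs/boot/Image
```

If the kernel Image or device tree in “rootfs/boot/” does not match what you intended, that points at which reference file the flash picked up.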

When the flash software copies content in, it first copies a reference version of various files to either the “bootloader/” or “kernel/” directory, and then copies that file into “rootfs/”. To know which one is copied it is easiest to just log a command line flash and read the logs. An example is:
sudo ./flash.sh jetson-xavier mmcblk0p1 2>&1 | tee log_flash.txt
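
The log gets long, so it can help to filter it down to the copy steps. The following is a sketch only: list_copied_files is a made-up helper name, and the search patterns assume the log reports these steps with the word “copying” plus something like “dtb”, “kernel”, “extlinux”, or “initrd” on the same line. Adjust the patterns to whatever your own log actually shows.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of a saved flash log that look like
# copy steps for the kernel, device tree, extlinux.conf, or initrd.
# Assumption: such steps contain the word "copying"; this may differ
# between releases, so check a few lines of your log by eye first.
list_copied_files() {
    log_file="$1"
    grep -i 'copying' "$log_file" | grep -i -E 'dtb|kernel|extlinux|initrd'
}

# Usage (after the logged flash above):
#   list_copied_files log_flash.txt
```

Each line which shows up names a source file, and that is the file to compare against what you expected your custom config to select.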

You already know about the custom .cfg file since you are using that, but be aware that you can customize the sub-components which are copied as well (e.g., you can make your own reference copy of content with a new name…I have not done so myself, but that is the purpose of separation of carrier board config and module config into human-readable config files). I do not know if perhaps flash put some default file in which stops boot from completing or not, but you could flash on command line using your config file and look at the logs to see if the wrong content was copied or not. If not, then you know to edit your content.

There might also be some question of whether the initrd works with your setup. I could not say, but perhaps the logs will provide a hint as to how the initrd was created (perhaps it failed to use your device tree, though likely that isn’t a problem; it does make a good example of what can go wrong).

Note that if the initrd was running, then the Linux kernel was running in a limited RAM-disk system, and that this must have at least partially succeeded because the ext4 file system was repaired. Perhaps the repair is why it doesn’t work? Don’t know. However, the initrd does successfully complete its job and then pivot_root to the mmcblk0p1 partition. This is when it goes wrong. This could be because the initrd did not set up properly before pivot_root, or it could be because the ext4 file system on mmcblk0p1 is not valid based on what was passed.

Incidentally, the kernel is process ID (PID) 0. The kernel technically is running only a single program, that program being “init” (PID 1). Systemd is the part of init which brings up the systems in a “somewhat” object-oriented way (I consider systemd most of init, although in older systems this was just a bunch of bash shell script files). If you can’t start systemd (init), then the system panics and cannot continue. Looks like systemd partially started since it detected arm64, but nothing else continues…basically init is dying almost instantly upon pivot_root to the eMMC.

I would not typically expect a simple journal-based file system repair to cause init to fail almost instantly. This failure is in fact likely the reason the system was not shut down cleanly…there was probably no chance of that. So I suspect something was incorrect about the rootfs or the initrd, but since the partition was found, and since the pivot_root is eMMC, I suspect that either it is the rootfs or the device tree at issue (init can’t work very well if it goes to use hardware and the device tree causes the hardware to be missing).
