I am not sure whether you are working with 4.x or 5.0.2 here. I currently only have experience with 5.0.2, so disregard this if you're on an earlier version.
The step that is failing for you would normally look like this:
Setting "FDT /boot/dtb/kernel_tegra194-....dtb" successfully in the extlinux.conf...done.
populating rootfs from /work/work/Linux_for_Tegra/rootfs ... populating /boot/extlinux/extlinux.conf ... done.
So I assume your rootfs folder does not contain a file at /boot/extlinux/extlinux.conf.
The flash script modifies this file, along with the device tree file, right before flashing, so that you always get the right configuration on your system, regardless of what is currently set in the rootfs.
The file should be there if you ran apply_binaries.sh successfully.
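A quick sanity check before flashing is just to confirm the file exists in your rootfs. A minimal sketch (the Linux_for_Tegra path is an example; adjust it to your setup):

```shell
#!/bin/sh
# Verify that apply_binaries.sh has populated the rootfs before flashing.
check_rootfs() {
    if [ -f "$1/rootfs/boot/extlinux/extlinux.conf" ]; then
        echo "ok: extlinux.conf present"
    else
        echo "missing: run 'sudo ./apply_binaries.sh' in $1 first"
    fi
}

# e.g.: check_rootfs "$HOME/Linux_for_Tegra"
```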
Thanks, it looks like apply_binaries.sh is a necessary step that I was not aware of. It printed output with a lot of pipe characters in it. I also saw another topic hinting that I did not unpack the rootfs properly (with sudo), so I'm trying that now. It probably also helps not to leave the downloaded rootfs tarball inside the Linux_for_Tegra/rootfs/ directory!
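For reference, here is my current understanding of the unpacking step as a sketch. The tarball filename is an example from my download and may differ on yours; the real run needs sudo so file ownership and device nodes survive:

```shell
#!/bin/sh
# Extract the sample rootfs tarball INTO rootfs/, preserving permissions
# (-p). Keep the tarball itself outside Linux_for_Tegra/rootfs/.
unpack_rootfs() {
    # $1 = Linux_for_Tegra dir, $2 = sample rootfs tarball (.tbz2)
    tar -xpjf "$2" -C "$1/rootfs"
}

# Real invocation (tarball name is an example from my download):
#   cd Linux_for_Tegra
#   sudo tar -xpjf ../Tegra_Linux_Sample-Root-Filesystem_R35.1.0_aarch64.tbz2 -C rootfs
#   sudo ./apply_binaries.sh
```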
sudo apply_binaries.sh was successful, and I have indeed gotten past this step; it looked like this:
Setting "FDT /boot/dtb/kernel_tegra194-p3668-0000-p3509-0000.dtb" successfully in the extlinux.conf...done.
populating rootfs from /home/slu/iai_data/2022-12-15/Linux_for_Tegra/rootfs ...
Since steps 1 and 2 of Workflow 10 Example 2 have now completed for me, perhaps I am now in a state where, if I find the right commands, I could update extlinux.conf to point to NVMe and regenerate the images, giving me a good starting point for booting off NVMe. Do you agree?
I tried to do the flash (step 3):
❯ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only
/home/slu/iai_data/2022-12-15/Linux_for_Tegra/tools/kernel_flash/l4t_initrd_flash_internal.sh --usb-instance 5-4.4 --device-instance 0 --flash-only --external-device nvme0n1p1 -c "./tools/kernel_flash/flash_l4t_nvme.xml" -S 8GiB jetson-xavier-nx-devkit external
* Step 1: Build the flashing environment *
/home/slu/iai_data/2022-12-15/Linux_for_Tegra/tools/kernel_flash/l4t_initrd_flash_internal.sh: line 735: /home/slu/iai_data/2022-12-15/Linux_for_Tegra/tools/kernel_flash/initrdflashimgmap.txt: No such file or directory
Google has zero results for initrdflashimgmap.txt.
The issue here will be that, as you saw, the flash script will modify your extlinux.conf before flashing.
So right after flashing, it will not be set to what you have set in your rootfs.
As far as I know, you won't be able to trick the script into writing /dev/nvme0n1p1 in there while flashing to eMMC.
The only really hacky idea I could come up with is creating one "fake" rootfs for your eMMC and one "real" one for your NVMe. In the eMMC rootfs, you could place a startup script that modifies your extlinux.conf on first boot, so that the next boot happens from NVMe.
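Roughly, such a first-boot script would only need to rewrite the root= kernel argument. A sketch (the device names here are examples and must be checked against your own extlinux.conf; you would run this once from a first-boot service):

```shell
#!/bin/sh
# First-boot hook (sketch): point the kernel's root= at the NVMe
# partition so the next boot comes up from NVMe. Device names are
# examples; verify against your own extlinux.conf before using this.
switch_root_to_nvme() {
    # $1 = path to extlinux.conf
    sed -i 's|root=/dev/mmcblk0p1|root=/dev/nvme0n1p1|' "$1"
}

# e.g.: switch_root_to_nvme /boot/extlinux/extlinux.conf
```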
Anyway, I'd not really suggest that. It might break with whatever NVIDIA does in their next update…
The right way is to flash the UEFI with the variable set correctly. As I've understood it, that will be supported with the next release, but who knows when that will come. Also, I do not know whether it will be supported via a variable or if we will still have to compile the UEFI binary ourselves…
Since NVIDIA does not currently support flashing and booting NVMe, my company will probably have to do the serial setup step after flashing, until the flash scripts can flash correctly out of the box.
I have compiled the UEFI myself; it is possible but a bit cumbersome… Note that it no longer supports A/B rootfs booting if you do that…
place a startup script that will modify your extlinux.conf upon the first boot, so that the next boot will be from NVMe.
You know, it's funny that you suggest this. For our previous 4.4.1-based massflash, I made a startup script that streamlines the production workflow: when you massflash, the device boots up after the flash completes. This is not ideal, because it does not help identify failures (and they happen when hundreds are done in batches!).
I made a one-time initial auto-shutdown service: a carefully configured systemd service that uninstalls itself and then shuts down. This way, the devkit boards used for flashing end up powered down with the LED off if the flash succeeded, and the LED stays on if the flash failed on that device for any reason.
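The unit itself is simple; this is reconstructed from memory, so the name and details are illustrative rather than the exact production files:

```ini
# /etc/systemd/system/flash-verify-shutdown.service (illustrative name)
[Unit]
Description=One-time auto-shutdown to signal a successful flash

[Service]
Type=oneshot
# Disable ourselves first so this never runs again, then power off.
ExecStart=/bin/sh -c 'systemctl disable flash-verify-shutdown.service && systemctl poweroff'

[Install]
WantedBy=multi-user.target
```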
That's something I'll revisit and set up later, or perhaps the large changes brought by JetPack 5 will make it obsolete or impossible (since I intend not to load an OS on eMMC at all, and hope not to involve any NVMes during our SoM flashing step). Hopefully flashing only the QSPI will be much faster and less error-prone. All I can say is that this item is much lower on the checklist right now.
I suppose your setup is a bit different from mine. I am OK for now with leaving the eMMC dormant, without any viable OS to boot, since that keeps things simple and our application cannot run with only 16GB of eMMC capacity. So I am hoping that if we can flash only the QSPI on the eMMC SoMs, it might help sidestep the frustrating problem you have described so well.
@KevinFFF I hope an NVIDIA rep can comment later today on the issue I’m blocked on now, which is
l4t_initrd_flash_internal.sh: line 735: <snip>Linux_for_Tegra/tools/kernel_flash/initrdflashimgmap.txt: No such file or directory
Meanwhile, I'm going to see if Workflow 11 can at least give me viable bootable NVMe devices; I could then make some progress testing them against a SoM flashed via e.g. SDK Manager.
Which is what causes:
*** no-flash flag enabled. Exiting now... ***
tools/kernel_flash/README_initrd_flash.txt says Workflow 11 preps a disk directly attached to the host system, so either because of some sticky config or a bug, it's not flashing to the board. Maybe I will try to dd ./bootloader/system.img.raw to the NVMe and see if that produces a working bootable target. I hope so…
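If I go that route, the write itself would be something like the following sketch. It assumes the drive already carries a suitable partition table from the earlier flash attempt, and the device name is from my setup, so double-check yours before running anything like this:

```shell
#!/bin/sh
# Write a raw partition image to a target device (or file) and flush.
write_raw_image() {
    # $1 = source raw image, $2 = destination device or file
    dd if="$1" of="$2" bs=4M conv=fsync status=progress
}

# e.g.: sudo write_raw_image ./bootloader/system.img.raw /dev/nvme0n1p1
```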
- Proceed with the normal flash steps for 5.0.2 (rev.1) and choose NVMe as the flash target.
- The SoM is left in an indeterminate state (it comes up in DFU mode with USB ID 7e19 even when the recovery jumper is not used).
- This actually sets up the NVMe successfully, as we later found out.
- Remove the NVMe from the flashing board.
- Proceed with the normal flash steps for 5.0.2 (rev.1) and choose eMMC as the flash target.
- Booting off eMMC works.
- Plug in the NVMe.
- It automatically boots off the NVMe.
I will be back to explore the flashing scripts, since I will obviously need to automate them to roll out our next thousand or so units. I strongly suspect there is a significant difference between the 5.0.2 BSP package available for download and the 5.0.2 (rev.1) BSP used by SDK Manager, but I haven't yet confirmed that.
I understand this, but the readme document is too vague. It does explain that I am supposed to provide the name of the host-attached device:
<extdev_on_host> is the external device /dev node name as it appears on the host. For examples,
if you plug in a USB on your PC, and it appears as /dev/sdb, then <exdev_on_host> will be sdb
This implies that it's going to do something to touch <exdev_on_host>, which it clearly does not, because it failed with the output I posted above. It's possible I'm reading the tea leaves incorrectly here, but notice that the rest of the output from this point makes no mention of even attempting to write to sdn:
*** no-flash flag enabled. Exiting now... ***
Save initrd flashing command parameters to /home/slu/iai_data/2022-12-15/Linux_for_Tegra/tools/kernel_flash/initrdflashparam.txt
writing boot image config in bootimg.cfg
extracting kernel in zImage
extracting ramdisk in initrd.img
/tmp/tmp.6eLqOFV2dr/initrd /tmp/tmp.6eLqOFV2dr /home/slu/iai_data/2022-12-15/Linux_for_Tegra
Perhaps the command that actually writes to the specified exdev_on_host never made it into the script. Such a command would contain the critical information of which image (which may or may not have been prepared) should be used, and how it is to be written to the host-attached device.
I hope you can agree with me that the only reasonable interpretation of the output I've shared (let me know if more output from higher up in the log would be relevant) is that the script terminated prematurely because the --no-flash flag was passed to flash.sh. It would not be reasonable for me to start hacking at these scripts. It seems there needs to be a better release process for these packages, as well as more complete documentation; the readmes are appreciated, but they are not in-depth enough.
There are more tests I have planned. At the time, I was testing with a USB 3.0 external adapter for the NVMe, and I suspect that may not work as well as having the NVMe properly attached to the computer as a /dev/nvmeXn1 device.
As confirmed by @seeky15, there is no way to use the flashing scripts from the 5.0.2 BSP package download to put a Xavier NX eMMC SoM into a state where it auto-boots to NVMe (to be clear: no serial-port shenanigans, no manual interactive UEFI configuration, no rebuilding and installing UEFI, and no other workarounds unsuitable for factory production). However, I have confirmed that flashing 5.0.2 (rev.1) with SDK Manager 1.9.0_10816_amd64.deb from an Ubuntu 20.04.5 LTS amd64 machine using default settings does put it into such a state. Therefore, all we need is a workable massflash workflow that delivers this flashed state onto our SoMs.
Also, the above indicates there may be some change between what SDK Manager calls 5.0.2 (rev.1) and the 5.0.2 BSP download. Can you check internally whether this is the case, and where we can get the updated 5.0.2 (rev.1) BSP package download?
On the NVMe side, the only way I've been able to prepare an NVMe drive so that a 5.0.2 Xavier NX eMMC SoM can boot from it is by attempting a 5.0.2 (rev.1) flash with SDK Manager 1.9.0_10816_amd64.deb from an Ubuntu 20.04.5 LTS amd64 machine using default settings, with one exception: the NVMe SSD is installed in the flashing board and NVMe is chosen as the target during the flash step. This leads to a failed flash result, but the SSD is left in a good bootable state. I am currently exploring ways to clone this NVMe device so we can image these disks for internal development and then proceed to production. If there is some massflash-related way to prep these NVMe disks, as an alternative to cloning from a known-good "flashed" NVMe, that would also be appreciated, but so far it looks like imaging and cloning the disk will be the way to go.
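For the cloning itself, I am currently looking at plain block-level imaging. A sketch (nothing Jetson-specific; the device names are from my setup, and compression is just to keep the golden images manageable):

```shell
#!/bin/sh
# Capture a known-good NVMe into a compressed image, and write it back
# to blank disks later. Plain block-level cloning, nothing clever.
image_disk() {
    # $1 = source device (e.g. /dev/nvme0n1), $2 = output image (.gz)
    dd if="$1" bs=4M status=none | gzip -c > "$2"
}

restore_disk() {
    # $1 = compressed image (.gz), $2 = target device
    gunzip -c "$1" | dd of="$2" bs=4M conv=fsync status=none
}

# e.g.: sudo image_disk /dev/nvme0n1 nvme-golden.img.gz
```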
May I ask why this is not suitable for factory production?
You could pre-build a modified UEFI image with the default boot order set to NVMe. (Initrd flash boot order - #26 by WayneWWW)
Just replace the UEFI image at Linux_for_Tegra/bootloader/uefi_jetson.bin.
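For example, keeping a backup of the original binary (a sketch; paths are relative to the BSP directory, and the helper name here is just illustrative):

```shell
#!/bin/sh
# Swap in a rebuilt UEFI binary, keeping the stock one as a backup.
replace_uefi() {
    # $1 = Linux_for_Tegra dir, $2 = rebuilt uefi_jetson.bin
    cp "$1/bootloader/uefi_jetson.bin" "$1/bootloader/uefi_jetson.bin.orig"
    cp "$2" "$1/bootloader/uefi_jetson.bin"
}

# e.g.: replace_uefi ~/Linux_for_Tegra ./my_uefi_jetson.bin
```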
They should be the same. You can use the JetPack SDK downloaded by SDKM; the path is shown in SDKM.
The Xavier NX config files do not have any OVERLAY_DTB_FILE env var. They only have DTB_FILE=tegra194-p3668-0001-p3509-0000.dtb.
The third sentence
Reflash the board
brings us to the present issue, where flashing from the BSP package I've prepped hasn't succeeded (see discussion above). I would want to get this working in some way before attempting the UEFI rebuild and DTB changes.
OK, so after some searching it looks like I've got a rough idea of how to go down this UEFI path. I can accept this as likely viable for production rollout, so I take that back.
But what makes no sense at all, and still calls everything into question, is why my "failed" SDKM-derived 5.0.2 NVMe disk, combined with my vanilla 5.0.2 (rev.1) SDKM-flashed SoM, magically and seamlessly boots to NVMe all by itself, without manual intervention. Ever since I confirmed this behavior, it has cast doubt on everything else. Why should I bother rebuilding UEFI just to get the updated version, if 5.0.2 (rev.1) via SDKM 1.9.0 provides the required functionality all by itself?
I think my next step will be to use the SDK/BSP directory actually prepared by SDKM for Workflow 7 experiments, to try to find a quick path toward massflash. I am willing to accept that, although there exists a way to flash the SoM so it auto-boots NVMe, I might never find out why it works, given discussions such as the aforementioned topic, Initrd flash boot order, where it is clearly stated that 5.0.2 does not support auto-booting to NVMe without deep modifications like the UEFI one above.
I tried this already, but I will try again after some more critical experiments that should shed light on the real issues. In particular, I need to test more of the initrd flash script examples from the readme, using the SDK directory provided by SDK Manager. It is possible that my earlier tests were conducted with a somehow improperly set-up BSP package; I found it unusual that nothing I attempted with it could succeed.
Please allow me a few days for testing more thoroughly so that this discussion can be more productive. Thanks.