Flash.sh copies binaries from host system instead of rootfs for ramdisk

When executing the flash.sh script (doesn’t matter if from the terminal or via the sdk-manager) a recovery image for OTA (over the air) updates is created. This is done by this script:
${L4T}/tools/ota_tools/version_upgrade/ota_make_recovery_img_dtb.sh

The recovery image needs some, but not all, binaries from ${L4T}/rootfs; the exact file list is in recovery_copy_binlist.txt.
One of those files is /usr/bin/w, which displays information about the users currently on the machine and their processes. But if we take a closer look…

$ ls -lah ${L4T}/rootfs/usr/bin/w
lrwxrwxrwx 1 root root 19 Apr 26  2018 w -> /etc/alternatives/w

/usr/bin/w isn’t the actual executable but a symbolic link. In fact /etc/alternatives/w is another symlink:

$ ls -lah ${L4T}/rootfs/etc/alternatives/w
lrwxrwxrwx 1 root root  17 Apr 26  2018 w -> /usr/bin/w.procps

So ${L4T}/rootfs/usr/bin/w.procps is the actual executable that should be copied!
Now, how does ota_make_recovery_img_dtb.sh handle this? Starting from line 65:

# Copy all the binary
while read -r path
do
    _src="$(echo "${path}" | cut -d ':' -f 2)"
    _dst="$(echo "${path}" | cut -d ':' -f 3)"
    cp -f "${_src}" "${_initrd_dir}/${_dst}"
    check_error "cp -fv ${_src} ${_initrd_dir}/${_dst}"
done

Big oof. cp dereferences the link, and because the link target is an absolute path, it is resolved against the host's root directory. So the script copies /etc/alternatives/w from our host system (the Ubuntu machine running the flash) instead of ${L4T}/rootfs/usr/bin/w.procps from the rootfs.
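To see which entries are affected, something like the following could be used. It is only a sketch: find_absolute_links is a hypothetical helper name, not part of the L4T tools, and it assumes you feed it the already-expanded source paths (one per line) on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper (not part of L4T): reads one path per line on
# stdin and prints "path -> target" for every entry that is a symlink with
# an absolute target, i.e. a link cp would resolve against the host's /.
find_absolute_links() {
    local path target
    while IFS= read -r path; do
        # Skip regular files, directories and missing entries
        [ -L "${path}" ] || continue
        target="$(readlink "${path}")"
        case "${target}" in
            /*) printf '%s -> %s\n' "${path}" "${target}" ;;
        esac
    done
}
```

Relative symlinks are deliberately not reported, since cp resolves those relative to the link's own directory inside the rootfs.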

Frankly, this makes me question the quality of the rest of the code. To work around the issue I’ve added a bit of shell-fu on my end:

# Copy all the binary
while read -r path
do
    _src="$(echo "${path}" | cut -d ':' -f 2)"
    _dst="$(echo "${path}" | cut -d ':' -f 3)"
    # If _src is a symlink with an absolute target, re-root the target in the
    # rootfs and repeat until the actual file is reached
    while [ -L "${_src}" ] && [[ "$(readlink "${_src}")" = /* ]]; do
        _src="${_rootfs_dir}$(readlink "${_src}")"
    done
    cp -f "${_src}" "${_initrd_dir}/${_dst}"
    check_error "cp -fv ${_src} ${_initrd_dir}/${_dst}"
done

-L checks whether the source file is a symbolic link; its target is then read via readlink. Note that nothing has to be done when the symlink is relative, so we check whether it is absolute (i.e. whether the target starts with /). If so, the target is prefixed with the rootfs path.
If the target is in turn another symlink, this has to be repeated (hence the while loop).
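The same idea can be written as a standalone function, which makes it easier to try out without running the whole flash. This is a sketch under assumptions: resolve_in_rootfs is a hypothetical name, the limit of 40 hops is an arbitrary guard against link loops, and unlike the loop above it also follows relative links itself instead of leaving them to cp.

```shell
#!/usr/bin/env bash
# resolve_in_rootfs ROOTFS SRC
# Hypothetical helper (not part of L4T): follows the symlink chain starting at
# SRC (a full path inside ROOTFS). Absolute link targets are re-rooted under
# ROOTFS instead of being resolved against the host's /.
resolve_in_rootfs() {
    local rootfs="${1%/}" src="$2" target hops=0
    while [ -L "${src}" ]; do
        target="$(readlink "${src}")"
        case "${target}" in
            /*) src="${rootfs}${target}" ;;              # absolute: prefix the rootfs
            *)  src="$(dirname "${src}")/${target}" ;;   # relative: same directory as the link
        esac
        hops=$((hops + 1))
        if [ "${hops}" -gt 40 ]; then
            echo "too many levels of symbolic links: $2" >&2
            return 1
        fi
    done
    printf '%s\n' "${src}"
}

# Usage (assuming L4T points at your Linux_for_Tegra directory):
#   resolve_in_rootfs "${L4T}/rootfs" "${L4T}/rootfs/usr/bin/w"
```

The result is not normalized, so a relative target such as ../lib/foo stays in the path as-is; cp handles that fine.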

Fun fact: this bug is the only thing preventing anyone from running flash.sh on distributions other than Ubuntu (other distros don’t use /etc/alternatives, so the script fails, complaining it can’t find w).

After adding my fix I was able to successfully flash a Jetson NX from Arch Linux :)

Nice catch! I guess the script needs to prefix every symbolic link dereference itself rather than relying on how the cp command dereferences. Each step of a dereference really needs to be prefixed with “<rootfs>” rather than using the path the real dereference resolves to. Alternatively, the script could use chroot during the copy, but that might actually be trickier to make work correctly. Better yet would be to use sed in combination with detection of a symbolic link, but that too would take some thought to do correctly.

As a temporary workaround which is hard to get wrong, I suggest editing recovery_copy_binlist.txt (save an unedited copy first). Go through every file which is a symbolic link, and add the secondary targets. The original symbolic links need to remain in place, which is why I say “and add the secondary targets”. Then, in the copy command (which you quoted above in a code block), change “cp -f” to “cp -f --no-dereference”. You could literally just add any intermediate symbolic links to your file list, along with the final regular file, and with “--no-dereference” cp would not be confused by symbolic links (it would create each link verbatim instead of dereferencing to the host PC’s version).
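Finding the secondary targets by hand is tedious, so here is a rough sketch of a helper that prints the whole chain for one entry. list_link_chain is a hypothetical name, the 40-hop limit is an arbitrary loop guard, and the output is plain rootfs-relative paths; turning them into entries in the list's own format is left to you.

```shell
#!/usr/bin/env bash
# list_link_chain ROOTFS PATH
# Hypothetical helper (not part of L4T): prints PATH (given relative to the
# rootfs, e.g. /usr/bin/w) followed by every link target behind it, one
# rootfs-relative path per line. These are the candidates to add to the list.
list_link_chain() {
    local rootfs="${1%/}" path="$2" target hops=0
    printf '%s\n' "${path}"
    while [ -L "${rootfs}${path}" ]; do
        target="$(readlink "${rootfs}${path}")"
        case "${target}" in
            /*) path="${target}" ;;                       # absolute inside the rootfs
            *)  path="$(dirname "${path}")/${target}" ;;  # relative to the link's directory
        esac
        printf '%s\n' "${path}"
        hops=$((hops + 1))
        if [ "${hops}" -gt 40 ]; then
            echo "link loop at ${path}" >&2
            return 1
        fi
    done
}

# Usage (assuming L4T points at your Linux_for_Tegra directory):
#   list_link_chain "${L4T}/rootfs" /usr/bin/w
```

With “cp -f --no-dereference” each of those paths is then copied as-is, so the links end up in the ramdisk verbatim; the destination directories (for example etc/alternatives) have to exist in the ramdisk tree for that to work.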

Note: For NVIDIA, the original recovery file list could be kept, and the mechanism to detect and re-root symbolic links could be applied just before the actual “cp” command.