Seeed Studio would have to provide the boot chain software since it is their carrier board. I don’t know what the customizations are, but rc.local was not part of the boot issue. It might help to explain something about an initrd before saying more.
Whenever boot stages hand off to a kernel, the boot stages themselves have to be able to retrieve whatever software gets Linux running. At minimum this includes the kernel, and probably the device tree and the arguments to pass to the kernel. The kernel itself might have more requirements. For example, if the system runs on RAID, and the RAID driver is in the form of a module, then that module cannot live on the RAID volume, since the kernel would need the module in order to read the volume…a bit of the classical “which came first, the chicken or the egg?” problem. The boot chains understand ext4. They also understand a RAM disk, which is just a very simple filesystem which exists as a tree structure in RAM. The content which fills the RAM disk is a “cpio archive” (basically a simple serialize/deserialize backup and restore mechanism).
During a normal boot one might load the kernel directly. This works great if everything is on the initial media, and if that media is all ext4. However, if you get the kernel from the eMMC “/boot”, and then tell it the rest of the o/s is on external media, e.g., an NVMe, then the kernel cannot reach any of its modules unless it already has the mechanism to drive an NVMe.
During an initrd boot, the initrd is used first in place of the real filesystem. A cpio archive is unpacked into RAM, and this contains everything the kernel needs for a very minimal boot. It also contains the device tree and the kernel modules needed for boot. For example, if you had an audio module, but it wasn’t needed for boot, then it wouldn’t be in the initrd; if you had a driver for accessing an NVMe which is not part of the main kernel Image, then that module would be part of the initrd. Instead of ending in a login shell, the initrd performs a pivot_root or equivalent which transplants the final rootfs in place of the cpio archive; the cpio archive no longer exists, and the Linux kernel neither knows nor cares because it has another rootfs now.
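One way to tell whether a given driver is built into the kernel Image or is a module is to look at the kernel configuration. This is only a rough sketch in Python: it assumes you have the config text available (on many systems it can be read from /proc/config.gz, but only if the kernel was built with that feature), and the helper names are made up for illustration. CONFIG_BLK_DEV_NVME and CONFIG_EXT4_FS are the usual symbols for the NVMe block driver and ext4.

```python
from typing import Optional


def config_value(config_text: str, symbol: str) -> Optional[str]:
    """Return the value of a CONFIG_ symbol ('y', 'm', ...), or None if unset."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" never start with "CONFIG_FOO=".
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def describe(symbol: str, value: Optional[str]) -> str:
    """Explain what the config value means for an initrd boot."""
    if value == "y":
        return f"{symbol}=y: built into the Image, no module needed in the initrd"
    if value == "m":
        return f"{symbol}=m: a module, which must be in the initrd to boot from that device"
    return f"{symbol}: not enabled"


# Hypothetical config excerpt, for illustration only.
sample_config = """\
# CONFIG_EXAMPLE_UNUSED is not set
CONFIG_BLK_DEV_NVME=m
CONFIG_EXT4_FS=y
"""

for sym in ("CONFIG_BLK_DEV_NVME", "CONFIG_EXT4_FS"):
    print(describe(sym, config_value(sample_config, sym)))
```

If the NVMe driver shows up as "m", then the initrd is the only place the kernel can get it from when the rootfs itself is on the NVMe.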
Your initrd boot is the stage where something is going wrong. The initrd is failing to find these devices:
Finding OTA work dir on external storage devices
Checking whether device /dev/mmcblk?p1 exist
Device /dev/mmcblk?p1 does not exist
Checking whether device /dev/sd?1 exist
Device /dev/sd?1 does not exist
Checking whether device /dev/nvme?n1p1 exist
Looking for OTA work directory on the device(s): /dev/nvme0n1p1
[ 6.907874] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
OTA work directory /mnt/ota_work is not found on /dev/nvme0n1p1
Finding OTA work dir on internal storage device
mount: /mnt: special device /dev/mmcblk0p1 does not exist.
Failed to mount /dev/mmcblk0p1 to /mnt
OTA work directory is not found on internal and external storage devices
So far as the initrd is concerned, the devices it expects are not there (the log shows /dev/nvme0n1p1 mounting, but /dev/mmcblk0p1 never appears). This is why bash cannot set a terminal process…bash is what runs all of those initrd commands for setting things up within the cpio archive. It’s trying to set up the real rootfs and it can’t see it. It fails to pivot_root because there is nothing to pivot to. That’s the “inappropriate ioctl for device” message: the system call that would pivot to a new root is receiving an impossible command to change to missing hardware.
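To make the device checks in that log a bit more concrete, here is a small Python approximation of the probing. The real init in the initrd is a shell script which I have not seen, so treat this as a sketch of the idea only: the patterns are copied from the log, while the function names and the fake /dev directory are made up for the demonstration.

```python
import glob
import os
import tempfile

# Patterns taken from the log; "?" matches exactly one character.
PATTERNS = ["mmcblk?p1", "sd?1", "nvme?n1p1"]


def probe(dev_dir, patterns=PATTERNS):
    """Map each pattern to the sorted list of matching paths under dev_dir."""
    return {p: sorted(glob.glob(os.path.join(dev_dir, p))) for p in patterns}


def report(found):
    """Produce one status line per pattern, in the order the patterns were probed."""
    lines = []
    for pattern, matches in found.items():
        if matches:
            names = ", ".join(os.path.basename(m) for m in matches)
            lines.append(f"{pattern}: found {names}")
        else:
            lines.append(f"{pattern}: does not exist")
    return lines


# Demonstration against a fake /dev holding only an NVMe partition,
# which is roughly the situation the log describes.
with tempfile.TemporaryDirectory() as fake_dev:
    open(os.path.join(fake_dev, "nvme0n1p1"), "w").close()
    for line in report(probe(fake_dev)):
        print(line)
```

If the driver for a storage controller never loads, the matching device node is never created, and a check like this cannot find it no matter how healthy the filesystem on that device is.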
I don’t know if you have a way to analyze your final rootfs. For example, clone it to another computer, or mount it read-only on another computer (a raw clone can be loopback mounted read-only). You would have to figure out if it is the filesystem causing the failure. If not, then you’d have to examine the cpio archive of the initrd. It may be that the initrd does not always run, but does run in this case, and is missing a required driver (or many), which would cause it to fail to find an otherwise valid filesystem.
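For the read-only loopback mount of a raw clone, the command would look something like what this Python sketch prints. The image name and mount point are hypothetical, the helper is only for illustration, and the command itself has to be run as root on the host. The "noload" option is an ext4 mount option which skips journal replay; I am assuming ext4 here since that is what the log shows.

```python
import shlex


def readonly_loop_mount_cmd(image, mountpoint, fstype="ext4"):
    """Build (but do not run) a mount command for inspecting a raw clone."""
    options = ["ro", "loop"]
    if fstype == "ext4":
        # Skip journal replay so a dirty journal is not written back to the clone.
        options.append("noload")
    return ["mount", "-t", fstype, "-o", ",".join(options), image, mountpoint]


# Hypothetical file names, for illustration only.
print(shlex.join(readonly_loop_mount_cmd("rootfs_clone.img", "/mnt/clone")))
```

Working on a copy of the clone is the safest approach, since then even a mistake on the host cannot alter the original evidence.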
One reason why an initrd might fail is if the kernel has modules needed for boot, and the modules are not present. If you’ve ever updated the kernel such that it has other boot requirements in the form of a module, and you failed to put the module in the initrd, then so long as the initrd is not triggered you won’t see the issue; as soon as something causes it to initrd boot, then the missing module would cause failure to find the media. This is an interesting possibility in your case because maybe the normal boot isn’t via initrd until the third failure. Or maybe it always runs the initrd, but under the two circumstances different branches of init (which is the bash script of the cpio archive) might mean it boots fine in one branch, but fails in the other.
Summary (two possibilities):
- The filesystems are bad. A clone and examination could say more, but always do so in read-only mode to avoid the host changing or fixing it.
- The detection of filesystems fails. You’d have to closely examine logs and unpack the cpio archive of the initrd (which is basically command use of gunzip and cpio; it is simple).