I’m trying to create a custom initramfs built into the kernel image on Jetpack 3.2 for the TX1. I want to do some recovery tasks during boot in case the rootfs gets corrupted. I’ve created a barebones system using buildroot and busybox, and I have a custom kernel build. I’m building the kernel with:
When I build the kernel, its size is now 3 MB larger than before, so that tells me that the initramfs is included in the kernel image. When I boot the kernel, it runs up to the point where I think it should boot into the initramfs, but it just hangs. My init script has some echo statements that should tell me that it’s running, but I never see them.
I’m thinking that it’s either not running the initramfs correctly, or else it’s running but the console output is not going out through the serial port. Is there something I’m missing?
I’m wondering if I need to modify the kernel arguments passed in by cboot? When I flash this directly using flash.sh, then the kernel goes directly to the LNX partition, and there’s no U-Boot which is fine for me as long as the kernel boots.
I am only guessing, but I suspect U-Boot needs to understand this format of kernel. If U-Boot does not understand how to access and place the initrd in RAM, your init will be empty. It isn’t Linux which reads the initrd and places it into RAM…it is U-Boot, and then Linux simply accesses this without any need to understand initrd unpacking. As it stands, U-Boot is designed to use the separate file location pointed at via extlinux.conf.
Consider that an initrd is typically used to load a kernel module which is necessary to get initial setup loaded from a file system type not integrated into the kernel…e.g., perhaps your kernel has ext4 integrated, but the file system is xfs and needs an xfs module loaded to access it. You get the proverbial “which came first, the chicken or the egg” question. Since U-Boot understands the initrd and unpacks it and places it at the right location the kernel does not need to access an xfs file system to load the xfs module…it is in RAM already. The physical RAM address for a kernel would typically start at 0x80000000, and the initrd and modules would be just below this address (within range of a direct branch instruction). It wasn’t the kernel which placed it there (and if it were an xfs module the kernel would not be able to do this since the module itself is on xfs…thus initrd). You’re missing a U-Boot config, not a kernel config (I couldn’t tell you which config though…the kernel should be unaware of your build options for initrd).
It’s my understanding that there is a real difference between an initrd and the initramfs. The kernel contains a gzipped cpio archive and on bootup this archive is loaded in tmpfs. If it contains a root filesystem with an init executable, then it gets run. If not, then the boot process proceeds as normal to boot from the real rootfs. It’s possible to populate this initramfs with a basic system to perform preboot tasks, and it always gets loaded, even without the help of U-Boot.
(see details here: https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt )
Also, in Jetpack 3.2, if you run the installer GUI to flash your SOM, you get U-Boot installed in the LNX partition, which then looks for the kernel image (and an initrd) in the /boot directory of the rootfs. However, if you flash the SOM using the flash.sh script from the command line, then the LNX partition gets populated with the actual kernel binary image, and U-Boot does not exist. It appears to me that the nVidia “cboot” knows how to recognize the kernel image, and just boots it from the LNX partition.
Since it’s possible to build a single binary image that contains the kernel and initramfs, this would all be stored in the LNX partition, and if my rootfs partition gets clobbered somehow, I would be able to detect this with a script in initramfs and perform some sort of recovery action. This is what I am trying to do. If I can just get the kernel to run my init process in the initramfs filesystem I built. :)
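To make the idea concrete, here is a rough sketch of the kind of check I have in mind for the init script. The function name rootfs_looks_sane, the /mnt/root mount point and the /dev/mmcblk0p1 device are all placeholders of my own, not anything from L4T, and the test itself (an /etc directory plus an /sbin/init that is executable or a symlink) is only an assumption about what a "healthy" rootfs looks like.

```shell
#!/bin/sh
# Hypothetical helper for an initramfs /init script (POSIX sh, busybox-friendly).
# Returns 0 if the directory given as $1 looks like a usable root filesystem.
rootfs_looks_sane() {
    root="$1"
    # A real rootfs should at least have an /etc directory.
    [ -d "$root/etc" ] || return 1
    # /sbin/init is often an absolute symlink (e.g. to systemd), which looks
    # dangling from inside the initramfs, so accept a symlink as well.
    [ -x "$root/sbin/init" ] || [ -L "$root/sbin/init" ] || return 1
    return 0
}

# Possible use inside /init (device and mount point are assumptions):
#   mkdir -p /mnt/root
#   if mount -o ro /dev/mmcblk0p1 /mnt/root && rootfs_looks_sane /mnt/root; then
#       exec switch_root /mnt/root /sbin/init
#   else
#       echo "rootfs check failed, starting recovery"
#   fi
```

The check is deliberately cheap. A failed mount or a missing init catches a wiped or badly damaged partition, but it would not notice subtler corruption, so a real recovery script would probably want something stronger.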
I’ll have to look closer at the LNX partition. Right now I have the TX2 connected, and will do some experimentation with the TX1 instead some time in the next few days. I was not aware of the kernel going straight to the LNX partition…I assume this is if you use flash.sh to flash the kernel? Normally this is just a file copy, and I would have to wonder if the LNX partition flashing with the kernel is actually a bug which happens to work (it would be one of the more interesting and useful bugs because it would make a nice feature).
I had a system that was originally installed with the flash.sh script from the command line. Then I tried swapping out the kernel in /boot with my own custom kernel… but after rebooting it was still running the old stock kernel. After wrestling with it for a while, I noticed that during bootup it was never displaying the U-Boot menu. I inspected the contents of the LNX partition, and it looked like a kernel. I used dd to write my kernel image to the LNX partition, and after rebooting it was finally running my custom kernel, without U-Boot.
If this is just a bug that just happens to work, then it would be a nice coincidence… but I suspect that cboot is just loading code from the LNX partition into memory, and then executing it. That would explain why it works with U-Boot or a kernel image.
Technically the Linux kernel is just a boot loader which doesn’t load anything except itself. They’re both bare metal, so as long as hardware is set up sufficiently at some given stage to where the bare metal code can run they are interchangeable. I would be very interested to know what NVIDIA’s view is of the purpose of the LNX partition and what was actually intended to be there.
Here’s a quote from the L4T development guide regarding boot flow:
“When all necessary boot files are verified and loaded, the tegraboot transfers control (jumps) to the boot loader such as cboot. The boot loader validates and loads next level software such as Linux kernel or U-Boot.”
This seems to imply that cboot can load the Linux kernel directly. Anyone from nVidia want to chime in and confirm this?
Also, my boot log shows timestamps from cboot… and then transitions directly to kernel timestamps without any U-Boot messages.
[0001.621] osc freq = 38400 khz
[0001.627] welcome to cboot
[0001.630] Cboot Version: 00.00.2014.50-t210-e831cf53
[0005.533] nvdumper Carveout: Base = 0xff23f000 and Size = 0x80000
[0005.540] bct_init bctinit
[0005.542] bct_init bctinit
[0005.620] Starting Bpmp FW
[0005.622] BPMP-FW Carveout: Base = 0xff2c0000 and Size = 0x40000
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 4.4.38-TEAL-L4T-28.2-015 (dennis@dennis-XPS-15) (gcc version 5.3.1 20160113 (Linaro GCC 5.3-2016.02) ) #1 SMP PREEMPT Tue May 1 16:52:19 MDT 2018
Later on I get this:
[ 5.227261] tegra-pcie 1003000.pcie-controller: link 0 down, retrying
[ 5.240169] Freeing unused kernel memory: 3924K (ffffffc00122b000 - ffffffc001600000)
[ 5.249000] Freeing alternatives memory: 80K (ffffffc001600000 - ffffffc001614000)
[ 5.257765] btb inv war enabled
but the kernel is not hung… I’m seeing more messages:
[ 24.939215] random: nonblocking pool is initialized
[ 65.015256] xhci-tegra 70090000.xusb: cannot find firmware…retry after 1 second
[ 66.023256] xhci-tegra 70090000.xusb: Direct firmware load for tegra21x_xusb_firmware failed with error -2
[ 66.034254] xhci-tegra 70090000.xusb: Falling back to user helper
[ 126.039241] xhci-tegra 70090000.xusb: cannot find firmware…retry after 1 second
[ 127.047243] xhci-tegra 70090000.xusb: Direct firmware load for tegra21x_xusb_firmware failed with error -2
[ 127.058231] xhci-tegra 70090000.xusb: Falling back to user helper
[ 187.063238] xhci-tegra 70090000.xusb: cannot find firmware…retry after 1 second
[ 188.071243] xhci-tegra 70090000.xusb: Direct firmware load for tegra21x_xusb_firmware failed with error -2
[ 188.082368] xhci-tegra 70090000.xusb: Falling back to user helper
[ 248.087240] xhci-tegra 70090000.xusb: cannot find firmware…retry after 1 second
[ 249.095242] xhci-tegra 70090000.xusb: Direct firmware load for tegra21x_xusb_firmware failed with error -2
[ 249.106525] xhci-tegra 70090000.xusb: Falling back to user helper
[ 309.111241] xhci-tegra 70090000.xusb: cannot find firmware…retry after 1 second
[ 310.119217] xhci-tegra 70090000.xusb: Leaving it upto user to load firmware!
[ 310.127203] xhci-tegra 70090000.xusb: Direct firmware load for tegra21x_xusb_firmware failed with error -2
[ 310.138785] xhci-tegra 70090000.xusb: Falling back to user helper
[ 601.059225] tegradc tegradc.1: blank - normal
But I never see any messages from my initramfs. The xusb errors are due to missing USB firmware, but I could load that from my initramfs if it were working.
FYI, I think of the bootloader as an abstraction to the hardware at boot. Kernels typically lack some of the setup which might be needed prior to reaching the kernel, e.g., clocks, I/O setup for the disk, so on. If what the kernel needs is already there then there isn’t much need for the bootloader. On the other hand, few people would want to run without a bootloader because of options and flexibility no longer being possible (take a look at “/proc/cmdline” if you get this to work…you’ll be missing many arguments…some may be for serial console once the kernel boots). Can your kernel boot with no arguments passed?
I don’t have much faith in expensive JTAG debuggers these days, but if you could put this in a debugger and boot you’d see exactly what line in the kernel source it is stopping at.
So it turns out that my initramfs really is running. I just couldn’t see any console output, and my shell isn’t talking on the correct console.
I added the kmsg device to /dev
sudo mknod kmsg c 1 11
and then modified my echo statements like this:
/bin/echo "Entered INITRAMFS" > /dev/kmsg
Now I can see text messages in the form of a kernel message, and I’ve confirmed that my initramfs is running. I just need to figure out how to get the text redirected out the console correctly, and I should be set.
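In case it helps anyone else, the echo lines can be wrapped in a small helper so that the same script still prints something once the console is sorted out. This is only a sketch: the log function, the KMSG variable and the "initramfs:" prefix are names I made up, and it assumes the kmsg node has been created as above.

```shell
#!/bin/sh
# Hypothetical logging helper for an initramfs /init script.
# KMSG can be overridden, which also makes the helper easy to try on a desktop.
KMSG="${KMSG:-/dev/kmsg}"

log() {
    if [ -w "$KMSG" ]; then
        # Each write to /dev/kmsg shows up as one line in the kernel log.
        echo "initramfs: $*" > "$KMSG"
    else
        # Fall back to stdout if the device node is missing or not writable.
        echo "initramfs: $*"
    fi
}

# Example use inside /init:
#   log "Entered INITRAMFS"
```

With this in place the messages appear interleaved with the kernel timestamps on the serial port, the same way the single echo line did.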
Normally these kernel command line arguments set up the first console and serial console:
If you can find a way to pass this as a command line argument it might work (the ttyS0 is for serial console, the tty0 is for physically connected keyboard/monitor text mode).
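If the arguments do make it through, the init script could read them back from /proc/cmdline and point its own input and output at that device. The following is a sketch only: console_from_cmdline is a made-up name, the 115200n8 options in the example are just sample values, and it assumes /proc is mounted and the matching device node exists in the initramfs /dev. It picks the last console= entry, since that is the one the kernel uses for /dev/console.

```shell
#!/bin/sh
# Hypothetical helper: print the device node for the last console= argument
# found in the kernel command line string passed as $1.
console_from_cmdline() {
    dev=""
    for arg in $1; do
        case "$arg" in
            console=*)
                dev="${arg#console=}"   # drop the "console=" key
                dev="${dev%%,*}"        # drop options such as ",115200n8"
                ;;
        esac
    done
    # Succeed and print a path only if a console= argument was present.
    [ -n "$dev" ] && echo "/dev/$dev"
}

# Possible use inside /init, after mounting /proc:
#   con=$(console_from_cmdline "$(cat /proc/cmdline)")
#   [ -n "$con" ] && [ -c "$con" ] && exec <"$con" >"$con" 2>&1
```

For example, a command line of "console=ttyS0,115200n8 console=tty0" gives /dev/tty0, so the order of the two arguments decides whether the shell ends up on the serial port or on the monitor.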
@dmillard I was looking for a rootfs recovery methodology like the one you hinted at in your first post.
I was wondering if you could share code snippet or web references that you used.
Any hint would be greatly appreciated.