Custom kernel fails to boot

Hi.

I tried to rebuild the kernel to add KVM and NFS v4 support. I followed the official guides and compiled the kernel on an Ubuntu VM with the recommended Linaro toolchain.

I then rebuilt the currently used kernel packages by pulling them and modifying them with the following script:

Linux_for_Tegra/my_debs/build_kernel_debs.sh
#!/bin/sh

export injection_filelist="my_debs/.tmp_kernel_inject.txt"

# Build injection file
rm -f $injection_filelist
find rootfs/lib/modules/4.9.201-tegra/ -type f -exec bash -c '
  for filepath do
    #echo "showing $filename"
    debpath=$(echo "$filepath" | sed "s|^rootfs||")
    # Ignore lib/modules/*/(build|source)
    # (debpath starts with "/", and -E is needed for the (a|b) alternation)
    if echo "$debpath" | grep -Eq "^/lib/modules/[^/]+/(build|source)/"; then continue; fi
    perms=$(stat "$filepath" -c "%a")
    echo "$filepath:$debpath:$perms" >> "$injection_filelist"
  done
' {} +
echo "rootfs/boot/Image:/boot/Image" >> $injection_filelist

original=$(dpkg --info my_debs/deb_origin/nvidia-l4t-kernel_*_arm64.deb | grep "Version:" | cut -d' ' -f3)
custom_version="rebuild0"

#echo "my_debs/deb_origin/nvidia-l4t-kernel_${original}_arm64.deb"

tools/Debian/nvdebrepack.sh \
        -v $custom_version \
        -f $injection_filelist \
        -m "Replace kernel image." \
        -n "Linus <linus@cosmos-ink.net>" \
        my_debs/deb_origin/nvidia-l4t-kernel_${original}_arm64.deb

# Patch deps on child packages
tools/Debian/nvdebrepack.sh \
        -d nvidia-l4t-kernel=${original}+$custom_version \
        -v $custom_version \
        -n "NVIDIA Corporation <linux-tegra-bugs@nvidia.com>" \
        my_debs/deb_origin/nvidia-l4t-kernel-dtbs_${original}_arm64.deb
tools/Debian/nvdebrepack.sh \
        -d nvidia-l4t-kernel=${original}+$custom_version \
        -v $custom_version \
        -n "NVIDIA Corporation <linux-tegra-bugs@nvidia.com>" \
        my_debs/deb_origin/nvidia-l4t-kernel-headers_${original}_arm64.deb

As you can see above, I also replaced all the modules in rootfs/lib/modules with the newly built ones. I did some checks to ensure everything is more or less the same, then installed my new kernel, kernel-headers and kernel-dtbs packages and rebooted.

scrnlog (197.8 KB)

On restart, the kernel didn’t boot at first. After booting it with a custom entry (the same, just without the quiet kernel option) I got the errors starting at 1676.

I haven’t built many kernels and am not sure what the issue is here. It says “Not tainted” and something about “invalid cgroup_subsys …”. I’m really not sure if this is some licensing issue or something else (“Boot logo display failed…” is normal since I’m using it without a monitor).

I built the kernel from the tag tegra-l4t-r32.5 and used the recommended tegra_defconfig with the addition of KVM and NFS v4. I’m not sure whether the officially distributed kernel has some options that are missing from mine. .config (163.5 KB)

The output of /proc/config.gz of my last booting (official) kernel:
config.gz (37.1 KB)

Can anyone point me in the right direction here? I’m rolling back the kernel right now but would still like to be able to configure the kernel myself for additional features.

Solved it. It was basically my fault, but I also didn’t know any better.

I ran into similar issues when trying to revert my changes:

  • Changed /lib/modules/ back to the original
  • Changed /boot/Image back by flashing using the recovery

After some digging and weird errors like kernel panics because of init, I spotted the error: I’m using an external rootfs and assumed that /boot/ was mounted from the internal APP partition that is used for booting. There are in fact two Images: one on the internal APP partition (which is flashed with ./flash.sh) and one on my external rootfs.
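For anyone wanting to check the same assumption on their own device, here is a minimal sketch (not taken from any NVIDIA guide) that shows which device the kernel was told to use as root and which device actually backs /boot. The function names root_from_cmdline and device_of are made up for illustration.

```shell
#!/bin/sh
# Extract the value of root= from a kernel command line string.
# (hypothetical helper name, for illustration only)
root_from_cmdline() {
    for arg in $1; do
        case $arg in
            root=*) echo "${arg#root=}" ;;
        esac
    done
}

# Print the device that backs the filesystem containing a path.
# (hypothetical helper name, for illustration only)
device_of() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# On the running Jetson one could then compare:
#   root_from_cmdline "$(cat /proc/cmdline)"
#   device_of /boot
```

If both point at the external disk, the /boot/Image you see on the running system is the one on the external rootfs, not the one on the internal APP partition.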

I seem to have had a mismatch because of that. Mounting my external rootfs on my PC and changing boot/Image back to the old one fixed the issues and let me use the old kernel again. The device booted again!

I then reapplied my modified /boot/Image and /lib/modules, removed the Image.sig file for good measure, and also flashed the new Image (and .gz) with the driver package (./flash.sh). The new kernel now boots and I can confirm that KVM is enabled!

My guess is that once the kernel booted and mounted the rootfs, it read the other kernel from the external rootfs, which mismatched and caused random memory problems since new reads would now serve an entirely different file. I also guess that the Image has to match its modules and some of the errors may have been related to that as well.
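The Image/modules part of that guess can be checked offline. Below is a rough sketch, assuming the uncompressed Image contains the usual "Linux version <release> ..." banner string; the function names are invented for this example.

```shell
#!/bin/sh
# Print the kernel release embedded in an uncompressed Image.
# Assumption: the Image carries the usual "Linux version <release> ..." banner.
kernel_release_of() {
    LC_ALL=C grep -a -o 'Linux version [^ ]*' "$1" | head -n 1 | cut -d' ' -f3
}

# Succeed only if <rootfs>/lib/modules/<release> exists for the given Image.
image_matches_modules() {
    image=$1
    rootfs=$2
    release=$(kernel_release_of "$image")
    [ -n "$release" ] && [ -d "$rootfs/lib/modules/$release" ]
}

# Example call with the paths used in this thread:
#   image_matches_modules rootfs/boot/Image rootfs && echo "modules dir found"
```

This only confirms that a modules directory with the matching release name exists; it does not prove the modules were built from the same source tree and config.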

TL;DR:
Needed to change BOTH boot/Image files when using an external rootfs (e.g. put the new Image and Image.gz into Linux_for_Tegra/kernel/ and do a ./flash.sh ... to apply it to the internal APP partition).

Not sure about also doing the Image.gz and removing the Image.sig file, but it seems they can’t hurt. Mileage may vary in production.
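A quick way to confirm that both copies really are the same file before rebooting is a byte-for-byte comparison. This is only a sketch; the function name check_images and the mount point in the example call are placeholders, not paths from my setup.

```shell
#!/bin/sh
# Compare two kernel images byte for byte and report the result.
# (hypothetical helper name, for illustration only)
check_images() {
    if cmp -s "$1" "$2"; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Example call; /mnt/external is a placeholder for wherever the
# external rootfs is mounted on the host:
#   check_images /mnt/external/boot/Image Linux_for_Tegra/kernel/Image
```

Note that this compares the unsigned files on the host. What ends up in the partition after ./flash.sh is the signed version, so it cannot be compared this way.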

One question remains for me: why didn’t the modified nvidia-l4t-kernel package automatically update the Image in the APP partition as well? I think I got at least one kernel update that way already and it didn’t brick the device.

Did I miss something here? Like something with a signature, failing package installation script or whatever??

hello luna_devnvidia,

there are two ways to load the kernel image: (1) from the kernel partition, or (2) from the root file system.
also, CBoot includes a default boot scan sequence that reads /boot/extlinux/extlinux.conf.
for example,
it loads the kernel binary file from the LINUX entry if there is one; otherwise, the kernel binary is loaded from the kernel partition.

the binary files need to be signed/encrypted before they are flashed into partitions; by default, an all-zero key is used to encrypt the binary. you can run flash.sh to perform the signing/encryption and flash the binaries. on the other hand, you can pass --no-flash to generate the signed/encrypted file locally without flashing.
for example, $ sudo ./flash.sh --no-flash -r -k kernel jetson-xavier mmcblk0p1.

the binary file does NOT need to be signed/encrypted if the kernel binary is loaded from the LINUX entry. you can simply copy the customized Image to the target and specify the correct LINUX file path in extlinux.conf. the new kernel binary will be loaded after you reboot the target.
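as a rough sketch of how to check which kernel file the LINUX entries point at, assuming the usual extlinux.conf layout with one "LINUX <path>" line per boot label (the function names below are illustrative only):

```shell
#!/bin/sh
# Print the kernel path of every LINUX entry in an extlinux.conf file.
linux_entries() {
    awk 'toupper($1) == "LINUX" { print $2 }' "$1"
}

# Print the LINUX entries whose file does not exist under the given rootfs.
missing_kernels() {
    conf=$1/boot/extlinux/extlinux.conf
    linux_entries "$conf" | while read -r path; do
        [ -f "$1$path" ] || echo "$path"
    done
}

# Example calls against a rootfs directory on the host:
#   linux_entries rootfs/boot/extlinux/extlinux.conf
#   missing_kernels rootfs
```

an empty result from missing_kernels means every LINUX path resolves to a file on that rootfs; it does not say anything about the image in the kernel partition.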