Can't boot orin nx after flashing custom Kernel

Hello:
I am trying to build a custom Linux kernel based on JP6 on an Orin NX dev kit.
After changing the configs (I tried to enable the kdbg features) and flashing the Orin NX dev kit, the Orin NX is not able to boot.
The SDK flashes and boots fine if I don't modify anything.

I compile my kernel following the Kernel Customization page of the NVIDIA Jetson Linux Developer Guide:

  1. Download the source code in here
  2. Build the kernel, oot modules, and dtbs:

Unpack public_source.tbz2 and move the contents to a separate folder to avoid permission issues.

$ tree
.
├── kernel_oot_modules_src.tbz2
├── kernel_src.tbz2
└── nvidia_kernel_display_driver_source.tbz2
.

Extract them:

$ tar xf kernel_src.tbz2
$ tar xf kernel_oot_modules_src.tbz2
$ tar xf nvidia_kernel_display_driver_source.tbz2

Modify defconfig (config file attached; all modifications are at the bottom of the file) and get ready to build:

$ ls
generic_rt_build.sh  hwpm    kernel_oot_modules_src.tbz2  kernel_src.tbz2  nvbuild.sh         nvdisplay     nvgpu                                     nvidia-oot
hardware             kernel  kernel_src_build_env.sh      Makefile         nvcommon_build.sh  nvethernetrm  nvidia_kernel_display_driver_source.tbz2
$ export CROSS_COMPILE=/home/ceslab-onyx/jp6KernelSource/toolchain/bin/aarch64-buildroot-linux-gnu-
$ nano kernel/kernel-jammy-src/arch/arm64/configs/defconfig
$ make -C kernel

After the log says “Kernel source compiled successfully.”, install the kernel:

$ export INSTALL_MOD_PATH=/home/ceslab-onyx/jetpack6/Linux_for_Tegra/rootfs
$ sudo -E make install -C kernel
$ cp kernel/kernel-jammy-src/arch/arm64/boot/Image \
  /home/ceslab-onyx/jetpack6/Linux_for_Tegra/kernel/Image

Building Nvidia out-of-tree modules and dtbs:

$ cd source

$ export CROSS_COMPILE=/home/ceslab-onyx/jp6KernelSource/toolchain/bin/aarch64-buildroot-linux-gnu-
$ export KERNEL_HEADERS=$PWD/kernel/kernel-jammy-src
$ make modules

$ export INSTALL_MOD_PATH=/home/ceslab-onyx/jetpack6/Linux_for_Tegra/rootfs
$ sudo -E make modules_install

DTBs:

$ cd source

$ export CROSS_COMPILE=/home/ceslab-onyx/jp6KernelSource/toolchain/bin/aarch64-buildroot-linux-gnu-
$ export KERNEL_HEADERS=$PWD/kernel/kernel-jammy-src
$ make dtbs

$ cp nvidia-oot/device-tree/platform/generic-dts/dtbs/* \
    /home/ceslab-onyx/jetpack6/Linux_for_Tegra/kernel/dtb/
  3. Flash Orin NX
    Boot the NX into recovery mode and run the command:
$ cd Linux_for_Tegra
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -p "-c ./bootloader/generic/cfg/flash_t234_qspi.xml" -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit nvme0n1p1

After flashing, the NX halts after dmesg prints:

...
[    2.624032] hub 2-1:1.0: 4 ports detected
[    2.728491] usb 1-3: new full-speed USB device number 3 using tegra-xusb
[    5.336678] pcie_tegra194: disagrees about version of symbol module_layout
[    5.338233] phy_tegra194_p2u: disagrees about version of symbol module_layout
[    5.340332] nvme_core: disagrees about version of symbol module_layout
[    5.342557] typec: disagrees about version of symbol module_layout
[    5.344551] typec: disagrees about version of symbol module_layout
[    5.346768] typec: disagrees about version of symbol module_layout
[    5.348616] tegra_xudc: disagrees about version of symbol module_layout
[   15.469723] ERROR: nvme0n1p1 not found

Full logs from NX serial uart and logs from initrd tools are also attached.

Using the same SDK, I can flash an AGX Orin dev kit and boot it successfully. However, dmesg there also prints "disagrees about version of symbol module_layout", so I am not sure whether that is the root cause.
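
A quick way to see which modules are affected is to pull their names out of a saved serial log. This is only a sketch: the function name mismatched_modules is made up for illustration, and it assumes the log lines have the same "[ timestamp ] module: message" shape as the excerpt above.

```shell
#!/bin/sh
# Print each module name that logged a module_layout mismatch, once,
# in order of first appearance. Reads a kernel log on stdin.
mismatched_modules() {
    sed -n 's/^\[[^]]*\] *\([^: ]*\): disagrees about version of symbol module_layout.*/\1/p' |
        awk '!seen[$0]++'
}

# Example with lines copied from the serial log above.
mismatched_modules <<'EOF'
[    2.624032] hub 2-1:1.0: 4 ports detected
[    5.336678] pcie_tegra194: disagrees about version of symbol module_layout
[    5.342557] typec: disagrees about version of symbol module_layout
[    5.344551] typec: disagrees about version of symbol module_layout
EOF
```

For a full log, the same function can be fed a file, for example mismatched_modules < flashNxSerialLogs.txt.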

Please help. Many thanks.

flash_1-11.4_0_20240115-140440.log (51.7 KB)
defconfig.txt (31.0 KB)
flashNxSerialLogs.txt (136.7 KB)

Hi,

Did you change the kernel suffix name?
What do you get with:

strings Linux_for_Tegra/kernel/Image | grep 'Linux version'

Hi DaveYYY:
Not sure if I changed it accidentally. Here is what it showed:

Linux version 5.15.122-tegra (ceslab-onyx@ceslab-onyx-server) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2022.08) 11.3.0, GNU ld (GNU Binutils) 2.38) #5 SMP PREEMPT Thu Jan 11 16:56:38 CST 2024 ()

Thank you.

Then what do you have under Linux_for_Tegra/rootfs/lib/modules/?
Is the folder also named 5.15.122-tegra?

Yes, Linux_for_Tegra/rootfs/lib/modules/5.15.122-tegra folder exists.
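
The two checks above (release string in the Image, folder name under lib/modules) can be combined into one small script. This is a sketch, not an NVIDIA tool: image_release and modules_match_image are hypothetical names, and it assumes the Image is uncompressed so the "Linux version" banner is readable, as it was with strings. A matching name only rules out a renamed kernel suffix; it does not prove the modules were built against the same kernel.

```shell
#!/bin/sh
# Print the release string embedded in a kernel Image
# (the third word of its "Linux version ..." banner).
image_release() {
    LC_ALL=C grep -a -o 'Linux version [^ ]*' "$1" | head -n 1 | cut -d ' ' -f 3
}

# Succeed only if <rootfs>/lib/modules/<release> exists for the given Image.
# Arguments: <Image file> <rootfs dir>
modules_match_image() {
    image=$1
    rootfs=$2
    release=$(image_release "$image")
    [ -n "$release" ] && [ -d "$rootfs/lib/modules/$release" ]
}

# Usage (paths as used in this thread):
#   modules_match_image Linux_for_Tegra/kernel/Image Linux_for_Tegra/rootfs \
#       && echo "module directory matches" || echo "mismatch"
```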

Hi,

This is a known issue on JetPack 6 with kernel building.
We are tracking it internally.

This is because separate kernel drivers are needed to mount NVMe/USB as rootfs during boot, and these drivers have to be present in the initrd image. However, when you build the kernel and kernel modules, the initrd image is not updated, so the module_layout value of the kernel modules in the stock initrd image released by NVIDIA does not match the kernel image you built yourself.

For a temporary solution, please follow this method to unpack the initrd image:
https://docs.nvidia.com/jetson/archives/r36.2/DeveloperGuide/SD/FlashingSupport.html#modifying-jetson-ram-disk

(It’s Linux_for_Tegra/rootfs/boot/initrd.)
Then replace files under /lib/modules/ in the unpacked folder of the initrd image with ones you built yourself, repack it, and flash again.

I guess that if you keep the kernel config unchanged, it will boot successfully.
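
To see whether an unpacked initrd still carries stale modules, its .ko files can be compared byte for byte with the freshly installed ones under the rootfs. The sketch below is an illustration only: stale_initrd_modules is a hypothetical helper, and it assumes both trees use the same relative layout under lib/modules/<release>.

```shell
#!/bin/sh
# List modules in an unpacked initrd that are missing from, or differ from,
# the copies under the rootfs. Empty output means the two are in sync.
# Arguments: <unpacked initrd dir> <rootfs dir> <kernel release>
stale_initrd_modules() {
    initrd_mods=$1/lib/modules/$3
    rootfs_mods=$2/lib/modules/$3
    ( cd "$initrd_mods" && find . -type f -name '*.ko*' ) | LC_ALL=C sort |
    while IFS= read -r rel; do
        # cmp -s is silent; a non-zero status means "different" or "missing".
        if ! cmp -s "$initrd_mods/$rel" "$rootfs_mods/$rel" 2>/dev/null; then
            printf '%s\n' "${rel#./}"
        fi
    done
}

# Usage (directory names are placeholders):
#   stale_initrd_modules ./unpacked_initrd Linux_for_Tegra/rootfs 5.15.122-tegra
```

Any path it prints is a module that would still be loaded from the old build at boot.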

I’m doing something very similar to you, following the same steps; however, I don’t see the changes made to kernel/kernel-jammy-src/arch/arm64/configs/defconfig reflected in the flashed image. I am trying to add CONFIG_GPIO_SYSFS=y to be able to set GPIO pins via /sys/class/gpio. After I add the flag to the defconfig, I don’t find it in kernel/kernel-jammy-src/.config after make -C kernel, and after flashing I still don’t have /sys/class/gpio.

This is some of the output of make -C kernel.

make[1]: Entering directory '/home/kilter/projects/kernel_custom/Linux_for_Tegra/source/kernel/kernel-jammy-src'
*** Default configuration is based on 'defconfig'
#
# configuration written to .config
#

I also tried creating a new clean project folder and got some more output, though I still don’t have the /sys/class/gpio folder… I did exactly the same as in the original post, though I had no issues flashing.
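
One way to check what actually ended up in the generated config is to look the option up in .config after the build. The helper below is a sketch with a made-up name (config_state); it only reads the file and does not explain why an option was dropped. Kconfig will silently omit an option from .config when its dependencies are not satisfied.

```shell
#!/bin/sh
# Report how an option is recorded in a kernel .config file:
# prints its value (y, m, a number or a string), "n" if the file says
# "# OPTION is not set", or "absent" if the option is not mentioned.
# Arguments: <.config file> <option name>
config_state() {
    file=$1
    opt=$2
    if line=$(grep -E "^${opt}=" "$file"); then
        printf '%s\n' "${line#*=}"
    elif grep -q -x "# ${opt} is not set" "$file"; then
        echo n
    else
        echo absent
    fi
}

# Usage:
#   config_state kernel/kernel-jammy-src/.config CONFIG_GPIO_SYSFS
```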

@hvbotten
GPIO sysfs is deprecated on rel-36. Please use libgpiod instead.
Anyway, don’t mix up unrelated issues. File a new topic for your own questions.


Hi hvbotten:
Yes, modifying defconfig is reflected in the flashed image. I tried it on the AGX Orin dev kit.
Though the module_layout problem still happened, at least I could see the kdbg config change on the Orin dev kit.

Hi DaveYYY:
The module_layout problem is solved; no such error message is printed any more.
However, flashing now stops due to a timeout.

Here is what I tried:

  1. Repack the initrd image:
$ sudo cp Linux_for_Tegra/rootfs/boot/initrd ~/jetpack6/initrdImg/
$ mkdir ~/jetpack6/initrdImg/unzipedInitrd
$ cd ~/jetpack6/initrdImg/unzipedInitrd
$ sudo gunzip -c /home/ceslab-onyx/jetpack6/initrdImg/initrd | cpio -i

Use the modules in Linux_for_Tegra/rootfs/lib/modules:

$ cd unzipedInitrd/lib/modules/
$ cp -r ~/jetpack6/Linux_for_Tegra/rootfs/lib/modules/5.15.122-tegra .
$ cd ~/jetpack6/initrdImg/unzipedInitrd
$ find . | cpio -H newc -o | gzip -9 -n > ../initrd
  2. Copy initrd back to the folder:
$ cd ..
$ sudo cp initrd Linux_for_Tegra/rootfs/boot/initrd
  3. Flash again using the command below. I tried flashing to both external and nvme0n1p1; both attempts failed due to a timeout.
sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -p "-c ./bootloader/generic/cfg/flash_t234_qspi.xml" -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit nvme0n1p1

Here are the logs from serial and flash tool.
Thank you.
flash_1-11.4_0_20240116-105754.txt (9.3 KB)
flashNxSerialLogs2.txt (64.3 KB)

Hi,

Please try connecting any kind of USB 2.0 device to the carrier board during flashing.
Also use a USB 2.0 cable for flashing if you are not already.

I connected a USB keyboard. The result looks the same.
Here are the logs.

flash_1-11.4_0_20240116-114608.txt (9.3 KB)
flashNxSerialLogs_usb2.txt (65.0 KB)

Hi DaveYYY:
Thank you for your help. I think the root causes were:

  1. The module_layout issue.
  2. My SDK was somehow broken.
    I used SDK Manager to download the SDK again, then repeated the steps. Now the issue is solved and the system boots.

Thank you again!


Edit: Using the same SDK to flash the AGX Orin dev kit requires changing the partition layout. Changing the initrd also changes the size of the recovery image, which becomes too large for the original partition layout.

You might see an error message like:

[ 456.5187 ] Writing partition recovery with recovery.img [ 137631744 bytes ]
[ 456.5216 ] 0000000054540204: E> NV3P_SERVER: Accessing offset 137631744 after boundary partition size 83886080
Error: Return value 4

At line 3025 of flash.sh, the default recovery size is set to 80M, which is smaller than the newly generated recovery.img; this leads to the error.

In my case, I changed the default value (80M) to a larger value (150M) to avoid this error.

$ nano Linux_for_Tegra/flash.sh
# Line 3025:
REC_SIZE_DEF=150000000
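
The arithmetic behind this error can be checked with a few lines of shell. The sketch below uses the two byte counts from the log above; min_mib and fits are hypothetical helper names, and it assumes REC_SIZE_DEF is given in bytes, as the values suggest. It says nothing about how flash.sh itself computes sizes.

```shell
#!/bin/sh
# Smallest whole number of MiB that can hold <bytes> (rounds up).
min_mib() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Succeed if an image of <bytes> fits in a partition of <partition bytes>.
fits() {
    [ "$1" -le "$2" ]
}

# Numbers from the error log above.
img=137631744    # size of recovery.img in bytes
part=83886080    # partition size in bytes (80 MiB)
fits "$img" "$part" ||
    echo "recovery.img needs at least $(min_mib "$img") MiB, partition has $(( part / 1048576 )) MiB"
```

With these numbers the image needs at least 132 MiB, so a value of 150000000 bytes leaves some headroom.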