Jetson Orin NX 16GB with custom carrier board boot failed

Hi NVIDIA,
My Jetson Orin NX 16GB fails to boot on a custom carrier board.

I used this command to flash the device:
sudo BOARDID=3767 BOARDSKU=0000 FAB=300 \
  ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 p3509-a02-p3767-0000 external
flash log: Linux_for_Tegra/initrdlog/flash_1-6_0_20241223-172847.log
serial log: serial2.log

I also tested this command:
sudo BOARDID=3767 BOARDSKU=0000 FAB=300 ADDITIONAL_DTB_OVERLAY_OPT=BootOrderNvme.dtbo \
  ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 p3509-a02-p3767-0000 nvme0n1p1
flash log: Linux_for_Tegra/initrdlog/flash_1-6_0_20241223-101934.log
serial log: serial.log

The M.2 SSD is connected to the PCIe C9 interface. Below is the pcie@140c0000 DTS configuration:
pcie@140c0000 {
    status = "okay";
    iommus = <0x04 0x1f>;
    vddio-pex-ctl-supply = <0xe7>;
    phys = <0x101>;
    phy-names = "p2u-0";
};

The logs are uploaded:
flash_1-6_0_20241223-101934.log (50.0 KB)
flash_1-6_0_20241223-172847.log (49.9 KB)

serial.log (27.4 KB)
serial2.log (25.4 KB)
rootfs-boot.gz (24.9 MB)

Could you please help me check what's going on?

I used JetPack 6.0 and Jetson Linux 36.3 to build the images.


Maybe the initrd is not working: I can see the NVMe device, but the kernel does not boot.

You may need to clarify which JetPack release you are using.

Also, please do not use a flash command like this when bringing up a custom board. Let the flash tool read the EEPROM from your SOM.

sudo BOARDID=3767 BOARDSKU=0000 FAB=300

If you are not familiar with how everything works here, just follow our guidance and answer the questions first.

Hello WayneWWW,
We used JetPack 6.0 and Jetson Linux 36.3 to build the images. Our carrier board does not have an EEPROM, and flashing stopped with an error, so we used the variables to define the board sub-model.

After removing the variables, the kernel still fails to start, although the rootfs can be mounted manually.
I used this flash command:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 p3509-a02-p3767-0000 internal


serial_cold_boot.log (78.6 KB)

The terminal on the host side shows that flashing succeeded:

[ 213]: l4t_flash_from_kernel: Successfully flash the qspi
[ 213]: l4t_flash_from_kernel: Flashing success
[ 213]: l4t_flash_from_kernel: The device size indicated in the partition layout xml is smaller than the actual size. This utility will try to fix the GPT.
Flash is successful
Reboot device
Cleaning up...
Log is saved to Linux_for_Tegra/initrdlog/flash_1-6_0_20241224-141316.log

What’s the next step?

Thanks.

There are several mistakes in what you described here.

  1. "Our carrier board does not have an EEPROM, and flashing stopped with an error, so we used the variables to define the board sub-model."

Yes, nobody's custom board has an EEPROM, but the SOM has one. The info is read from the SOM. I didn't say anything about your carrier board's EEPROM because it does not matter. You should not need to define the variables yourself.

  2. The log you provided is not a flash log. It is a boot log from our default recovery image. When you see this log, it means all your previous boot attempts failed, so the board fell back to the recovery image. This log is not useful either, because it looks the same for everyone. Please reflash your board and share the full flash logs from both the host and the UART.

  3. Please do not flash with "p3509-a02-p3767-0000" anymore. It is no longer fully supported on rel-36. Use the Orin Nano devkit configuration instead.

Should I use “jetson-orin-nano-devkit.conf” or “jetson-orin-nano-devkit-nvme.conf” to flash?

This SOM is an Orin NX 16GB, and PCIe C9 is connected to the SSD.

Please help to confirm.

jetson-orin-nano-devkit.conf

This one is fine.

"This SOM is an Orin NX 16GB, and PCIe C9 is connected to the SSD."

What exactly do you want me to confirm?

Below are the results of flashing with the jetson-orin-nano-devkit configuration. Please help take a look at how to resolve this. Thanks.

$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 033: ID 0955:7323 NVIDIA Corp. APX
Bus 001 Device 003: ID 046d:c31c Logitech, Inc. Keyboard K120
Bus 001 Device 002: ID 046d:c077 Logitech, Inc. M105 Optical Mouse
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 jetson-orin-nano-devkit internal

host_log.txt (307.7 KB)
initrdlog-flash_1-6_0_20241224-165216.log (49.9 KB)
serial_log.txt (346.4 KB)

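Your problem is most likely that the PCIe driver cannot be loaded after you switched to the RT kernel.
Did you add the corresponding path under /lib/modules? If not, none of your OOT kernel modules can be loaded.

In the boots that failed, the PCIe driver log never appeared.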
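
One way to check this against a captured UART log is to count the lines that mention PCIe. The sketch below is only an illustration: the helper name `count_pcie_lines` is hypothetical, and a plain case-insensitive match on "pcie" is an assumption, since the exact driver message prefix is not quoted in this thread and bootloader messages can also match.

```shell
# Hypothetical helper: count lines in a serial log that mention PCIe.
# The case-insensitive "pcie" pattern is an assumption, not an official
# log prefix, so bootloader lines may match too; inspect the hits by hand.
count_pcie_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -ci 'pcie' "$1" || true
}

# Example (file name taken from the attachments above):
#   count_pcie_lines serial_log.txt
```

A count of 0 for a failed boot and a non-zero count for a working boot would be consistent with the driver never being loaded.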

Do you mean these files? pcie-tegra194.ko and the others are all present.

Linux_for_Tegra/rootfs$ sudo find -name "*ko" |grep pcie
Linux_for_Tegra/rootfs/lib/modules$ sudo find -name "*ko" |grep pcie
./5.15.136-rt-tegra/kernel/drivers/net/wireless/marvell/mwifiex/mwifiex_pcie.ko
./5.15.136-rt-tegra/kernel/drivers/pci/controller/pcie-brcmstb.ko
./5.15.136-rt-tegra/kernel/drivers/pci/controller/dwc/pcie-tegra194.ko
./5.15.136-rt-tegra/kernel/drivers/pci/controller/pcie-rockchip-host.ko
./5.15.136-rt-tegra/kernel/drivers/phy/rockchip/phy-rockchip-pcie.ko
./5.15.136-rt-tegra/updates/drivers/net/ethernet/nvidia/pcie/tegra_vnet.ko
./5.15.136-rt-tegra/updates/drivers/misc/nvscic2c-pcie/nvscic2c-pcie-epc.ko
./5.15.136-rt-tegra/updates/drivers/misc/nvscic2c-pcie/nvscic2c-pcie-epf.ko
./5.15.136-rt-tegra/updates/drivers/misc/tegra-pcie-dma-test.ko
./5.15.136-rt-tegra/updates/drivers/pci/controller/pcie-tegra-vf.ko
./5.15.136-rt-tegra/updates/drivers/pci/controller/tegra-pcie-edma.ko
./5.15.136-tegra/kernel/drivers/net/wireless/marvell/mwifiex/mwifiex_pcie.ko
./5.15.136-tegra/kernel/drivers/pci/controller/pcie-brcmstb.ko
./5.15.136-tegra/kernel/drivers/pci/controller/dwc/pcie-tegra194.ko
./5.15.136-tegra/kernel/drivers/pci/controller/pcie-rockchip-host.ko
./5.15.136-tegra/kernel/drivers/phy/rockchip/phy-rockchip-pcie.ko
./5.15.136-tegra/updates/drivers/net/ethernet/nvidia/pcie/tegra_vnet.ko
./5.15.136-tegra/updates/drivers/misc/tegra-pcie-dma-test.ko
./5.15.136-tegra/updates/drivers/pci/controller/tegra-pcie-edma.ko

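Could you also update the part inside the initrd?

Another way to debug is to boot from something like USB first, and then check why the PCIe driver was not loaded.
Or first confirm the most basic thing: if you do not use the RT kernel, do all of these problems go away?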
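
Once the board is up from another boot device, checks along the following lines may help narrow down why the driver is missing. This is a rough sketch: `module_loaded` is a hypothetical helper, and the module name `pcie_tegra194` is inferred from the `pcie-tegra194.ko` file listed earlier in the thread.

```shell
# Hypothetical helper: succeed if a module is listed in /proc/modules.
# The optional second argument lets you point at a saved copy of the list.
module_loaded() (
    # The kernel reports module names with underscores, not dashes.
    name=$(printf '%s' "$1" | tr '-' '_')
    list=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
)

# Example, run on the booted device:
#   uname -r                       # kernel release; compare with the directories under /lib/modules
#   module_loaded pcie-tegra194 || echo "pcie_tegra194 is not loaded"
#   sudo dmesg | grep -i pcie      # look for probe errors
```

If `uname -r` reports a release that has no matching directory under /lib/modules, that alone would explain why no module can be loaded.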

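After disabling the RT kernel configuration, the board boots after flashing.
I find that a bit hard to believe.
What should I do to enable RT?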

There is nothing to believe or not believe here. The kernel modules simply did not load.

Please make sure that both the initrd and the kernel rootfs contain the newly built OOT kernel modules.
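
A quick way to check both places is to look for the same module under the same kernel release in the rootfs and in an unpacked copy of the initrd. The following is a minimal sketch, not an official procedure: `has_module` is a hypothetical helper, the release `5.15.136-rt-tegra` and the module `pcie-tegra194.ko` come from the listing above, and the unpack step assumes the initrd image is a gzip-compressed cpio archive located at `bootloader/l4t_initrd.img`.

```shell
# Hypothetical helper: succeed if <module file> exists somewhere under
# <tree>/lib/modules/<kernel release>.
has_module() (
    tree=$1; kver=$2; mod=$3
    dir="$tree/lib/modules/$kver"
    [ -d "$dir" ] || exit 1
    # find exits 0 even with no match, so test its output instead.
    [ -n "$(find "$dir" -name "$mod" -print)" ]
)

# Example (image path and image format are assumptions):
#   mkdir /tmp/initrd && cd /tmp/initrd
#   gzip -dc /path/to/Linux_for_Tegra/bootloader/l4t_initrd.img | sudo cpio -idm
#   has_module /tmp/initrd 5.15.136-rt-tegra pcie-tegra194.ko || echo "missing in initrd"
#   has_module /path/to/Linux_for_Tegra/rootfs 5.15.136-rt-tegra pcie-tegra194.ko || echo "missing in rootfs"
```

The same check can be repeated for any other OOT module from the `updates/` directories that the boot path depends on.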