Jetson Orin NX: Stopping while booting

Hi,
When I install the SSD on my custom board and press F11 at the prompt shown below to enter the Boot Manager, the SSD appears to be recognized.
If I select the SSD there, the boot log below appears and the boot does not proceed any further.
The SSD was flashed with Linux on the carrier board, and I have confirmed that it boots normally there.
I want to know what the problem is.

Jetson UEFI firmware (version 3.1-32827747 built on 2023-03-19T14:56:32+00:00)
ESC to enter Setup.
F11 to enter Boot Manager Menu.
Enter to continue boot.
** WARNING: Test Key is used. **
.

        /-----------------------------------------------------\
        |             Please select boot device:              |
        |-----------------------------------------------------|
        |UEFI PM991a NVMe Samsung 256GB S660NE1NC01591 1      |
        |UEFI HTTPv4 (MAC:48B02DEAF01F)                       |
        |UEFI PXEv6 (MAC:48B02DEAF01F)                        |
        |UEFI PXEv4 (MAC:48B02DEAF01F)                        |
        |UEFI HTTPv6 (MAC:48B02DEAF01F)                       |
        |Enter Setup                                          |
        |UEFI Shell                                           |
        |-----------------------------------------------------|
        |              ^ and v to move selection              |
        |             ENTER to select boot device             |
        |                     ESC to exit                     |
        \-----------------------------------------------------/

L4TLauncher: Attempting GRUB Boot
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel…
EFI stub: Generating empty DTB
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services and installing virtual address map…
I/TC: Secondary CPU 1 initializing
I/TC: Secondary CPU 1 switching to normal world boot
I/TC: Secondary CPU 2 initializing
I/TC: Secondary CPU 2 switching to normal world boot
I/TC: Secondary CPU 3 initializing
I/TC: Secondary CPU 3 switching to normal world boot
I/TC: Secondary CPU 4 initializing
I/TC: Secondary CPU 4 switching to normal world boot
I/TC: Secondary CPU 5 initializing
I/TC: Secondary CPU 5 switching to normal world boot
I/TC: Secondary CPU 6 initializing
I/TC: Secondary CPU 6 switching to normal world boot
I/TC: Secondary CPU 7 initializing
I/TC: Secondary CPU 7 switching to normal world boot

Hi kin4057,

How did you confirm that it can boot successfully before?
Did you take this NVMe SSD from another board?

It seems the NVMe SSD is the first-priority boot device in your configuration.
Does it boot as expected if you don’t enter the boot manager?

How did you confirm that it can boot successfully before?
Did you take this NVMe SSD from another board?

We are using Avermedia’s D131L carrier board. We installed Linux on the SSD by flashing it. After booting, we confirmed that the boot completed successfully by seeing the Ubuntu desktop background in the GUI.

Using that carrier board as a reference, we built a new custom carrier board. The result described above was obtained when booting with the SSD mounted on the custom board.

It seems the NVMe SSD is the first-priority boot device in your configuration.

Yes.

Does it boot as expected if you don’t enter the boot manager?

Even without entering the boot manager, the result is as described above.

Okay, so the issue is not related to the boot manager.

Did you put the NVMe on your custom carrier board and run the flash command there?

Did you flash the QSPI of the Orin NX module on your custom board before use?

It seems there is a bit of confusion.
The board used for flashing is the D131L board from Avermedia, and the SSD flashed with Linux there is then used in my custom board.
I followed the manual and the attached photos for the flashing process.
Additionally, I tried to flash the SSD on my custom board via Recovery mode and USB, but the SSD is not flashed properly.
The last error message I received was “Error: Could not stat device /dev/nvme0n1 - No such file or directory.”
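
One way to narrow down this error is to check whether the NVMe block device node exists at all in the environment where the flash script runs. The following is only a minimal sketch, assuming a POSIX shell is available there; the helper name `check_block_node` is hypothetical and is not part of the flashing tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of the L4T tools): report whether a
# block device node exists at the given path.
check_block_node() {
    dev="$1"
    if [ -b "$dev" ]; then
        echo "present: $dev"
        return 0
    fi
    echo "missing: $dev"
    return 1
}

# /dev/nvme0n1 is the device named in the error message above.
check_block_node /dev/nvme0n1 || true
```

If the node is missing, the kernel did not enumerate the NVMe drive at that moment, which would point toward the PCIe link or the board configuration rather than the contents of the SSD.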

It seems you have two setups, as follows:
A. Orin NX+D131L board+NVMe_A
B. Orin NX+your custom board+NVMe_B

For A, the flash succeeds.
For B, the flash fails with “Error: Could not stat device /dev/nvme0n1”

Is my understanding correct?

Yes, exactly.

I attempted to flash an SSD on device B (the flash failed), but when I then mounted the SSD on a desktop PC, I found that several partitions had been created on the NVMe drive.
So it appears that the NVMe was accessed and partitioned, but the flash then failed for some unknown reason.

To summarize:
When a successfully flashed SSD from device A is mounted to device B for booting, boot failure occurs.
When a formatted SSD is mounted to device B for flashing, the flash fails, but files are written onto the SSD.

Additionally,
Looking at the boot error logs, it seems there is a problem with the receiving part of the PCIe communication interface. I am curious if the following action is possible for debugging purposes:
Modifying the PCIe speed from Gen3 to Gen1 or 2 in the reference files used for flashing, such as the .dtb file.
Since there appears to be a communication issue, I want to try flashing at a lower speed. The NVMe currently in use supports Gen3.

We don’t suggest moving the SSD from device A to device B to check booting, since the result may depend on the QSPI in the module.

I would like to check the log of the failed flash for details. It may have formatted and partitioned the NVMe successfully but failed at some later point during the flash. We should check the log first.

The board and PC are connected via USB 2.0 Type-B and UART.
When performing the flash on the PC, the operational status on the board was also recorded through UART communication.
I have attached the flash logs from the PC and Board here.
PC Flash log.txt (253.7 KB)
Board flash log.txt (96.2 KB)

From board:

[  101.636096] NFS: state manager: check lease failed on NFSv4 server fc00:1:1:0::1 with error 13

From host:

exportfs: /etc/exports [2]: Neither 'subtree_check' or 'no_subtree_check' specified for export "192.168.0.*:/home/user/nfs".
  Assuming default behaviour ('no_subtree_check').
  NOTE: this default has changed since nfs-utils version 1.0.x

Have you enabled or configured the board to boot from NFS?

Your current flash command is as follows:

$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 p3509-a02+p3767-0000 internal

Please try using the following command instead:

$ sudo ADDITIONAL_DTB_OVERLAY_OPT="BootOrderNvme.dtbo" ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
-c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
--showlogs --network usb0 p3509-a02+p3767-0000 internal

I have attached the logs from executing the command you provided.
The flash reported success on the PC, but judging from the boot device list on the board, the SSD is not recognized.
I have also attached the log from booting with a previous version of the sub-board, which seems to indicate that something was flashed incorrectly onto the SSD.

SSD Flash PC log_240502.txt (263.0 KB)
SSD Flash board log_240502.txt (30.7 KB)
SSD Flash SUB board log_240502.txt (86.7 KB)

From your flash log, it seems the NVMe was flashed successfully.
From the board's serial console log, it seems the NVMe is not detected in UEFI, so the board cannot boot from it.

Is the issue specific to the custom carrier board? (Do you have the devkit to verify?)
Are you using a custom BSP package to flash the board, or the BSP package from our official release?

Are you using a custom BSP package to flash the board, or the BSP package from our official release?

Logs tagged with “240502” indicate that the BSP package from the Avermedia carrier board guide(link 33p) was used.

Just in case, I also downloaded the BSP package and Sample Root Filesystem from the official NVIDIA release and tried them.
The flash failed.
I’m attaching the related logs.
SSD Flash PC log_240507.txt (247.1 KB)
SSD Flash board log_240507.txt (34.0 KB)

I’m currently checking the differences between these two BSPs.
The BSP provided in the guide and the NVIDIA release BSP seem to use the same download path.


https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonOrinNxNanoSeries.html#eeprom-modifications

After setting the EEPROM disable option as indicated in the above link and retesting, the results are identical to those with tag 240502.

For the custom carrier board, please use the custom BSP package and the flash instructions from your vendor.

You should not use p3509-a02+p3767-0000 as the board config for your custom carrier board.
There should be a custom board config for your custom board.
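
The board name passed to the flash scripts corresponds to a `<name>.conf` file in the top level of the `Linux_for_Tegra` directory, so listing those files shows which board configs a given BSP provides. The following is a minimal sketch, assuming the vendor BSP is unpacked in `./Linux_for_Tegra`; the helper name `list_board_configs` is hypothetical, and the name of the vendor's custom config is not known from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the board config names (file names without
# the ".conf" suffix) found in the top level of an L4T directory.
list_board_configs() {
    l4t_dir="$1"
    for f in "$l4t_dir"/*.conf; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
        basename "$f" .conf
    done
}

# Assumed location of the unpacked BSP; adjust to your setup.
list_board_configs ./Linux_for_Tegra
```

Any name printed here can be used in place of `p3509-a02+p3767-0000` in the flash command; which one matches the custom carrier board has to come from the vendor's documentation.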