Intermittent flashing failure on Orin NX

Hi,
I am using an NVIDIA Orin NX on a custom carrier board with L4T 36.4.0.

I am using the three commands below to flash the board:
time sudo ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" --no-flash --network usb0 jetson-orin-nx internal

time sudo ROOTFS_AB=1 ROOTFS_ENC=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --no-flash --external-device nvme0n1p1 --user-data-dir user_data -i ./sym2_t234.key -c ./tools/kernel_flash/flash_l4t_t234_nvme_rootfs_ab_enc.xml --external-only --append --network usb0 jetson-orin-nx external

time sudo ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --network usb0 --flash-only

Most of the time I am able to flash the board correctly, but sometimes I see the issue below when running the third command.

Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Waiting for target to boot-up…
Timeout
Device failed to boot to the initrd flash kernel. Please retrive the serial log during flashing to debug further.
Cleaning up…
Command exited with non-zero status 1
14.47user 5.79system 2:27.84elapsed 13%CPU (0avgtext+0avgdata 196216maxresident)k
0inputs+570264outputs (29major+2584356minor)pagefaults 0swaps

full logs:

nvidia_flash_uart.txt (49.0 KB)

nvidia_flash_host.txt (10.4 KB)

Can you provide any inputs on this issue?

Thanks in advance.

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, it appears that you are experiencing intermittent issues when flashing your custom NVIDIA Orin NX carrier board using L4T 36.4.0 with the three commands listed above.

Here is some general guidance and a few potential solutions to help you troubleshoot the issue.

  1. Verify the flashing commands: Double-check that the three commands you are using are correct and match the recommended flashing procedure for your specific board and L4T version. You can refer to the Nvidia documentation or the L4T user guide for guidance.
  2. Check the USB connection: Ensure that the USB connection between your host computer and the Jetson Orin NX module is stable and secure. Try using a different USB port or cable to rule out any issues with the connection.
  3. Verify the module’s recovery mode: Use the lsusb command to verify that the Jetson module is in Force Recovery Mode. If the module is not in recovery mode, you may need to put it into recovery mode manually.
  4. Check the log files: Analyze the log files (nvidia_flash_uart.txt and nvidia_flash_host.txt) to identify any error messages or patterns that may indicate the cause of the issue.
  5. Power cycle the board: Try power cycling the board and retrying the flashing process.
  6. Update the L4T version: If you are using an older version of L4T, try updating to the latest version to see if it resolves the issue.
  7. Verify the carrier board design: Ensure that your custom carrier board design is compatible with the Nvidia Orin NX module and meets the necessary requirements for flashing.

In comparison to other Jetson modules, the Orin NX has specific requirements for flashing, such as the need for a high-speed USB connection and a stable power supply. Additionally, the Orin NX has a different pinmux configuration compared to other Jetson modules, which may require specific settings in the pinmux.dtsi file.

To further assist you, I would need more information about the specific error messages or issues you are experiencing during the flashing process. Please provide more details about the problem, and I will do my best to help you troubleshoot and resolve the issue.


Hi

Please try this patch and see if it works.

--- Linux_for_Tegra/tools/kernel_flash/l4t_initrd_flash_internal.sh
+++ Linux_for_Tegra/tools/kernel_flash/l4t_initrd_flash_internal.sh
@@ -320,6 +320,7 @@
                     sysctl "net.ipv6.conf.${REPLY}.disable_ipv6"
                     echo "IPv6 is disabled. Please enable ipv6 to use this tool"
                 fi
+                nmcli dev set "${REPLY}" managed no
                 ip a add fc00:1:1:"${device_instance}"::1/64 dev "${REPLY}"
                 ip a add fe80::2/64 dev "${REPLY}"
                 IP_SET=0

Thanks

Hi, thanks for the reply.
I am currently unable to reproduce the issue. I will try different setups today and see if I can test this fix.

Hi @DavidDDD ,
I was able to reproduce the issue and test the fix on another setup. It looks like the board booted up to the initramfs, but the timeout error is still not solved.

Please find the UART logs, timeout_2.txt (38.7 KB).

Let me know if anything else is required from my side.

Attaching the lsmod output from the initramfs:

bash-5.1# lsmod
Module Size Used by
usb_f_rndis 32768 2
u_ether 32768 1 usb_f_rndis
spi_tegra210_quad 32768 0
tegra_mce 28672 0
stusb160x 20480 0
nvethernet 1175552 0
nvpps 32768 1 nvethernet
ipv6 503808 26
tegra_xudc 45056 0
ucsi_ccg 28672 0
typec_ucsi 36864 1 ucsi_ccg
typec 61440 2 stusb160x,typec_ucsi
libcomposite 65536 10 usb_f_rndis
pwm_fan 20480 0
pwm_tegra 20480 1
tegra_bpmp_thermal 16384 0
nvme 49152 0
nvme_core 106496 1 nvme
pcie_tegra194 40960 0
phy_tegra194_p2u 16384 9
r8168 524288 0

Hi @DavidDDD ,
The only difference I see between the logs of a successful flash and the logs of a failing one is the lines below, which are present only in the successful logs:
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-5.1# [ 10.739931] tegra-xudc 3550000.usb: EP 5 (type: intr, dir: in) enabled
[ 10.739951] tegra-xudc 3550000.usb: EP 3 (type: bulk, dir: in) enabled
[ 10.739963] tegra-xudc 3550000.usb: EP 2 (type: bulk, dir: out) enabled
[ 10.740023] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready

Is there any reason why these lines do not appear in the failing case?

Hi @DavidDDD ,
We are still facing this issue and unable to flash the SOMs. Kindly let us know what to do further.

Hi @DaneLLL , @DavidDDD
Can you help in fixing this issue?
I am trying to flash 30 SOMs and SSDs with my customised BSP based on L4T 36.4.
I am using the same setup for flashing all of these SOMs, with the commands mentioned above.

Out of these 30 SOMs, 2 have the issue described above, and I am unable to flash them.

Hi,
We would suggest trying another USB Type-A to Type-C cable to connect the Orin NX and the host PC. The log shows that the Orin NX does not enumerate in device mode when booting to the initramfs, which triggers the timeout.

Hi @DaneLLL ,
I have been using the same USB Type-C cable, carrier board and host PC for the other 28 SOM-SSD pairs. The issue is observed only with these 2 SOMs.
Is there an NVIDIA-preferred Type-A to Type-C cable? I will order it and try.

Hi,
We would suggest trying a cable that has passed USB-IF certification.

Hi @DaneLLL, @DavidDDD ,

I am continuing the investigation from where Ashik left off in this thread.

We did some additional debugging on the target side to understand why the initrd flash shows timeout.

Waiting for target to boot-up…

Waiting for target to boot-up…

Timeout

From the target logs we observed that the USB device controller initializes correctly and EP0 is enabled:

tegra-xudc 3550000.usb: EP 0 (type: ctrl, dir: out) enabled

However, in the failing cases the enumeration does not proceed further. The bulk endpoints required for the RNDIS interface are not enabled, and the USB state remains at default with the speed reported as UNKNOWN, as shown below.

/sys/class/udc/3550000.usb/current_speed = UNKNOWN

/sys/class/udc/3550000.usb/state = default

Because of this, the usb0 network interface required for l4t_initrd_flash never becomes active.

We also checked the USB role on the target and it reported as “device”.

cat /sys/class/usb_role/*/role

device

So, the system is correctly in USB device mode.
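The three sysfs checks above can be collected into one small helper so that the same snapshot is taken on a working and a non-working SOM. This is only a sketch: the sysfs paths are the ones quoted in this post, while the function name `udc_status` and the `SYSFS_ROOT` override are hypothetical and exist only to make the helper easy to try against a test directory.

```shell
#!/bin/sh
# Sketch: print the USB device controller state, speed and role on the target.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) used for testing.
udc_status() {
    root="${SYSFS_ROOT:-/sys}"
    for udc in "$root"/class/udc/*; do
        # Skip the unexpanded glob when no UDC is registered.
        [ -d "$udc" ] || continue
        name="${udc##*/}"
        state=$(cat "$udc/state" 2>/dev/null || echo "n/a")
        speed=$(cat "$udc/current_speed" 2>/dev/null || echo "n/a")
        echo "$name state=$state speed=$speed"
    done
    for role in "$root"/class/usb_role/*/role; do
        [ -f "$role" ] || continue
        echo "role=$(cat "$role")"
    done
}

udc_status
```

Running it once right after the initrd shell appears and once after re-plugging the cable gives two snapshots that can be compared directly between SOMs.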

Additionally, the module does enumerate correctly in recovery mode, where the host detects it as NVIDIA Corp, which indicates that the USB connection between host and target is functional.
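For reference, the host-side check can be scripted roughly as follows. This is a sketch, not part of the NVIDIA tooling: the helper name `has_nvidia_usb` is made up here, and it only looks for the NVIDIA USB vendor ID (0955) in `lsusb`-style output, so it does not distinguish between recovery mode and any other NVIDIA USB device that happens to be attached.

```shell
#!/bin/sh
# Sketch: succeed if lsusb-style output on stdin lists a device with
# the NVIDIA USB vendor ID (0955). Product IDs are not checked.
has_nvidia_usb() {
    grep -qi 'ID 0955:'
}

# Typical use on the host:
#   lsusb | has_nvidia_usb && echo "NVIDIA USB device visible"
```

Running the same check while the flash script is printing "Waiting for target to boot-up" shows whether the host sees any NVIDIA device at all after the target leaves recovery mode.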

Another observation relates to the cable re-plug event. On the working SOM, unplugging and re-plugging the USB cable on the host side produces logs similar to:

[ 10.739951] tegra-xudc 3550000.usb: EP 3 (type: bulk, dir: in) enabled

[ 10.739963] tegra-xudc 3550000.usb: EP 2 (type: bulk, dir: out) enabled

[ 10.740023] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready

However, on the non-working SOM, unplugging and re-plugging the USB cable does not generate any log messages, and the USB state remains unchanged.
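To make this comparison repeatable across captured UART logs, a small check along the following lines could be used. It is a sketch only: the function name `check_enumeration` is hypothetical, and the patterns are taken from the working-SOM log lines quoted in this thread.

```shell
#!/bin/sh
# Sketch: report whether a UART log contains the enumeration lines that
# were seen only on working SOMs. Returns non-zero if any is missing.
check_enumeration() {
    log="$1"
    status=0
    for pattern in \
        'tegra-xudc .*EP [0-9]* (type: bulk, dir: in) enabled' \
        'tegra-xudc .*EP [0-9]* (type: bulk, dir: out) enabled' \
        'usb0: link becomes ready'
    do
        if grep -q "$pattern" "$log"; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_enumeration timeout_2.txt` lists which of the three lines are absent from a captured log.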

Another important point is that only the SOM is swapped during our tests. The following components remain exactly the same between the working and non-working cases:

  • Carrier board

  • Device tree

  • Software / root filesystem

  • USB cable and host PC

We have ordered a USB-IF certified cable and will update once we receive it and complete the testing. In the meantime, if you have any insights or suggestions based on the observations shared above, please let us know.

Hi @DaneLLL , @DavidDDD ,

Do you have any suggestions to try out for the above issue?

Hi,
Besides using a USB-IF certified cable, please also make sure you disable USB autosuspend on the host PC:
Jetson AGX Orin FAQ
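The FAQ text is not quoted in this thread, so the following is only a sketch of one common way to disable USB autosuspend on a Linux host, by writing -1 to the usbcore `autosuspend` module parameter. The function name `disable_usb_autosuspend` and the `AUTOSUSPEND_PARAM` override are hypothetical; the setting applies to devices connected afterwards and does not persist across a reboot.

```shell
#!/bin/sh
# Sketch: disable USB autosuspend on the host until the next reboot.
# AUTOSUSPEND_PARAM is a hypothetical override used only for testing;
# by default the usbcore module parameter in sysfs is written.
disable_usb_autosuspend() {
    param="${AUTOSUSPEND_PARAM:-/sys/module/usbcore/parameters/autosuspend}"
    echo "before: $(cat "$param")"
    # -1 tells usbcore not to autosuspend newly connected devices.
    printf '%s\n' -1 > "$param" || return 1
    echo "after:  $(cat "$param")"
}

# The function must run as root, for example from a root shell on the
# host before putting the board into recovery mode:
#   disable_usb_autosuspend
```

Re-plugging the Jetson after changing the value ensures that the new setting is the one applied to it.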