Problem Background:
We have installed the Jetson Orin Nano core module on a custom-made baseboard, with a 512 GB solid-state drive as the module's only boot device.
Problem Description: One of our core modules has the following issue. After several reboots, the module enters recovery boot mode and cannot work properly. We can temporarily resolve this by modifying the OS chain A status variable in the BIOS, but the problem recurs after several more reboots. So far, this issue has been observed on only one Jetson Orin Nano.
Log Description: The following is a set of serial print logs from the Jetson Orin Nano core module. This set of logs records three reboot processes (you can distinguish the boundaries of these three log segments by searching for “Rebooting system”). In the first two reboots, the core module can boot normally in direct boot mode, but in the third reboot, the core module enters recovery boot mode. recovery_boot_log.txt (231.8 KB)
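For anyone who wants to look at the three boots separately, here is a minimal sketch that cuts the attached log at each "Rebooting system" line. The function names, the output file prefix, and the default search word "recovery" are my own assumptions and are not taken from the log itself.

```shell
#!/bin/sh
# Split a serial log into one file per boot. Each segment ends with the line
# that contains "Rebooting system". Files are named <prefix>1.log, <prefix>2.log, ...
split_boot_log() {
    log_file=$1
    prefix=$2
    awk -v prefix="$prefix" '
        BEGIN { n = 1 }
        { print > (prefix n ".log") }
        /Rebooting system/ { close(prefix n ".log"); n++ }
    ' "$log_file"
}

# List the segments that mention a search word (case-insensitive).
# The default word "recovery" is an assumption about what the log prints.
find_recovery_segments() {
    prefix=$1
    pattern=${2:-recovery}
    grep -il -- "$pattern" "$prefix"*.log
}

# Example (hypothetical output prefix):
#   split_boot_log recovery_boot_log.txt boot_
#   find_recovery_segments boot_
```

Comparing the last segment with the first two with diff should make it easier to see where the third boot starts to differ.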
Thank you for the guidance. Based on the information provided, we have determined the following:
1. The JetPack version is JP6.1.
2. The nv-l4t-bootloader-config.service file can be found in the /opt/nvidia/l4t-bootloader-config directory, but the service is inactive by default.
3. We tried to start the service with sudo systemctl start nv-l4t-bootloader-config.service, but the service is still inactive after the command completes. The service status log is as follows:
root@sinovatio-desktop:/data# sudo systemctl status nv-l4t-bootloader-config.service >/data/test111
root@sinovatio-desktop:/data# cat test111
● nv-l4t-bootloader-config.service - Configure bootloader service
Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 1969-12-31 19:06:35 EST; 1min 20s ago
Process: 2619 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)
Main PID: 2619 (code=exited, status=0/SUCCESS)
CPU: 431ms
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2886]: COMPATIBLE_SPEC 3767--0003--1--jetson-orin-nano-devkit-
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2886]: TEGRA_BOOT_STORAGE nvme0n1
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2886]: TEGRA_CHIPID 0x23
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2886]: TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2886]: TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2619]: Info: Write TegraPlatformCompatSpec with 3767--0003--1--jetson-orin-nano-devkit-.
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2619]: Info. Verifying boot status.
Dec 31 19:06:35 sinovatio-desktop nv-l4t-bootloader-config.sh[2891]: Info: variable BootChainFwStatus is not found.
Dec 31 19:06:35 sinovatio-desktop systemd[1]: nv-l4t-bootloader-config.service: Deactivated successfully.
Dec 31 19:06:35 sinovatio-desktop systemd[1]: Finished Configure bootloader service.
root@sinovatio-desktop:/data#
Is this normal? What operations or environment settings might be causing the issue?
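One way to check this ourselves: in systemd, a Type=oneshot unit without RemainAfterExit=yes goes back to "inactive (dead)" once its ExecStart command exits, so that state together with status=0/SUCCESS can simply mean the script ran to completion. Whether this unit is actually declared as oneshot is an assumption here; systemctl cat nv-l4t-bootloader-config.service would show it. The sketch below classifies the output of systemctl show, and the function name is hypothetical.

```shell
#!/bin/sh
# Classify a unit from `systemctl show` key=value lines read on stdin.
# Prints one of: completed-ok, failed, running, unknown.
classify_unit_run() {
    awk -F= '
        $1 == "ActiveState"           { active = $2 }
        $1 == "Result"                { result = $2 }
        $1 == "ExecMainStatus"        { status = $2 }
        # An empty or "n/a" exit timestamp means the main process never exited.
        $1 == "ExecMainExitTimestamp" { exited = ($2 != "" && $2 != "n/a") }
        END {
            if (active == "inactive" && result == "success" && status == "0" && exited)
                print "completed-ok"
            else if (active == "failed" || (result != "" && result != "success"))
                print "failed"
            else if (active == "active" || active == "activating")
                print "running"
            else
                print "unknown"
        }'
}

# Example on the target:
#   systemctl show -p ActiveState -p Result -p ExecMainStatus \
#       -p ExecMainExitTimestamp nv-l4t-bootloader-config.service | classify_unit_run
```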
Here are some additional findings:
1. We can confirm that the faulty device fails after 3 reboots, while powering it off and on 10 times at the same frequency does not cause a failure. This is consistent with the case you provided.
2. Our device can be roughly divided into three parts: the Nano core board, the base board, and the solid-state drive (SSD) as the only boot device. We have confirmed that the fault follows one specific SSD and is not tied to any particular Nano core board or base board.
3. We have confirmed that both the normal SSD and the faulty SSD have the nv-l4t-bootloader-config.service file in the /opt/nvidia/l4t-bootloader-config directory, and that the MD5 checksums of the two files are identical.
4. The logs in my previous reply are from a normal SSD. With a normal SSD, sudo systemctl start nv-l4t-bootloader-config.service completes in about 1 second.
5. With the faulty SSD, sudo systemctl start nv-l4t-bootloader-config.service hangs indefinitely and can only be interrupted with Ctrl+C. The logs in this environment are as follows:
Additional Findings
We found that the nv-l4t-bootloader-config.service on the faulty solid-state drive gets stuck at its dependent nvgetty.service. After manual testing, we discovered that the two dependent services of nvgetty.service, nv.service and nvpmodel.service, can both start normally, and the /etc/systemd/nvgetty.sh script can also run normally.
Can you help us confirm what is preventing the nvgetty.service from starting?
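To narrow down which unit is actually being waited on while the start command hangs, the systemd job queue can be inspected from a second session (for example over Telnet). A minimal sketch, assuming the usual four-column output of systemctl list-jobs (JOB, UNIT, TYPE, STATE); the function name is hypothetical.

```shell
#!/bin/sh
# Read `systemctl list-jobs --no-legend` output on stdin and print the units
# whose job is in the "running" state. Jobs in the "waiting" state are queued
# behind these.
running_jobs() {
    awk '$4 == "running" { print $2 }'
}

# Example on the target, while the start command is hanging:
#   systemctl list-jobs --no-legend | running_jobs
#   systemctl list-dependencies nvgetty.service
#   journalctl -b -u nvgetty.service
```

If nvgetty.service shows up as the running job, its journal for the current boot is the next place to look.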
Additional Findings
In the environment with the faulty solid-state drive, after the kernel has completed booting, the device cannot accept input via the serial port, but can be logged into via Telnet and used normally.
It has been confirmed that the console is using the device /dev/ttyTCU0. Using echo 123 > /dev/ttyTCU0 in the Telnet window shows that the serial port can output “123” normally.
The above issues are highly correlated with the failure to start nvgetty.service. Could it be that there is an error in my console configuration?
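To rule the console configuration in or out, the console= arguments on the kernel command line can be compared between the normal and the faulty SSD. When several console= arguments are given, the kernel uses the last one for /dev/console. A minimal sketch with a hypothetical function name; the assumption that the serial login prompt comes from a serial-getty@ttyTCU0.service instance is mine and not confirmed in this thread.

```shell
#!/bin/sh
# Print every console= argument from a kernel command line read on stdin,
# one per line, in the order they appear.
console_args() {
    tr ' ' '\n' | sed -n 's/^console=//p'
}

# Example on the target:
#   console_args < /proc/cmdline
# Assumption: the serial login prompt is served by this template instance.
#   systemctl status serial-getty@ttyTCU0.service
```

Output working while input does not would fit a missing login process on the port rather than a wrong console device, but that is only a guess until the getty status is checked.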
Sorry, I had to set this issue aside for about a day because of other work, and now the fault can no longer be reproduced (I confirmed that the same faulty SSD is in use).
While the issue was on hold I made some minor changes, but I cannot recall all the details, so the analysis has to be paused for now.
Thank you very much for your guidance. If the fault reoccurs, I will investigate following the nv-l4t-bootloader-config.service → nvgetty.service → /var/log/syslog path. If there are any new developments, I will create a new topic and attach the link to this topic.
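For that follow-up, a small helper like the one below could pull the relevant lines out of /var/log/syslog in one pass. It is only a sketch: the function name is hypothetical, and it matches the unit names as plain fixed strings.

```shell
#!/bin/sh
# Print the lines of a log file that mention any of the given names.
# Names are matched as fixed strings, one pattern per line for grep -F.
unit_lines() {
    log_file=$1
    shift
    patterns=$(printf '%s\n' "$@")
    grep -F -e "$patterns" "$log_file"
}

# Example on the target:
#   unit_lines /var/log/syslog nv-l4t-bootloader-config nvgetty
```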