After 4119 reboots, Jetson Orin Nano cannot start

I am using a Transcend 1TB SSD: M.2 2242, PCIe Gen3 x4, NVMe, 3D TLC, DRAM-less.

This is very important because we will deploy in many places. How can we fix it?

We must have a solid version that allows us to trust NVIDIA.

We hit this issue while running a test that reboots the device every 4 minutes. I am using a custom board.
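For anyone reproducing a reboot test like this, it helps to record the boot count on the device so the failing cycle number is known. Below is a minimal sketch, not the poster's actual test script: the function name `record_boot` and the JSON counter file are assumptions made for illustration. It writes the counter through a temporary file and a rename so that a reset in the middle of a write does not leave a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def record_boot(counter_file: Path) -> int:
    """Increment a persistent boot counter and return the new count.

    Hypothetical helper: the file layout {"boots": N} is an assumption.
    """
    try:
        count = json.loads(counter_file.read_text())["boots"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        # First run, or the file is unreadable: start counting from zero.
        count = 0
    count += 1

    # Write to a temporary file in the same directory, flush it to disk,
    # then rename it over the old file so the update is all-or-nothing.
    fd, tmp_name = tempfile.mkstemp(dir=counter_file.parent)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"boots": count}, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, counter_file)
    return count
```

In a test like the one described, this would be called once per boot, before the timed reboot is triggered. How it is started at boot is left open here.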

nvidia_crash.txt (99.8 KB)

10:53:07.587 → [ 8.220345] Root device found: initrd
10:53:07.587 → modprobe: FATAL: Module r8168 not found in directory /lib/modules/5.10.192-tegra

10:53:07.587 → [ 8.222973] Mount initrd as rootfs and enter recovery mode
10:53:07.653 → Finding OTA work dir on external storage devices

10:53:07.653 → Checking whether device /dev/mmcblk?p1 exist

10:53:07.653 → Device /dev/mmcblk?p1 does not exist

10:53:07.653 → Checking whether device /dev/sd?1 exist

10:53:07.653 → Device /dev/sd?1 does not exist

10:53:07.653 → Checking whether device /dev/nvme?n1p1 exist

10:53:07.653 → Looking for OTA work directory on the device(s): /dev/nvme0n1p1

10:53:07.653 → mount /dev/nvme0n1p1 /mnt

10:53:07.653 → [ 8.260493] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
10:53:07.686 → is_boot_only_partition /mnt

10:53:07.686 → OTA work directory /mnt/ota_work is not found on /dev/nvme0n1p1

10:53:07.686 → Finding OTA work dir on internal storage device

10:53:07.686 → mount /dev/mmcblk0p1 /mnt

10:53:07.686 → mount: /mnt: special device /dev/mmcblk0p1 does not exist.

10:53:07.686 → Failed to mount /dev/mmcblk0p1 on the /mnt

10:53:07.686 → Failed to run “mount_ota_work_partition /dev/mmcblk0p1 /mnt”

10:53:07.686 → OTA work directory is not found on internal and external storage devices

10:53:07.686 → bash: cannot set terminal process group (-1): Inappropriate ioctl for device

10:53:07.719 → bash: no job control in this shell

10:53:07.719 → bash-5.0# ⸮

BTW, we have 5 samples at this point, and this has happened many times before: sometimes after 400 boots, sometimes 800, and sometimes 4,000.

Your log indicates the device is stuck in “recovery boot”. This state means there were 3 consecutive boot failures before it.

There is no point in checking the recovery boot log, because recovery boot is a common mechanism provided by NVIDIA.

What you need to do is share the log from before this recovery boot so that we can tell why there were 3 consecutive failures.
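If the serial console was captured continuously during the test, the part of the log that precedes the recovery boot can be cut out with a small script. This is a rough sketch under assumptions: it treats the line "Mount initrd as rootfs and enter recovery mode" from the log above as the marker, and the function name `lines_before_recovery` and the default `context` size are made up for illustration. Whether that window covers all three failed boots depends on how verbose the capture is.

```python
from typing import List

# Marker text taken from the posted log; assumed to appear once per recovery boot.
RECOVERY_MARKER = "Mount initrd as rootfs and enter recovery mode"


def lines_before_recovery(log_lines: List[str], context: int = 2000) -> List[str]:
    """Return up to `context` lines that precede the first recovery-mode marker.

    Returns an empty list when the marker is not present in the capture.
    """
    for index, line in enumerate(log_lines):
        if RECOVERY_MARKER in line:
            start = max(0, index - context)
            return log_lines[start:index]
    return []
```

For example, reading a capture with `open(path, errors="replace").read().splitlines()` and passing the result to `lines_before_recovery` gives the portion worth attaching to the thread.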

Guys, we can do that, but the device never recovers anymore. The point is that after the reboots in our test process the Nano is stuck, and the only way to recover it is to flash it again.

This is a known issue for NVIDIA: UEFI problems after sudo reboot. It is easy to reproduce in release 5.1.3, and many people here in this forum complain about it. This is not a special event; we can reproduce it over many days just with sudo reboot.

This is not a rare event. We need support from NVIDIA, and we must make sure this is properly tested and rock solid. What is the next step here?

You can recover from recovery boot by using this method.

Great, so I need to take a 20-hour flight from the office before I can recover the Jetson Nano. Customers can also call me to ask about the system while it is not working.

We need a solution that we can deploy without crashes. Is there a new version we should try?

I am seeing a similar issue, and I suspect it is due to UEFI misconfiguration. I have noticed that sometimes after a reboot the UEFI bootloader settings are changed at random (possibly due to electrical noise or misconfigured console input), and this can lead to the wrong boot device being configured.

I am working on disabling the UEFI screen entirely, using the method mentioned here. Hope this helps!
