Nvidia AGX Xavier boot up problem

prathvi00001 · July 7, 2021, 6:57am

The Xavier board was working perfectly fine (the SDK and everything else were installed).

After I did a forced shutdown, the board no longer boots.

How can I find out what is causing this?

WayneWWW · July 7, 2021, 7:51am

Dump the log from the serial console.

prathvi00001 · July 7, 2021, 9:01am

I have attached the log.
JetsonAGX_log (39.4 KB)

WayneWWW · July 7, 2021, 9:03am

It looks like the file system is corrupted, so the bootloader is not able to read the kernel from it.

[0006.407] I> ########## Fixed storage boot ##########
[0006.412] I> Already published: 00010003
[0006.416] I> Look for boot partition
[0006.419] I> Fallback: assuming 0th partition is boot partition
[0006.425] I> Detect filesystem
[0006.452] I> Loading extlinux.conf …
[0006.452] I> rootfs path: /sdmmc_user/boot/extlinux/extlinux.conf
[0006.489] I> L4T boot options
[0006.489] I> [1]: “primary kernel”
[0006.490] I> Enter choice:
[0009.491] I> Continuing with default option: 1
[0009.491] I> Loading kernel sig file from rootfs …
[0009.491] I> rootfs path: /sdmmc_user/boot/Image.sig
[0009.510] I> Loading kernel binary from rootfs …
[0009.510] I> rootfs path: /sdmmc_user/boot/Image
[0015.746] I> lookup_linear_dir:441: Invalid file block num
[0015.746] I> ext2_walk:142: ‘Image’ lookup failed
[0015.747] I> ext4_open_file:647: ‘/boot/Image’ lookup failed
[0015.747] E> file /sdmmc_user/boot/Image open failed!!
[0015.747] W> Failed to load kernel binary from rootfs (err=20

prathvi00001 · July 7, 2021, 9:06am

What approach should I take to fix this issue?

WayneWWW · July 7, 2021, 9:07am

The only way now is to reflash your device.

prathvi00001 · July 7, 2021, 9:14am

Can you tell me what causes this issue, so that I can avoid it in the future?

WayneWWW · July 7, 2021, 9:18am

I don’t know either. Since you said you did a “forced shutdown”, I can only guess that it corrupted the file system.
I have not actually seen many cases like this before, though.

If you have more detail about what you tried and are able to reproduce this easily, then we can investigate.

prathvi00001 · July 7, 2021, 9:21am

I was having trouble connecting to the internet through USB, so I did a forced shutdown.

WayneWWW · July 7, 2021, 9:23am

I suggest you try the same thing after re-flashing the board and see whether the error happens again.

prathvi00001 · July 7, 2021, 9:25am

Sure, thank you.

prathvi00001 · July 7, 2021, 9:28am

Will upgrading the bootloader fix this issue? I ask because reflashing the device might take time.

WayneWWW · July 7, 2021, 9:36am

No, the broken part is the kernel. A bootloader update does not update it.

Also, it is not only the kernel that is broken. We have a redundancy mechanism: when the kernel in the file system is broken, the bootloader falls back to the kernel in the partition. Your kernel in that partition is fine.

But the ramdisk (initrd) in the file system is broken too, and this one has no backup. If you just want a quick fix, you can try using flash.sh with the -I parameter to flash your initrd, though I don’t guarantee it will work.

    -I <initrd> ---------- initrd file. Null initrd is default.

[0016.655] I> rootfs path: /sdmmc_user/boot/initrd
[0022.869] I> lookup_linear_dir:441: Invalid file block num
[0022.869] I> ext2_walk:142: ‘initrd’ lookup failed
[0022.870] I> ext4_open_file:647: ‘/boot/initrd’ lookup failed
[0022.870] E> file /sdmmc_user/boot/initrd open failed!!
[0022.871] E> kernel boot failed
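To find lines like these in a captured serial log without reading the whole file, something along the following lines may help. This is only a sketch: `scan_boot_log` is a hypothetical helper name, and the pattern simply assumes the `[time] E>` / `[time] W>` prefixes and the "lookup failed" wording seen in the excerpts in this thread.

```shell
# Print error (E>) and warning (W>) lines from a captured boot log,
# plus any "lookup failed" lines, each prefixed with its line number.
# scan_boot_log is a hypothetical helper, not part of any NVIDIA tool.
scan_boot_log() {
    grep -nE '\] (E|W)> |lookup failed' "$1"
}
```

For example, `scan_boot_log JetsonAGX_log` would list the failing Image and initrd lookups together with the final "kernel boot failed" line.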

prathvi00001 · July 7, 2021, 9:43am

Thank you!

linuxdev · July 7, 2021, 5:03pm

Note that a journaling filesystem keeps a record of what is written but not yet flushed, and if you suddenly lose power or have a crash, the journal will back out the changes which were not yet flushed. You’d lose all content which was in the middle of a write at the time of loss or lockup, but the system would not be corrupt. Unfortunately, the journal has a fixed size, and if more content is being written than the journal can keep a record of, actual corruption can occur.

If the boot is unable to use a filesystem, then probably something invalid was actually written directly to the disk (e.g., formatting a disk of a running system would destroy it), or else there was a large amount of unflushed data at the time of failure.

In the case of no “corruption”, but “loss of data”, it implies that content was in the middle of write at the time of failure. In that case those files/directories would be missing or incomplete.

In the case of a boot failure when unable to load the Image file, if the root filesystem type is not one which the boot software understands, then the boot would fail. If the Image file was being updated at the time of a power loss, then it is possible that either the old kernel would still be in place, or that part of the new kernel would be lost (and thus corrupt, but still present). If enough of the kernel was being written at the time of failure, and the journal sees this, then the entire file might be erased during the journal recovery (versus just part of the file or versus rolling back to an old version).

If you need an emergency shutdown when something is locking you out, and you have a keyboard attached, there is a recommended way to do it instead of cutting power. The magic sysrq can sometimes be used to first call sync (preferably twice), then remount the filesystems read-only, and finally either reboot or let you cut power safely. Sysrq usually survives even when the rest of the system is locked up or otherwise failing.

If you have a keyboard attached, then you might try this once just to see how it works:

ALT-SYSRQ-s # Calling sync twice. Watch “dmesg --follow” ahead of time if curious.
ALT-SYSRQ-s
ALT-SYSRQ-u # Calling for the filesystems to be remounted read-only.
ALT-SYSRQ-b # Calling for forced reboot.

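If you have a working serial console, you cannot use the key bindings, since they would go to the host PC instead. You can instead “echo” the correct character into “/proc/sysrq-trigger”. The write must happen as root, and with “sudo echo” followed by a redirect, the redirect is performed by the unprivileged shell, so pipe through “sudo tee” instead. Example, from the serial console:
echo s | sudo tee /proc/sysrq-trigger # Would call sync.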
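The whole key sequence listed above (sync twice, remount read-only, reboot) can also be sent through “/proc/sysrq-trigger” in one go. The following is a minimal sketch, not a tested recovery tool: `emergency_reboot`, `SYSRQ_TRIGGER` and `DRY_RUN` are hypothetical names introduced here for illustration, and the one-second pause is an arbitrary choice.

```shell
# Emergency reboot through the sysrq trigger file:
# sync twice (s, s), remount filesystems read-only (u), then reboot (b).
# SYSRQ_TRIGGER and DRY_RUN are hypothetical knobs added for this sketch.
emergency_reboot() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    for key in s s u b; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only show what would be done; nothing is written.
            echo "would write '$key' to $trigger"
        else
            # tee runs under sudo, so the write itself has root privileges.
            echo "$key" | sudo tee "$trigger" > /dev/null
            sleep 1   # arbitrary pause so sync/remount can finish
        fi
    done
}
```

Running `DRY_RUN=1 emergency_reboot` first prints the four steps without touching the system; calling it without `DRY_RUN` really does reboot the board.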

Jetsons are full computers, not little embedded devices without cache or buffer. Treat them as if they are full computers. If you wouldn’t turn your host PC off by yanking the power cord from the wall, then don’t do this with a Jetson…it might be tiny, but it is a full system and would suffer the same as a desktop PC. Obviously when you hit bugs or crashes there isn’t much you can do about it, but if you have magic-sysrq available, then it is much safer than just pulling power.

Incidentally, there is a mask used to determine how much of magic-sysrq is exposed to the user. Not all architectures allow all functions, but basically a mask of “1” enables everything the architecture supports. See https://www.kernel.org/doc/Documentation/sysrq.txt, and examine “kernel.sysrq” in “/etc/sysctl.conf”. To see the actual sysrq mask currently being honored run this command:
cat /proc/sys/kernel/sysrq
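To interpret that number, the kernel sysrq documentation linked above describes it as a bitmask: 0 disables sysrq, 1 enables everything, and otherwise bit 16 allows sync, bit 32 allows remount read-only, and bit 128 allows reboot/poweroff. The same document notes that the mask only restricts keyboard invocation; root can always write to “/proc/sysrq-trigger”. A small sketch for checking the three functions needed for a clean emergency shutdown (`sysrq_allows` and `sysrq_report` are hypothetical helper names):

```shell
# Succeed if the sysrq mask ($1) permits the function with bit value ($2).
# A mask of exactly 1 means "everything enabled".
sysrq_allows() {
    [ "$1" -eq 1 ] || [ $(( $1 & $2 )) -ne 0 ]
}

# Report the functions used for a clean emergency shutdown.
# Bit values are taken from the kernel sysrq documentation.
sysrq_report() {
    for entry in 16:sync 32:remount-read-only 128:reboot; do
        bit=${entry%%:*}
        name=${entry#*:}
        if sysrq_allows "$1" "$bit"; then
            echo "$name: enabled"
        else
            echo "$name: disabled"
        fi
    done
}

# Usage on the Jetson itself:
#   sysrq_report "$(cat /proc/sys/kernel/sysrq)"
```

For instance, a mask of 176 (128 + 32 + 16) permits all three, while a mask of 4 permits none of them from the keyboard.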

It is possible that someone setting up a commercial release would want to disable part of the sysrq, but I would suggest keeping at least the ability to shut down cleanly without yanking power.

system · September 12, 2021, 3:54am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
