Jetson TX2 boot process is stuck on the NVIDIA logo screen

orong13 · June 25, 2020, 6:40pm

Hello,
I have two Jetson TX2 developer kits as described here:
Jetson TX2 developer kit

At the beginning I used JetPack 3.2.1, which was the latest version provided by NVIDIA at the time.
On top of it I installed several third-party packages, such as:
OpenCV 3.4.4,
PyTorch 1.1.0,
etc.
I built all of them from source.

To back up my Jetson TX2, I learned from the following link how to clone and restore the eMMC partitions:
Jetson clone and restore eMMC partitions

Cloning and restoring the eMMC partitions worked as long as I kept using the same Jetson.

After a while, I started to work with the second Jetson and newer JetPack versions such as 4.3 and 4.4.
I successfully backed these up as well, using the same clone and restore process.

Problem description:
When I tried to restore the older JetPack image, which had restored successfully on the first Jetson, onto the second Jetson, it did not work.

The restore operation completed successfully, but after resetting the Jetson its boot process got stuck on the NVIDIA logo screen.
When I restored the newer JetPack versions on the same Jetson (the second one), the boot process completed successfully and I could work with them.

After connecting UART#0 to an external sniffer, I captured the boot messages for all JetPack versions.
I did this in order to analyze what happens during boot and why it gets stuck for the older version but works for the newer ones.

To compare a failing boot process with a good boot process, I attached the following reports:

JetsonTX2BootLoaderErrorReports.txt (20.9 KB) - boot process messages of the older JetPack (the one that gets stuck)
JetsonTX2BootLoaderSuccess4_4Reports.txt (21.1 KB) - boot process messages of JetPack 4.4, which boots successfully
JetPackRestore3.2.1.txt (1.9 MB) - Older JetPack restore operation messages
JetPack4.4RestoreImage.txt (1.9 MB) - JetPack 4.4 restore operation messages

Any help or advice on why the two JetPack boot processes behave differently will be much appreciated.
Thanks,

linuxdev · June 26, 2020, 5:00pm

Are you mixing an older rootfs image with newer boot code? This will fail. There are a lot of boot-related partitions which set up prior to the rootfs and kernel ever loading, and these tend to need to match in release version. If you keep a safe copy of the raw image (not the sparse image), then there are some experiments you could try, but there is still a high chance of failure when mixing an older rootfs with a newer boot. If you have a specific location you want to recreate, e.g., “/home”, then this is quite easy to fix (since “/home” does not contain system boot files). What part do you need to save? Do you have the raw image, or just the smaller sparse image?

A second issue might be if your cloned rootfs size differs from the default flash size. Do you have the raw clone images, or just the sparse ones? The raw image is the actual size of the partition, and if that size is evenly divisible twice by 1024, then that is the size in “MiB”; if divisible evenly three times by 1024, then that is the size in “GiB”. Size can be specified with the “-S size” option. In a case where size is different from current reserved space, then specifying an exact size will avoid truncation. Example:

“system.img” is 30064771072 bytes.
30064771072 can be divided three times by 1024:
28GiB.
Option:
-S 28GiB
sudo ./flash.sh -S 28GiB jetson-tx2 mmcblk0p1
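
The size arithmetic above can be scripted. This is only a minimal sketch: size_option is a hypothetical helper name (not part of the driver package), and it assumes the size of the raw clone in bytes is already known.

```shell
# Sketch: turn a raw image size in bytes into a value for the flash.sh "-S" option.
# size_option is a hypothetical helper name, not part of the driver package.
size_option() {
    bytes=$1
    mib=$((1024 * 1024))
    gib=$((1024 * mib))
    if [ $((bytes % gib)) -eq 0 ]; then
        # Divides evenly three times by 1024: express the size in GiB.
        echo "$((bytes / gib))GiB"
    elif [ $((bytes % mib)) -eq 0 ]; then
        # Divides evenly twice by 1024: express the size in MiB.
        echo "$((bytes / mib))MiB"
    else
        # Not a whole number of MiB: report it instead of rounding.
        echo "size $bytes is not a whole number of MiB" >&2
        return 1
    fi
}

size_option 30064771072   # the size quoted above; prints 28GiB
```

On a Linux host with GNU coreutils, the byte count of a clone can be read with "stat -c %s bootloader/system.img" and passed to the helper.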

orong13 · June 26, 2020, 5:44pm

Hello,
Both images are raw clone images generated by the following command:
sudo ./flash.sh -r -k APP jetson-tx2 mmcblk0p1
For each image, the command was run after preparing the relevant BSP driver package and sample root file system.
Both raw images are the same size as the eMMC, about 32 GB.

The BSP and sample root file system versions were:

JetPack 3.2.1:
L4T 28.2.1 Jetson TX2 Driver Package
L4T 28.2.1 Sample Root Filesystem Source
JetPack 4.4:
L4T Jetson AGX Xavier, Xavier NX, and TX2 R32.4.2 Jetson Driver Package
L4T R32.4.2 Sample Root Filesystem Sources

The #1 JetPack raw image was prepared on an older Jetson TX2 development kit that I had, and the #2 JetPack raw image was prepared on the newer Jetson that I have now.

When I restore the #2 JetPack raw image on the new Jetson development kit, it works and the boot process completes, but when I restore the #1 JetPack raw image on it, the boot gets stuck.

When I still had the older Jetson development kit, the one I used to clone the #1 JetPack raw image, restoring the #1 JetPack image worked and its boot process completed without problems.

Questions:

If I understand you correctly, is it true that two Jetson TX2 development kits can have different boot versions?
How can I find out what the boot version is?
What is the sparse image?
Suppose that I have no access to the older Jetson development kit, is there any way to manually interrupt the boot process and to change “something” in order to be able to continue till boot completion?

linuxdev · June 26, 2020, 6:22pm

I’m not sure if JetPack 3.2.1 (L4T R28.x) uses the same clone command as R32.x (JetPack 4.x); cloning evolved slightly different command lines during R32.x release. If you cloned an R28.x rootfs using an R32.x flash.sh, there should not be any issue; cloning a partition shouldn’t care which release the clone is from (the “driver package” is what provides flash.sh and does not care about what flashed the Jetson if it is only reading partitions). It is the restoring of a clone partition where release matters since non-rootfs partitions and rootfs have dependencies.

If I read this correctly, then “#1” implies R28.x. If you restored an R28.x clone using the same R28.x driver package, then it should work, especially if you used a full flash which includes not only the rootfs, but also the other partitions. The goal is to guarantee all of the non-rootfs partitions are of the same version as the clone, and restoring with just the “-r” option implies all of the other content is freshly installed. Old content in non-rootfs partitions could be incorrect.

In the R28.x series the way to know what release the rootfs is from is “head -n 1 /etc/nv_tegra_release” (and if this is loopback mounted on the host PC just adjust for where the mount location is). Some time in R32.x (not the first release) dpkg started being used and /etc/nv_tegra_release was no longer used (that file goes away in releases using dpkg/apt for NVIDIA-specific content).
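
The check above can be wrapped in a small helper that also works on a loopback-mounted clone. This is a minimal sketch: rootfs_release is a hypothetical name, and it only applies to releases that still ship /etc/nv_tegra_release.

```shell
# Sketch: print the release line of a rootfs located at a given directory.
# rootfs_release is a hypothetical helper name; with no argument it checks
# the running system ("/").
rootfs_release() {
    root=${1:-/}
    # Strip one trailing slash so "/" and "/mnt/clone/" both resolve cleanly.
    file="${root%/}/etc/nv_tegra_release"
    if [ -r "$file" ]; then
        head -n 1 "$file"
    else
        echo "no readable etc/nv_tegra_release under $root" >&2
        return 1
    fi
}
```

On the host PC, a raw clone could first be mounted read-only, for example with "sudo mount -o loop,ro system.img /mnt/clone" (the mount point /mnt/clone is an assumption), and then checked with "rootfs_release /mnt/clone".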

You are correct that the boot content varies drastically between older and newer L4T releases. I do not know how to find out what the release is for the non-rootfs part of boot. This is why I suggest doing a full flash where only the “reuse rootfs” option (“sudo ./flash.sh -r jetson-tx2 mmcblk0p1” after placing the clone as “bootloader/system.img”) is used to make certain the rootfs itself is from the clone…the other content is guaranteed to be overwritten with a consistent and correct release from that driver package (and thus you won’t need to examine the non-rootfs content version).
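
The full-flash-with-clone procedure described above could be laid out as follows. This sketch only prints the steps and does not run them. print_restore_steps and both of its arguments are hypothetical; only the flash.sh command line and the bootloader/system.img location come from the paragraph above.

```shell
# Sketch: print (do not run) the steps for a full flash that reuses a clone.
# print_restore_steps and its arguments are hypothetical names.
print_restore_steps() {
    l4t_dir=$1   # unpacked driver package matching the clone's release
    clone=$2     # raw clone image of the rootfs
    # Step 1: put the clone where flash.sh expects the rootfs image.
    echo "cp \"$clone\" \"$l4t_dir/bootloader/system.img\""
    # Step 2: flash everything, reusing that image instead of generating one.
    echo "cd \"$l4t_dir\""
    echo "sudo ./flash.sh -r jetson-tx2 mmcblk0p1"
}
```

Printing the commands first makes it easy to confirm that the driver package directory belongs to the same release as the clone before anything is written to the Jetson.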

Note: Much boot content change was driven by adding support for Trusty/boot content signing and failover partitions.

Originally all of the Tegra flash software created a full Linux partition via loopback on the host PC during flash. This image is a bit-for-bit exact copy of what the flashed partition will be, and this is a rather large file on the host PC. This is “raw”. However, this takes a long time to copy so many gigabytes of data over a USB2 cable, and so a form of ext4 filesystem storage, known as “sparse”, stores only the parts of the ext4 filesystem which have content…the empty space is restored later with an algorithm instead of copying bytes. Think of this as the “poor man’s” compression for ext4 images. If an ext4 raw image has almost no content, then the sparse image will be almost zero size. If an ext4 raw image is filled with content, then the sparse image approaches the size of the raw image. The Jetson itself is able to flash using raw or sparse images, they are interchangeable.

I have not found any open source tools which work with this version of sparse image. The “mksparse” tool is used to create the sparse image, but converting sparse back to raw is in the realm of the recovery mode Jetson. Once you have a sparse version to work with it is written in stone, whereas a raw version allows using loopback.

The size of the raw image is always the exact size of the rootfs partition in the final eMMC content after flash.

I’m not sure if this is what you mean by “not available”, but the older driver package and JetPack should be available. If not, then someone will probably check why not, and get it back to the web site.

Any system using U-Boot should be able to interrupt boot prior to loading the kernel (via serial console), but this does not mean you’ll be able to make minor adjustments to get R28.x and R32.x to live with each other. The part of U-Boot you can easily change is the set of environment variables. Examples of what would be easy to change there are the order of detecting boot devices, or perhaps timeouts. You would need to give much more detail about what you want to change before anyone could say if interrupting boot would help, but making R28.x and R32.x compatible is not one of the things it can do.

orong13 · June 26, 2020, 7:40pm

Hi,
Let me try to clarify.
I’m not trying to mix R28.x and R32.x at all.
Each of them has a separate directory on my host Linux PC.

For each one of them I separately prepared the BSP driver package directory with the relevant root file system directory inside the rootfs directory.

I cloned both #1 (the R28.x based image) and #2 (the R32.x based image) separately, each with its dedicated flash.sh version, from two different Jetson development kits.
(I had two Jetson development kit units, one with JetPack #1 and one with JetPack #2.)
Now I have only one Jetson development kit, the one with R32.x.
The older kit isn’t available to me anymore.
You are right about the driver package being available on the web.

When I restore each of the raw images that I prepared, each with the dedicated flash.sh that I used for cloning it, the restore operation reports success for both raw images. But when I try to boot them, only the R32.x based image boots to completion; the R28.x based boot gets stuck.
When I restored the R28.x based image to the older kit (the one I used for cloning it), it booted without getting stuck.

Again, the R28.x based image uses its original BSP driver package and root file system for both the clone and the restore operation.

But I’m using the same command to restore the image for both versions:
sudo ./flash.sh -r -k APP jetson-tx2 mmcblk0p1

I will try running it with the -r option only.

Thanks for the clarification about the sparse image.

I wanted to ask: did you read the boot reports?
Maybe you will find an error message there that helps explain why the boot process gets stuck for R28.x.
You can see that the R28.x boot process prints “Starting kernel …”
and then gets stuck, while for R32.x there are more prints after this line.
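
One quick way to compare two serial logs on this point is to count how many lines follow the "Starting kernel" message. This is a minimal sketch that assumes the logs are saved as plain text; lines_after_kernel_start is a hypothetical helper name.

```shell
# Sketch: count the lines in a serial boot log that come after the first
# "Starting kernel" message. lines_after_kernel_start is a hypothetical helper.
lines_after_kernel_start() {
    # The counting rule runs before the match rule, so the matching line
    # itself is not counted; "n + 0" prints 0 when nothing follows it.
    awk 'found { n++ } /Starting kernel/ { found = 1 } END { print n + 0 }' "$1"
}
```

Running it on JetsonTX2BootLoaderErrorReports.txt and on JetsonTX2BootLoaderSuccess4_4Reports.txt would show whether any kernel output reached the console in each case.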

Thanks,

Andrey1984 · June 26, 2020, 8:11pm

You may try to flash with sdkmanager to a 28* or 3* release, and then flash the corresponding image that matches the release.

orong13 · June 26, 2020, 8:50pm

Thanks,
I tried to do that, but I saw that the oldest version supported by sdkManager is newer than JetPack 3.2.1.

I will try to flash my new Jetson kit with the old R28.x (JetPack 3.2.1) using the old interface that I worked with before sdkManager was released:
Download and Install JetPack

linuxdev · June 27, 2020, 7:29pm

I tend to use the command line driver package plus sample rootfs if I am just flashing. If you have a clone you don’t even need the sample rootfs. You can look at these URLs and find a specific L4T release to use when you know of a specific JetPack release (you’ll have to go there, log in, and then click the link again):
https://developer.nvidia.com/embedded/linux-tegra-archive
https://developer.nvidia.com/embedded/jetpack-archive

You only need JetPack in order to install extra optional packages after flash completes. You may not even need it for that if your clone already had your options installed. Looks like if you reuse your image of the system which was originally flashed with JetPack3.2.1, then you could just use the L4T R28.2.1 driver package directly (and it sounds like you probably have this, but it wouldn’t hurt to download and unpack the driver package by itself somewhere else and try flashing on command line with this after putting your clone in place).

However it is done, the goal is to flash all of the non-rootfs content via a fresh package install of a known compatible L4T release, along with the clone which was originally from that release. If this fails, then you might provide the log for the flash, along with the serial console boot log. A command line flash with logging would be:
sudo ./flash.sh -r jetson-tx2 mmcblk0p1 2>&1 | tee log_flash.txt

orong13 · June 28, 2020, 4:17am

Hello,
I wanted to update that based on your recommendations I did the following:

Restore R32.x raw image with the following command:
sudo ./flash.sh -r -k APP jetson-tx2 mmcblk0p1
I checked that the boot process completed successfully.
Restore R28.x raw image with the following command:
sudo ./flash.sh -r jetson-tx2 mmcblk0p1

Result:
The boot process completed successfully!

I should emphasize that on my current Jetson kit, the first JetPack I flashed via sdkManager was R32.x based. This was the first time I ever tried to restore my R28.x raw image, which was cloned from an older Jetson kit that I had.

I now want to fully understand the difference between using -r alone and using it together with -k APP.

I tried to learn from here:
Flashing and Booting the Target Device

I’m not sure that I fully understand it.

Can you please help me close this issue?

Thanks!

linuxdev · June 28, 2020, 10:08pm

I would normally discourage use of the “-k APP” unless you know the content in the other partitions is an exact match to what is required for that rootfs. Many times people overestimate the extra time it takes to flash the non-rootfs partitions…it is tiny. The use of just the “-r” option is more likely to succeed since other content will be updated and you will know it is for the release you just used.

If you flash with just “-r”, then it is a full flash, but instead of generating the rootfs partition (the “system.img”), it will use the content you already have there.

If you flash with “-r -k APP”, then expect you’ll get the rootfs partition you are flashing, but none of the surrounding content. Your surrounding content might be the same as what you’d get from a full flash, but not necessarily. The “APP” partition is the rootfs, and you’ve just specified only APP.

I’ve never tried it, but I think if you were to specify “-k APP”, but not “-r”, then a new default rootfs would be flashed without any other content being flashed.

Keep in mind that all of those other partitions other than rootfs are used for boot and are tiny. Embedded systems don’t have any kind of BIOS/UEFI, and so flashing that content is somewhat equivalent to flashing both a bootloader and a BIOS (setting up power rails, clocks, training memory, so on).
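
The flag combinations discussed in this post can be summarized in a small lookup. This is only a sketch of the explanation given here, not output from flash.sh itself. describe_flash_flags is a hypothetical helper, and the "-k APP" only case is the untested guess mentioned above.

```shell
# Sketch: describe what each flash.sh flag combination is expected to write,
# following the explanation in this post. describe_flash_flags is hypothetical.
describe_flash_flags() {
    case "$*" in
        "-r")        echo "all partitions; rootfs taken from existing bootloader/system.img" ;;
        "-r -k APP") echo "APP (rootfs) only; taken from existing bootloader/system.img" ;;
        "-k APP")    echo "APP (rootfs) only; newly generated default rootfs (untested guess)" ;;
        "")          echo "all partitions; newly generated default rootfs" ;;
        *)           echo "not covered in this thread" ;;
    esac
}
```

For example, "describe_flash_flags -r -k APP" reports that only the rootfs is written, which is why old boot content from another release can be left behind.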

orong13 · July 3, 2020, 6:42am

Hello,
Thank you very much for your help and explanations!
Regards,