We have about 15 Jetson TX2 boards. We needed to install some packages and do some configuration on all of them. Doing so on every Jetson takes a lot of time, so we decided to back up one fully configured Jetson and flash its image onto the rest.
Here is what we did:
- Flash one Jetson (say Jetson1) using JetPack 3.0, install and configure everything on it.
- On a Linux host machine, connect Jetson1 in recovery mode and take a backup using L4T with the following command:
sudo ./flash.sh -r -k APP -G system.img jetson-tx2 mmcblk0p1
- Move system.img and system.img.raw to the L4T bootloader folder.
- After some research, I learned that L4T can only back up one partition at a time, so we first flash each new Jetson (say Jetson2) using JetPack 3.0 with only the "Flash OS Image to Target" option active, to fill all the other partitions.
- Restart Jetson2, connect it in recovery mode again, then flash it using the same L4T with the following command:
sudo ./flash.sh -r -k APP jetson-tx2 mmcblk0p1
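The steps above can be sketched as a small script. This is only an illustration under some assumptions: it is run from the L4T directory that contains flash.sh, the device name is the mmcblk0p1 used in the backup command, and the DRY_RUN switch and function names are hypothetical additions so the commands can be printed without touching a board.

```shell
#!/usr/bin/env bash
# Sketch of the backup/restore steps listed above.
# Assumption: run from the L4T directory containing flash.sh, with the
# Jetson attached in recovery mode. DRY_RUN is a hypothetical helper switch.

BOARD="jetson-tx2"
ROOTDEV="mmcblk0p1"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Read the APP (rootfs) partition of the reference unit into system.img.
backup_app() {
    run sudo ./flash.sh -r -k APP -G system.img "$BOARD" "$ROOTDEV"
}

# Move the clone into the bootloader folder so the restore can reuse it.
stage_image() {
    run mv system.img system.img.raw bootloader/
}

# Write the staged image to the APP partition of the target unit.
restore_app() {
    run sudo ./flash.sh -r -k APP "$BOARD" "$ROOTDEV"
}
```

Calling `DRY_RUN=1 backup_app` prints the command instead of running it, which is a cheap way to check for typos in the device name before flashing a batch of units.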
So here is the problem: this method worked on 10 of the Jetsons but failed on the other 5.
The ones that didn’t work show the following message while booting: “The system is running in low-graphics mode”. I have not been able to fix this so far.
Please note that I use the same process/equipment with all of them. I can’t figure out why it works with some of them and doesn’t with the others!
I hope you can help me figure out this problem or let me know if there is a better/easier way to accomplish the same.
Thanks for your help in advance,
Is there any historic difference between the failed and working clone installs as to which L4T version was on the Jetson just prior to the restore? There are many hidden partitions, and if one of those partitions is from a different version of L4T, or if one of them had different install options, then I could see this happening.
As an alternative approach, you could clone all of the partitions from a Jetson which works, and restore each of the hidden partitions one at a time onto the failing system which already has the rootfs clone restored. See if one of those partitions fixes it.
A note about partitions and cloning: There is still some “bookkeeping” type information at the start of the eMMC which is not part of the hidden partitions, nor part of the rootfs, but which is needed for the system to identify where things are. If this boot record information were from a system cloned with a different partition size or layout versus what was cloned in as a restore, then it would probably imply something would fail when offsets do not match metadata. So consider that if anything were historically different not only in partition content, but also in partition size (especially rootfs since it is the first partition after the metadata), this would break boot.
On a TX2 I have here with R28.1 (using sudo) I can see output from “gdisk -l /dev/mmcblk0” as this:
GPT fdisk (gdisk) version 1.0.1
Partition table scan:
BSD: not present
APM: not present
Found valid GPT with protective MBR; using GPT.
Disk /dev/mmcblk0: 61071360 sectors, 29.1 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 00000000-0000-0000-0000-000000000000
Partition table holds up to 17 entries
First usable sector is 4097, last usable sector is 61071327
Partitions will be aligned on 1-sector boundaries
Total free space is 1 sectors (512 bytes)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            4097        41947136   20.0 GiB   0700  APP
   2        41947137        41955328   4.0 MiB    0700  mts-bootpack
   3        41955329        41955840   256.0 KiB  0700  cpu-bootloader
   4        41955841        41956864   512.0 KiB  0700  bootloader-dtb
   5        41956865        41963008   3.0 MiB    0700  secure-os
   6        41963009        41963012   2.0 KiB    0700  eks
   7        41963013        41964220   604.0 KiB  0700  bpmp-fw
   8        41964221        41965220   500.0 KiB  0700  bpmp-fw-dtb
   9        41965221        41969316   2.0 MiB    0700  sce-fw
  10        41969317        41981604   6.0 MiB    0700  sc7
  11        41981605        41985700   2.0 MiB    0700  FBNAME
  12        41985701        42247844   128.0 MiB  0700  BMP
  13        42247845        42313380   32.0 MiB   0700  SOS
  14        42313381        42444452   64.0 MiB   0700  kernel
  15        42444453        42445476   512.0 KiB  0700  kernel-dtb
  16        42445477        42969764   256.0 MiB  0700  CAC
  17        42969765        61071326   8.6 GiB    0700  UDA
APP is the rootfs and is the first partition. It begins at byte 4097, which is a 4096 byte offset or one sector offset when sector size is 4096 bytes. An older BIOS style partition would reserve 512 bytes for MBR, plus backup MBR…UEFI moves some of the firmware into the start of the disk and out of the BIOS. If this were a working and bootable system you could use dd to write into the initial disk metadata using a copy of the metadata from dd reading the first 4096 bytes of a working system, but since you can’t boot you can’t use dd for write. I don’t know if the clone software is capable of cloning this unlabeled raw byte offset (you could on the TK1, but it has different flasher options).
Can anyone from NVIDIA suggest if the R28.1 driver package clone can copy via exact byte offset (perhaps with a patch)? Is it mandatory to clone only by partition label? Is there a way to clone offset byte zero through byte 4096, and then to write this by offset into one of the failed units? This would provide a true backup and restore mechanism even on customized installs.
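As a quick check on the offsets being discussed, here is a small sketch that converts sector numbers from the gdisk listing into byte offsets. It assumes the Start and End columns are counted in the 512-byte logical sectors that gdisk reports in its header. Under that assumption, the unlabeled region before APP (sectors 0 through 4096) would be about 2 MiB rather than 4096 bytes, so it is worth confirming the sector size on your own unit. The function names and the output file name emmc_head.bin are hypothetical, and the dd command is only printed, not run.

```shell
#!/usr/bin/env bash
# Convert a sector number to a byte offset. The 512-byte default is the
# logical sector size reported by gdisk in the listing above (an assumption
# to verify on your own unit).
sector_to_bytes() {
    local sector=$1
    local sector_size=${2:-512}
    echo $(( sector * sector_size ))
}

# Print (do not run) a dd command that would read everything before a
# partition's first sector into a file. emmc_head.bin is a hypothetical name.
print_head_dump_cmd() {
    local first_sector=$1
    local sector_size=${2:-512}
    echo "sudo dd if=/dev/mmcblk0 of=emmc_head.bin bs=${sector_size} count=${first_sector}"
}

sector_to_bytes 4097        # byte offset where APP starts
print_head_dump_cmd 4097    # would cover sectors 0 through 4096
```

Reading the region this way only works on a unit that boots. Writing it back to a failed unit would still need support from the flash tooling, which is the open question above.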
Thanks linuxdev for your reply.
There is no historic difference between them; in the same session I was able to flash one Jetson that works and another that doesn’t. Reflashing both gives the same results.
Cloning and restoring each partition separately is lengthy, but I can try it. Is it the same process? For example, for the “mts-bootpack” partition, do I need to run “sudo ./flash.sh -r -k mts-bootpack -G system.img jetson-tx2 mmcblk0p1”, move the image to the bootloader folder, then run “sudo ./flash.sh -r -k mts-bootpack jetson-tx2 mmcblk0p1”?
I checked both working and failing Jetsons, and the partition sizes and layout are identical. You can check the result I got here, as it is a bit different from yours: https://imgur.com/a/ULVby
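Comparing layouts by eye is error prone with 17 partitions, so here is a rough sketch of doing it with a script. It assumes the output of “sudo gdisk -l /dev/mmcblk0” has been saved to a text file on each unit; the function names and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only the partition rows of a saved `gdisk -l` listing (lines whose
# first two fields are numbers) and print: number, start, end, name.
partition_rows() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $1, $2, $3, $NF }' "$1"
}

# Compare two saved listings. Prints the differing rows, and returns 0
# only when both layouts match.
compare_layouts() {
    local a b status
    a=$(mktemp)
    b=$(mktemp)
    partition_rows "$1" > "$a"
    partition_rows "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For example, after saving the listing on each board as layout_working.txt and layout_failing.txt, `compare_layouts layout_working.txt layout_failing.txt` prints nothing when the start sector, end sector, and name of every partition agree.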
The weird problem is that when I flash a new Jetson using JetPack with only the “Flash OS Image to Target” option, it boots normally. Then I flash the APP partition with the image I took from the other Jetson using “sudo ./flash.sh -r -k APP jetson-tx2 mmcblk0p1”, and that is when I get the “The system is running in low-graphics mode” message!
So far as I know cloning is the same for every partition other than having different partition names. Notice that you can list partitions on a Jetson with “sudo gdisk -l /dev/mmcblk0”. One is “APP”, which is the rootfs, and this is why the clone or restore would name “APP”. “mts-bootpack” should be valid for that partition.
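If you want to try this for every partition, a sketch like the following could generate the clone commands from a saved gdisk listing. It only prints the commands. The per-partition image name (NAME.img) and the function names are assumptions for illustration, and whether flash.sh accepts every one of these partition names for cloning is not confirmed here.

```shell
#!/usr/bin/env bash
# Extract partition names (last column) from a saved `gdisk -l` listing.
partition_names() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $NF }' "$1"
}

# Print one clone command per partition, without running anything.
# Assumption: each partition is written to its own NAME.img file.
print_clone_cmds() {
    partition_names "$1" | while read -r name; do
        echo "sudo ./flash.sh -r -k ${name} -G ${name}.img jetson-tx2 mmcblk0p1"
    done
}
```

Running `print_clone_cmds layout.txt` on a listing saved with “sudo gdisk -l /dev/mmcblk0 > layout.txt” gives a list you can review and run one line at a time.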
One of the things I wanted to emphasize about “being the same” when flashed is that the units also need the same rootfs partition size, not just the same version. But given this I would think all Jetsons should function with clones of rootfs. I suppose a board revision could require some change, but I don’t know of any specific example of this.
I do hope we can find a way to clone via byte offset as well since partition names do not allow clone of the entire eMMC…the first 4096 bytes (one sector) really needs to be cloned and written too if complete control over partition content is to be available during production runs based on clone of a reference unit.
I tried cloning the other partitions. A lot of them are not supported, and it seems the process for cloning the other partitions is not the same as for the APP partition.
The partitions are exactly the same size and layout. I also thought it might be the board revision, but what I tried today didn’t make any sense to me. I flashed the original Jetson with the same image I took from it and it showed the same problem of “The system is running in low-graphics mode”!
I don’t know what the problem could be. Again, the way I do it is as follows:
- Backup APP partition
- Use JetPack on the new Jetson to flash the OS (to fill the other partitions)
- Restore APP partition on the new Jetson
Note that I restored the image on the same Jetson I took the backup from, using the same JetPack version I originally flashed it with.
Can anyone tell me if I’m doing something wrong, and whether there is a different way to achieve the same thing?
We tried to reproduce your issue but have not been able to hit the same failure so far.
According to your comments,
>> this method worked with 10 of the Jetsons while fails the other 5 Jetsons.
May I know whether you hit this issue consistently on the same 5 Jetson boards?
How about flashing again? Did you still hit this failure?
I’d just like to add that if we can get a modified flash.sh which allows cloning and restore of raw byte offsets (or raw sector numbers) there would be a lot more we could do in terms of backup/restore/production/testing.
Thanks for your reply. I tried reflashing multiple times and the same problem of “The system is running in low-graphics mode” still exists. The weird thing is that everything got flashed correctly and all files are there; we can still build and ssh into it. I don’t know what the problem with the X server is, or why it happens with some units and not with the others!
Does “sha1sum -c /etc/nv_tegra_release” show all files valid after the clone restore?
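To see which files fail rather than only a summary count, the check can be filtered. A minimal sketch, assuming GNU sha1sum (which prints “path: OK” or “path: FAILED” per entry); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# List entries that do not verify against a checksum list.
# Defaults to /etc/nv_tegra_release, the file named in the question above.
failed_checksums() {
    local list=${1:-/etc/nv_tegra_release}
    # Drop sha1sum's warnings on stderr and hide every entry that verified.
    sha1sum -c "$list" 2>/dev/null | grep -v ': OK$'
}
```

Running `failed_checksums` on a working unit and on a failing unit, then comparing the two lists, should show exactly which files differ between them.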
It seems not all files are valid; here is the result I got:
sha1sum: WARNING: 6 computed checksums did NOT match
What does that mean? I can’t think of a reason for that!
It appears something has updated the system such that NVIDIA-specific versions have been replaced with another version. This will cause all kinds of failures, and I’d be surprised if anything using libglx.so has any kind of success with a monitor at all.
In the driver package the “sudo ./apply_binaries.sh” step installs these files. You can run this with the “-r /some/where/else” option to apply the binaries to a different directory than the rootfs subdirectory, and you can also put the “nv_tegra/nvidia_drivers.tbz2” file in the “/” directory of the Jetson and then do this to put them back in place (this isn’t all files, just the core drivers):
sudo tar --overwrite -xvjf nvidia_drivers.tbz2
Watch for failures while extracting the drivers, but after this the sha1sum should be ok. However, this does not account for why any of the systems would work at all…I’m curious, do any of the working systems pass the sha1sum test?
The working units have sha1sum passing. I copied nvidia_drivers.tbz2 to the failing Jetson and extracted it into the / directory, and then sha1sum passed. But after restarting and connecting it to a monitor, it showed the boot sequence, then the mouse pointer appeared briefly, and after that I got a black screen (no signal to the monitor). I can still ssh to the Jetson.
The part which really sticks out is that if you have cloned from a Jetson with a passing sha1sum, then everything receiving the clone should also pass. Does the sha1sum pass on the Jetson which was the source of the clone? If so, then either the clone was bad, or the restore.
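One way to narrow down the host side of “bad clone or bad restore” is to record a checksum of the image right after cloning and verify it before every restore. This is only a sketch with hypothetical function names, assuming GNU sha1sum. It can show whether the image file changed on the host between flashes, but it cannot tell whether the clone was read correctly from the eMMC in the first place.

```shell
#!/usr/bin/env bash
# On the host, right after cloning: record a checksum next to the image.
record_image_checksum() {
    sha1sum "$1" > "$1.sha1"
}

# Before each restore: confirm the image still matches the recorded checksum.
# Returns non-zero (and prints the failing entry) if the file has changed.
verify_image_checksum() {
    sha1sum -c --quiet "$1.sha1"
}
```

Both functions should be called with the same path from the same directory, for example `record_image_checksum system.img` after the backup and `verify_image_checksum system.img` before flashing each unit, because the recorded path is stored as given.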
My concern would be that if something modified those files, then other parts of the system were probably also modified…and without knowing exactly what happened you can’t trust that those other parts will work interchangeably with the “corrected” unpack of files.
That said, I would recommend watching what happens via serial console as the system boots. This could provide a very good insight into what remains failing with far less effort than figuring it out one step at a time (serial console might just tell you directly what’s failing…ssh only shows after networking is up…though you can dig through “dmesg” and “/var/log/Xorg.0.log” and the answer might be there).
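If you do go through the logs over ssh, the X server marks error lines with “(EE)”, so a filter like this sketch can cut the log down quickly. The function name is hypothetical, and the default path is the /var/log/Xorg.0.log mentioned above.

```shell
#!/usr/bin/env bash
# Print the error lines, which the X server marks with "(EE)", from an Xorg
# log. The second grep drops the legend line near the top of the log, which
# lists all the markers and would otherwise always match.
xorg_errors() {
    grep -F '(EE)' "${1:-/var/log/Xorg.0.log}" | grep -v -F '(WW) warning'
}
```

Comparing the output of `xorg_errors` on a working unit against a failing one may point at the driver or module that fails to load.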