Can we use Jetson AGX Xavier as host pc for flashing?

nagesh_accord · May 15, 2024, 7:25pm

linuxdev:

That command is for setting up the root filesystem during a normal flash. It is likely that the backup and restore scripts do not use that content, but I won’t guarantee that it doesn’t do any housekeeping. During backup this would have nothing to do with it should backup halt. Possibly, but unlikely this would have an effect on restore. Even so, I would recommend doing this if you are using that software via manual install to the PC.

The partition binaries for eMMC models (and QSPI content) are part of that release’s flash. When you flash normally, or when you flash normally other than reusing a given rootfs image, then all of that other binary content is part of the flash. If you purposely flash just the rootfs (which goes beyond “reusing the rootfs image”), then only the rootfs is updated.

Backup and restore can work with other parts of the binary content, and is more extensive than just backup and restore of the rootfs. However, it is the rootfs which contains the operating system and data. Everything else is mostly booting. A serial console boot log is how you differentiate which stage has a failure (and remember that booting sets up an environment for the rootfs to run in…one can have a correct boot, but the rootfs might fail if the environment is from the wrong release).

The boot content in the locations outside of the rootfs mostly works with everything in a given major release unless it has a bug fix. Consider that most of the L4T R35.x rootfs content will work with the binary boot content of other R35.x releases (though bug fixes might get in the way if content is too old); consider that L4T R35.x rootfs content is guaranteed to fail with all L4T R36.x boot content. Should you install R35.4.1 flash software manually on your host PC, and if you have a rootfs preserved from a Jetson that had R35.4.1, then you don’t need to preserve that non-rootfs binary content. Exceptions might exist if you’ve customized it.

Consider this use-case: Your boot content is corrupt, but you have a proper rootfs image. If you flash all new content, except that you reuse rootfs, and the flash software is from the same release, then the corrupted boot content would be fixed.

Another use-case: Your boot content is not corrupt, and you have a proper rootfs image, but you are not restoring to the same Jetson. If you reuse the rootfs, but otherwise perform a complete flash using the same L4T release for flash software, then the new unit will get a matching and working boot content despite having arrived empty.

Another use-case: Your boot content is corrupt or the wrong release. You have a backup rootfs. You have a backup of all of the binary non-rootfs content. If you reuse the full non-rootfs content, all units flashed will fail. If you use a default flash, but otherwise it is set to reuse your backup rootfs image, then the corrupt content is replaced and it should work.

Complications and customization tend to matter when you use an initrd boot to adapt to external rootfs media. Some adaptation must be made for properly booting into the rootfs on external media. The default binary content is not for external boot.

Thanks for the elaborate information.

linuxdev · May 15, 2024, 11:39pm

Honestly, that is too much error to completely recover. If I were working on this, then what I’d do is clone in recovery mode, throw away the sparse clone, and then attempt to manually repair a loopback mounted raw clone copy on the host PC. You could save as much as possible from that and recreate it for install in any number of ways. The forum thread is getting longer, can you confirm what disk layout you have on this? For example, does it just boot to the internal eMMC, or is there any kind of external boot media involved?

nagesh_accord · May 16, 2024, 1:40am

This boots to the internal eMMC where RFS is mounted.

It has an external SD card and an NVMe drive as well.

Note:
Also, the unit is at the customer site, where we don’t have access to it, and we want to fix it without reflashing.

Even setting up a host PC at the customer site is difficult…

linuxdev · May 16, 2024, 4:41pm

Are the NVMe and SD card simply mounted somewhere as auxiliary disks? Assuming it is the eMMC which must be recovered, about the only hope of doing this without flash is if it can boot to at least command line. One could then rsync to the SD card or NVMe if you don’t have anything there you care about, and attempt to fix the eMMC. I doubt this is practical though, and once again, I think all you would do is to save some of the disk but still have to flash again.

Correct cloning requires the device in recovery mode, and this in turn means the Jetson becomes a custom USB device and won’t have any access other than through your host PC.

I know that this is a dilemma with no good outcomes whether you have to go there and flash or whether you get the unit sent to you (e.g., overnight express mail), but without a booted unit (text mode is ok if networking is present, or if you have an external disk) there really is no chance.

nagesh_accord · May 17, 2024, 4:48am

How can we fix this by doing an rsync between the external NVMe or SD card and the internal eMMC?
Please elaborate.

nagesh_accord · May 17, 2024, 4:50am

If we press Alt+F2 when booting stops in the middle, it enters console mode.
After this, what needs to be done to fix the booting issue?

linuxdev · May 17, 2024, 6:32pm

I’ll try to answer this last question first. If alt+F2 works, then the system did boot. It is just the graphics which failed. I’m assuming it didn’t drop into a bash shell directly, and it required you to log in. Did you have to log in for access after the alt+F2? If so, then that is a good thing.

As far as rsync goes, this is not necessarily a fix. It is a mere chance, and the fix might have flaws in it. Furthermore, it might be that you can recover data like this, but not actually restore (I’ll say more on that below). How much empty space is left on the external devices? Does one of them have as much empty space as that which is used in the eMMC that you have issues on? Check the output of this for disk space information (this limits to partitions with ext4 formatting, and this is mandatory…you cannot back up like this to other filesystem types if they are not a native Linux filesystem type):
df -H -T -t ext4
(please show the output of this)
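A minimal sketch of the space comparison asked about above, assuming GNU df (for `--output` and `-B1`). The function names `has_room` and `fits_on`, and the mount point `/mnt/nvme` in the usage line, are hypothetical and not from this thread.

```shell
#!/bin/sh
# has_room USED AVAIL: succeed when AVAIL bytes can hold USED bytes.
has_room() {
    [ "$2" -gt "$1" ]
}

# fits_on SRC DST: compare the bytes used on the filesystem holding SRC
# with the bytes free on the filesystem holding DST (GNU df assumed).
fits_on() {
    used=$(df -B1 --output=used "$1" | tail -n 1 | tr -d ' ')
    avail=$(df -B1 --output=avail "$2" | tail -n 1 | tr -d ' ')
    echo "used on $1: $used bytes; free on $2: $avail bytes"
    has_room "$used" "$avail"
}
```

Something like `fits_on / /mnt/nvme && echo "enough room"` would then report whether the used part of the rootfs could fit on the NVMe. It only compares sizes and writes nothing.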

Corruption in the partition is a flaw in the tree structure of the data nodes. There is a kind of linked list structure whereby a chunk of information is memorized on the disk in one “node”; that node has addresses to parent and child nodes. By this method one can save a chunk, append it by changing pointers, so on. A file has a head node that is the directory. A file has other nodes to what is in it, for example, the text of a text file. The final node points to NULL as the tail of the chain of nodes. Somewhere your nodes are incorrect, and files and directories may point to something to edit when changing them that they shouldn’t be looking at. The result of writing to anything might be further corruption whereby other unrelated content is destroyed. Thus the system won’t let you write to it due to protection against further corruption.

What you can do is read the data. There are two ways to read it; the first method is to read it as files and directories. You can use rsync to take what is there and copy it to a new partition, and since the partition has its own tree of nodes which are properly set up, then creating that old content onto the new tree will do so without improperly linked nodes. The down side: Some of the content will be wrong and might be from a cross link to a different file or directory. An example might be that a binary file has text in the middle of it from a text document. The new content would never corrupt further, but the old content which is already missing or incorrect would remain missing or incorrect. You could write that content onto a newly formatted eMMC and “hope” for whatever is missing or wrong to not matter. You might not find out for a long time the extent of the damage.

The second method is to copy the partition as binary data. This is the realm of dd instead of using rsync. This is how I do disk recovery operations on a failing disk, and this also won’t fix corrupted or missing content, but it gives you more power in recovering missing files and data content (there are special tools). Doing this kind of recovery directly off of a failing disk (failing hardware) is risky, and if you use dd first so that you can work on a copy, the results are much better. In your case you do not have a failing eMMC, it is just corrupt data. Still, if you ever want a copy of what is there that has the highest chance of recovery through more specialized tools, then this is pretty much mandatory. There is a minor possibility of recovering everything with a lot of work on a dd image of the partition (which takes a bigger learning curve and more time). The useful part is that a dd image of the partition can still have rsync run against it, exactly as you would run rsync against the actual eMMC. If you have the dd image, then you can still use the other methods. The rsync method has no further options, you get what you get.

I will add that the tools which can automatically attempt recovery work on the loopback mounted dd partition from a separate host PC. Everything can be done on this from another computer. You can obtain the dd over the network from the Jetson to your host PC at a remote location, or you can dd to the local NVMe or SD card. dd won’t care about the filesystem type on the SD card. rsync can go to the local SD card if and only if the filesystem type on the SD card is a Linux type (e.g., ext4 will work, but VFAT or NTFS will not).
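As a rough sketch of the dd approach described above: the function name `image_partition` is hypothetical, and the device and paths in the comments (`/dev/mmcblk0p1`, `/mnt/nvme/clone1`, `/mnt/clone`) are assumptions to verify on the actual unit, for example with lsblk.

```shell
#!/bin/sh
# image_partition SRC DST: raw copy of SRC (a partition or a file) to DST.
# conv=noerror,sync keeps going past read errors and pads short or failed
# blocks with zeros, so offsets inside the image stay aligned.
image_partition() {
    dd if="$1" of="$2" bs=64K conv=noerror,sync
}

# On the Jetson this might look like (ASSUMED names, check them first):
#   image_partition /dev/mmcblk0p1 /mnt/nvme/clone1/rootfs.img
# Later, on a host PC, the image can be loopback mounted read-only:
#   sudo mount -o ro,loop rootfs.img /mnt/clone
```

The image is most trustworthy when the source partition is unmounted or mounted read-only while dd runs. Because of `sync`, the image may end with a little zero padding if the source size is not a multiple of the block size.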

It takes a long time on any slow network to dd or rsync over to another computer. rsync is faster because it can compress. If you have the time though, dd is a better result IMHO.

I’m showing you a lot of pros and cons, and have not really answered your question. All of the above is to get the original data, and you cannot fix this without the original data. Once you have suitable data, then you probably have to format the original eMMC partition as ext4 and then restore data on it. This can have some really high probabilities of failure to write to a partition that is actually in use. There are all kinds of things that can get in the way or go wrong. If you have the data ahead of time, then it won’t matter what goes wrong, you still have data to try with again.

How important is the data? How fast is the network between your systems? What space is consumed on the eMMC, and how much empty space is there on the NVMe? How large is the SD card, and is it formatted as ext4? There is a lot you can do from command line after a normal alt+F2, and a lot more if you have networking. It is faster to copy that data to a local NVMe, but then you might have to put the NVMe on your local computer to work on it. Networking lets you directly copy to your host PC. Describe what resources you have for networking and host PC (including local and/or remote).

nagesh_accord · May 18, 2024, 4:07am

Yes. I suppose as per customer updates

nagesh_accord · May 18, 2024, 6:38am

That data is on the eMMC, where the root FS is flashed and mounted. It has all of the other software and JetPack components needed by the customer installed.

The network option is ruled out; we cannot have network access because the site is very secure. The only option left is a local copy to the external SD card or the NVMe M.2 drive.

The eMMC is 64 GB, the SD card is 64 GB, and the NVMe is 500 GB.
Both the SD card and the NVMe are formatted as ext4, I suppose.

However, we are currently trying to fix this issue by physically removing the SD card and NVMe from the carrier board and seeing whether it then boots correctly.

The next option is to go to the customer site with a host PC set up on a laptop and try reflashing. I will let you know in the coming days how the debugging and troubleshooting steps progress.

Thanks a lot for your precious thorough elaborative information.

linuxdev · May 18, 2024, 11:07pm

The NVMe will be your destination in this case. If when complete the file created is small enough to fit on SD, then you could put it there as well. Networking to retrieve the file is an option if you get permission, but being there physically would be a lot faster.

I will state ahead of time that when a filesystem is corrupt, that copy via rsync may stop. It is a hit and miss test. The dd method cannot be used without bad things happening if the filesystem is in use, although it might be an option if you shut down any known running programs and operate only from the NVMe. Cloning from a recovery mode Jetson while being physically present is the superior method, and is the same result as a dd clone if the clone is of a read-only or unmounted filesystem. If things go badly, then we might be able to find a way to manually remount the rootfs read-only while keeping the NVMe read/write.

This is the easiest thing to do that follows, and does not require you to be there. ssh login is fine so long as you have sudo access. Test that out with something like “sudo ls”. To drop into a root shell you can run “sudo -s”. You’ll want to be in the root shell for what follows.

Go to the NVMe. Be certain it has more than 64 GB remaining space (“df -H -T .” to see that location’s content because “.” is an alias for “the current directory”). Find an empty directory. I suggest you create one like “clone1” for the first clone attempt. This will take significant time, so if you like coffee, I suggest have some handy.

We will call the directory where the clone is destined on the NVMe to be “/clone1”. More likely it is some other subdirectory, but use your imagination. This is the directory of destination. This will contain a mirror of the existing eMMC without tree corruption, but it will be affected by any errors in the original tree. You won’t need to worry about access to this location having further corruption if you write to it or read. An image can be made from this. The flash content on a host PC can be updated to flash this content if you wish, or to even put edited parts of this onto the flash content of the host PC when flash time comes (meaning the flash will give you back a working system with your software running on it).

A very useful and important note about one rsync option that is a safety: you can use “--dry-run”, and you will see all operations as they would happen if this were a real run. Nothing will be done. If you do this and it looks right, then you can remove “--dry-run” and actually make this happen. You could discover things like filesystem errors getting in the way, but more importantly, you could find it is pulling or placing files in the wrong place, or running into permission issues. I might show a command twice, once with --dry-run, once without.

sudo -s
cd /clone1
Substitute any “/clone1” with your location.
Version 1, with --dry-run:

rsync --dry-run -avcrltxAP --info=progress2,stats2 --numeric-ids --exclude '.gvfs' --exclude 'lost+found' '/' /clone1

One important thing to know is that the “-x” option says to not cross filesystems. The mount point of the SD card will be ignored (the mount point would be recorded, the content not copied).
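To see ahead of time which locations “-x” will leave as empty mount points, one could list everything mounted besides “/”. This is only a sketch; the function name `other_mounts` is hypothetical, and it assumes the usual /proc/mounts column layout (device, mount point, type, options).

```shell
#!/bin/sh
# other_mounts [FILE]: print every mount point except "/" from a file in
# /proc/mounts format, with its filesystem type in parentheses.
# Defaults to the live /proc/mounts when no FILE is given.
other_mounts() {
    awk '$2 != "/" { print $2 " (" $3 ")" }' "${1:-/proc/mounts}"
}
```

Running `other_mounts` with no argument on the Jetson should list the SD card and NVMe mount points along with pseudo-filesystems such as /proc and /sys. With “-x”, rsync does not descend into any of them.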

The reason we don’t include the “lost+found” is that this is part of any ext4 partition. However, because your destination location is a subdirectory, and not a mount point, this means your own system won’t have a lost+found/ subdirectory within your subdirectory. You could in fact copy lost+found/ as well. This location is reserved for content which filesystem repair has trimmed and removed…it is the destination of node “pruning”. I am going to go ahead in the next version of this and suggest you go ahead and run this without exclusion of lost+found/ since it could be useful if anything has already been pruned, and won’t harm anything since you are in a subdirectory. This version might allow you to better recover and know what was lost:

Version 2, with --dry-run:

rsync --dry-run -avcrltxAP --info=progress2,stats2 --numeric-ids --exclude '.gvfs'  '/' /clone1

It is important to know if there is a location that you do not want copied, then you can use “--exclude '/some/where'”, and then dry run to see if it is what you want. rsync is fairly reliable so I don’t expect any significant errors at this point. We exclude .gvfs because it is a pseudo-filesystem and not part of the disk, and although this would not be crossed (it is a filesystem boundary) it will give you a lot of errors since it is security based and won’t allow users to read it.

Here is the final suggested version. I will add logging to this so you have a record of all that happened and all that failed.

Final version, with logging and no --dry-run:

rsync -avcrltxAP --info=progress2,stats2 --numeric-ids --exclude '.gvfs'  '/' /clone1 2>&1 | tee log_rsync.txt

The log will be log_rsync.txt.
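One detail worth knowing when piping into tee: the exit status you get back is tee’s, not rsync’s, so a failed copy can look like a success. Below is a small POSIX sh sketch that keeps the real status; the function name `run_logged` and the temporary `.status` file are illustrative choices, not part of rsync.

```shell
#!/bin/sh
# run_logged LOG CMD...: run CMD, copy its stdout and stderr to LOG via
# tee, and return CMD's own exit status rather than tee's.
run_logged() {
    log=$1
    shift
    # The left side of the pipe records the command's status in a file,
    # because a plain POSIX pipeline only reports the last command's status.
    { rc=0; "$@" 2>&1 || rc=$?; echo "$rc" > "$log.status"; } | tee "$log"
    status=$(cat "$log.status")
    rm -f "$log.status"
    return "$status"
}
```

It could be used as `run_logged log_rsync.txt rsync ... '/' /clone1; echo "rsync exit status: $?"`. For rsync, 0 means success, 23 means a partial transfer due to errors, and 24 means some source files vanished during the copy.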

If the devices themselves do not error, this should do the job. We can talk about how to use this to create a new system. It is quite difficult to do so on a running system, but there can be compromises (not necessarily acceptable, but maybe acceptable) to use rsync on a repair of that image.

Let us know if you have the files of the existing system on the NVMe, and also if you are able to access a host PC there for flash. Flash can use that image.

Just a reminder, if you are going to end up there in person, then a clone via recovery mode would be a superior method. Even so, it is good to have the rsync backup. Having both gives you an enormous amount of room to rescue whatever remains intact.

nagesh_accord · May 29, 2024, 12:07am

We received the unit, which was stopping in the middle of the boot due to a corrupted filesystem and partition table.
The customer had used the fdisk, mount, and umount commands without thinking through the repercussions, which resulted in corruption of the partition table, etc.

We fixed the issue by reflashing from our host PC and installing all of the software once again.

linuxdev · May 29, 2024, 3:27pm

You might want to set up an rsync script for backup (which works on a running unit) in the future. The customer could save a lot of time that way. Once corrupted you have to clone and repair the clone. An rsync backup (if complete and preserving numeric IDs and permissions) can be used to create a flash image, or simply to update the sample image to your image.

nagesh_accord · May 29, 2024, 3:48pm

This time we used the backup and restore script and took a backup of the complete eMMC memory where the RFS resides, so that the next time the unit goes bad we can use the restore command of the same script and bring the unit back in a short time.

Please provide the rsync script setup: what commands need to be run?

Also tell us, if the unit stops booting in the middle, how do we restore the cloned image from console mode?

linuxdev · May 30, 2024, 5:52pm

I don’t know the specifics for your case until it occurs. First I’ll suggest some information on rsync topics:

I actually have too many posts on the topic to zero in on one. You can skim those though and find which ones offer command examples. Concentrate first on clone or backup. Do realize that some of the articles are for different models of Jetsons.

Always keep in mind these rsync options:

--dry-run
This allows you to see what a command would do without really doing it. Makes a lot of testing safe.
--numeric-ids
This is mandatory for copy of content from one Linux system to another if you want to keep ownership the same. Users and groups are really known by their alias, such as the user name, but the true identifier is the numeric ID. This option allows one to back up and restore the ID, not just an alias.
--delete-before
If you don’t have a lot of backup space, or if you don’t care about special failure cases (such as loss of power during the backup), then this can reduce the peak amount of disk space during the actual backup. With this, files which no longer exist at the source are deleted from the destination before any copying starts, which frees their space first. Note that a modified file is still written under a temp name and renamed over the original when complete; the --inplace option is what changes that.
Many options are just to preserve permissions. There is overlap and it usually doesn’t hurt.
Jetsons don’t normally use extended attributes (such as SELinux labels) or ACLs (Access Control Lists), but if you do have these, then the -X option preserves extended attributes and the -A option preserves ACLs. Might not matter if your filesystem is not set up for them.
--info=progress2,stats2
This is just to see progress. Backups can take a long time, and it adds anxiety if you don’t know what is going on.
Local or remote accounts over ssh are usually interchangeable. If you use -e ssh, then these examples offer destinations or sources:
- /home/someone
- someone@remotehost.com:/home/someone
If you set up ssh keys, then command line ssh with rsync is trivial effort.
Sometimes options require root authority, e.g., --numeric-ids at the end which writes a file using a numeric ID. I usually unlock root login over ssh with keys only on Ubuntu since it is much much simpler than playing with the options to do so with sudo. In this case your root user would have a public/private key pair, and the only login is via sudo or via ssh using key pairs (password login is still prohibited and keys can be revoked).
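Pulling those options together, a backup helper might look roughly like the sketch below. This is an assumption-laden example rather than a tested Jetson procedure: the function name `backup_rootfs`, the `RSYNC` override variable, and the destination `/mnt/nvme/backup` in the usage line are all hypothetical.

```shell
#!/bin/sh
# RSYNC can be overridden (for example RSYNC=echo) to preview the exact
# command line without running rsync at all.
RSYNC=${RSYNC:-rsync}

# backup_rootfs DEST [extra rsync options...]: mirror "/" into DEST.
#   -a  archive mode (permissions, owners, times, symlinks, devices)
#   -A  preserve ACLs, -X preserve extended attributes
#   -x  stay on the root filesystem (do not cross mount points)
backup_rootfs() {
    dest=$1
    shift
    "$RSYNC" -aAXx --numeric-ids --info=progress2,stats2 \
        --exclude '.gvfs' "$@" / "$dest"
}
```

Run it as root, first as `backup_rootfs /mnt/nvme/backup --dry-run` to review what would happen, then again without `--dry-run`. Extra options such as `--delete-before` can be appended the same way.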

Restoring directions depend on how the backup was created. There isn’t one easy answer. If you mean console mode from a normal login, then that is usually trivial since the system is running (anything in boot stages immediately makes the problem much more difficult). If you mean a rescue mode, then this is not easy. An exception might exist if you’ve customized boot to allow a bash shell with networking and ssh. Otherwise you’re back to flashing instead of direct restore. When flashing there are a number of ways to use backed up content. This is an entire industry on its own, so you’d have to ask when specifics are known.

I highly recommend using a Jetson you don’t depend on to practice backup and restore. Having actually tested what you know is incredibly useful.

system · June 18, 2024, 12:53pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
