Recommended way to duplicate eMMC rootfs (APP partition) onto internal SSD (NVMe) without dd

Hi,
I noticed that when I use dd to duplicate my eMMC rootfs onto the internal SSD (in order to boot the system from the duplicated rootfs) I get errors:

  • sometimes I get corrupted files
  • sometimes dd gets stuck
  • if I execute cmp /dev/nvme0n1p1 /dev/mmcblk0p1, they always differ

The command is:
    sudo dd if=/dev/disk/by-partuuid/76801fb4-d266-4e83-a90c-4a22fbdfaa3b conv=sync,noerror bs=4M of=/dev/nvme0n1p1
where the partuuid is that of the APP partition.

Is there any other tool/method to safely duplicate the current running rootfs into another device?

Thanks

hello BSP_User,

May I know what the actual use case is? dd may take a very long time to copy the content if you have a lot of data.
Please also see the developer guide section “To clone a Jetson device and flash”, which copies system.img from the filesystem partition, i.e. the APP partition.

Yes:
I have a Xavier NX on a custom carrier board with an internal NVMe.

Until now:
I flashed the Xavier NX eMMC (full flash with ./flash.sh),
logged in to the Xavier and created the new internal NVMe (SSD) partition,
used dd to copy the eMMC APP partition into the newly created NVMe partition,
and modified both extlinux.conf files (in the eMMC and NVMe root filesystems) to use root=/dev/nvme0n1p1.

The first problem is that dd sometimes results in corruption.
The second problem is that the SSD is internal (via PCIe), so I can’t use these instructions:
“To set up an NVMe drive manually for booting”

hello BSP_User,

Why don’t you check the developer guide and use initrd for flashing to an external storage device?
For example, Flashing Support (Jetson Linux Developer Guide 34.1 documentation).

I read about this method but for some reason it fails for me:
Log will be saved to Linux_for_Tegra/initrdlog/flash_1-3_0_20220912-120844.log
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes: 1102
Ongoing processes:
Flash complete (WITH FAILURES)

log:
flash_1-3_0_20220912-120844.log (10.7 KB)

I did try the following, according to the “To set up an NVMe drive manually for booting” section:

  1. Created two partitions on the internal SSD: APP (nvme0n1p1) and APP_B (nvme0n1p2)
  2. Created the system.img.raw as described in that section of the guide
  3. Copied the system.img.raw into APP_B using scp
  4. Ran rsync:
    sudo rsync -axHAWX --numeric-ids --info=progress2 --inplace --exclude=/proc /mnt/APP_B/ /mnt/APP

But I get the error:
9,279,946,752 100% 203.55MB/s 0:00:43 (xfr#1, to-chk=0/2)
rsync: write failed on "/mnt/APP/system.img.raw": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]

Assuming you want to duplicate the emmc rootfs /dev/mmcblk0p1 currently mounted
as / to the internal ssd /dev/nvme0n1p1.

  1. Create fresh fs on SSD: /dev/nvme0n1p1
    sudo mkfs.ext4 /dev/nvme0n1p1
  2. Mount the SSD to /tmp/mnt
    sudo mkdir /tmp/mnt
    sudo mount /dev/nvme0n1p1 /tmp/mnt
  3. Sync the fs
    cd / && sudo find . -xdev -print | sudo cpio -pdm /tmp/mnt
  4. Unmount the SSD
    sudo umount /tmp/mnt

There’s a failure; please try another disk with enough storage.

I do have enough storage.
That’s why this error is strange.

I don’t have enough storage on the current rootfs (mmcblk0p1), but on both nvme0n1p1 and nvme0n1p2 I have 28 GB available.
I use the --inplace flag so that rsync copies directly from nvme0n1p2 to nvme0n1p1 without creating temp files on mmcblk0p1.
I still get this error.

Can you please clarify what this command is for:

  1. Sync the fs
    sudo find / -xdev -print | cpio -pdm /tmp/mnt

And how do you actually copy the rootfs from mmcblk0p1 to nvme0n1p1?
I experienced problems when using dd and rsync.

The “find” command produces the list of all files and directories under / (where mmcblk0p1 is mounted) without crossing into other mounted filesystems (-xdev), and the “cpio” command re-creates those files and directories under /tmp/mnt (where nvme0n1p1 is mounted).
Once the command finishes, all files and directories have been copied.

I will try and update

I thought I would add that if dd is used with a mounted device, then it is likely that (A) bytes change during the read (possible corruption), or (B) bytes change during the write (possible corruption), or (C) a simultaneous write blocks (dd stuck). It is important to know the mount status of both the device being read and the device being written.
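
As a rough sketch of how one might check the mount status before a raw copy, the helper below looks a device up in a mounts table. The function name mount_state is made up for this example. It assumes the device appears in /proc/self/mounts under the same name you pass in; a /dev/disk/by-partuuid/... link would first need resolving with readlink -f, and on some systems the root device is listed as /dev/root instead.

```shell
#!/bin/sh
# mount_state DEVICE [MOUNTS_FILE]
# Prints "unmounted", "ro" or "rw" for DEVICE, using a table in the
# /proc/self/mounts format. Hypothetical helper for illustration only.
mount_state() {
    awk -v dev="$1" '
        $1 == dev {
            # Field 4 is the comma-separated mount option list.
            n = split($4, opts, ",")
            state = "rw"
            for (i = 1; i <= n; i++) if (opts[i] == "ro") state = "ro"
            # If the device is mounted more than once, "rw" wins.
            if (found != "rw") found = state
        }
        END { print (found == "" ? "unmounted" : found) }
    ' "${2:-/proc/self/mounts}"
}

# Possible use before a raw copy (device name taken from this thread):
#   [ "$(mount_state /dev/mmcblk0p1)" != "rw" ] || echo "source is mounted read-write"
```

A result of "rw" for either the dd source or the dd destination is the situation described above.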

Before continuing, what you are describing is that the tools are doing what they were supposed to do. This is not related to a Jetson itself, but Linux in general. This is even true for other operating systems using other tools (though tool names would change) and is not just a Linux issue.

Also, ext4 partitions keep statistics. Certain details like mount count are used by maintenance software to decide if, after a certain number of mounts, the filesystem should be checked even if no error exists. Or, if the journal is not flushed, what content to back out (a mounted partition can have content which is cached but not yet written, and using dd at that time would capture different content than what you think is on the disk). Even if there is nothing more than a mount count change, you could expect a different checksum despite the filesystem otherwise being an exact copy. You would need to make such a checksum without either filesystem ever having been mounted in between. A running rootfs cannot be cloned in the usual way.

The proper tool for a running filesystem is one which works with files through the mounted filesystem; dd only understands block devices as binary data with no concept of ext4. That tool would be rsync, but it produces a copy of the contents of a filesystem, whereas dd produces a copy of a partition. You would not get a matching checksum for the partition as a whole by this method; what you would get are individual files whose checksums match, along with certain metadata (e.g., permissions, though arguments to rsync can change which metadata is duplicated exactly and which is translated).
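
A file-level comparison of that kind could be sketched as follows. The function names tree_sums and compare_trees are invented for this example, and it assumes GNU find, sort, xargs and sha256sum. It only compares the contents of regular files, not ownership, permissions or symlinks.

```shell
#!/bin/sh
# tree_sums DIR: print "checksum  path" for every regular file on DIR's
# own filesystem, ordered by path so that two listings can be diffed.
tree_sums() (
    cd "$1" || exit 1
    find . -xdev -type f -print0 | sort -z | xargs -0 -r sha256sum
)

# compare_trees SRC DST: succeeds only if both listings are identical;
# otherwise prints the differing lines in diff format.
compare_trees() {
    a=$(mktemp) && b=$(mktemp) || return 1
    tree_sums "$1" > "$a"
    tree_sums "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Possible use, as root so every file is readable (paths from this thread):
#   compare_trees / /mnt/nvme
```

On a live rootfs a few files (logs, caches) will legitimately differ because they changed after the copy. rsync itself can produce a similar report: adding --dry-run --checksum --itemize-changes to a copy command lists the files it considers different without writing anything.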

If you use dd you don’t need to partition or format first, since dd copies the partition and its filesystem verbatim without even knowing what they are. If you use rsync, then you must first create a partition and a filesystem (preferably blank, or an older copy of the same content being updated). rsync has the reverse requirement of dd: dd wants the filesystem unmounted or read-only, whereas rsync needs it mounted (read-only is better for the source, but read-write can work). Assuming you mounted “/dev/nvme0n1p1” on the custom location “/mnt/nvme” (you would have had to create that partition and format it as ext4), and that “/dev/mmcblk0p1” is the root filesystem (mounted on “/”), here is one example of using rsync (there are lots of options):

sudo rsync -avcrltxAP --info=progress2,stats2 --delete-before --numeric-ids --exclude '.gvfs' --exclude 'lost+found' / /mnt/nvme

In the above, note that “/” is the source and “/mnt/nvme” is the destination, and that both are mounted. This means the destination already has a partition with an empty ext4 filesystem on it. The command excludes the special subdirectories “lost+found/” (where fsck on ext4 places recovered file fragments) and “.gvfs” (a mount point for a “synthetic” GNOME virtual filesystem, which is not part of the disk).

A very important point to note is the “-x”: since this is a mounted filesystem, “/” will contain (via being a tree) other filesystems which are not part of the disk, e.g., “/proc”, “/sys”, and “/dev”. The “-x” says to not cross filesystem boundaries, and to stick to the original filesystem (technically, with “-x”, it shouldn’t need “--exclude .gvfs”).
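
To see which directories “-x” (or “-xdev” for find) would treat as a different filesystem, one can compare device IDs, which is the criterion these options use. The sketch below assumes GNU stat; same_fs is a made-up helper name.

```shell
#!/bin/sh
# same_fs A B: succeeds if A and B report the same device ID, i.e. they
# are on the same mounted filesystem. Hypothetical helper.
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# List the top-level directories that a one-filesystem copy of / would
# not descend into.
for d in /*/; do
    same_fs / "$d" || echo "other filesystem: $d"
done
```

The copy still contains an empty directory for each such mount point, which is what lets the new rootfs mount /proc, /sys and so on when it boots.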

The “--numeric-ids” is important if this ever leaves the original system and gets copied to another Linux system since copy from one Linux to another could result in file ownership translation (e.g., suppose you have user “nvidia” on the Jetson, but not on the PC…then the PC would translate this unless “--numeric-ids” is used…flashing back to the Jetson would not do what you expect without this option).

Big tip: Use the option “--dry-run” if you want to see the messages as a test, but don’t want it to really write. Then, if it does what you want, remove the “--dry-run”.

thank you for your detailed answer. It explains a lot. I will try it and update.

I did create a partition and an ext4 filesystem on the new partition: nvme0n1p1.
Since I can’t use a live USB or something like that, I understand from your answer that I can’t use dd safely.
I’m left with rsync, but I get the not-enough-space error even though I do have enough storage and I used the --inplace flag (so as not to create temp files in the almost full mmcblk0p1 partition).

What was your destination (mount point and device mounted there)?

One thing about rsync is that it is designed to be an efficient backup tool, and so it has multiple ways of dealing with content which already exists (by default it compares size and modification time, or a checksum if “-c” is given, to see if new content must be copied). Once it decides to copy a file, it normally leaves the old file in place (in case something goes wrong) and writes the new data to a temporary file, then moves the new file over the old one once the transfer is complete; this means there are two copies of a file at times. The option “--delete-before” can save space by removing destination files that no longer exist on the source before the transfer starts (so that old backup content is gone even if the new backup then fails). Do you have prior content in the partition?

With the NVMe mounted (I’ll pretend it is at “/mnt/nvme”, adjust for actual case), what do you see from:
df -H -T / /mnt/nvme
(this will list two locations, “/” and “/mnt/nvme”)
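
The same check can be scripted. The sketch below compares the space used under a source directory with the space free on whatever filesystem actually holds the destination, and prints that filesystem’s device name. check_room is a made-up helper; it relies on du -sxk and df -Pk as found on typical Linux systems.

```shell
#!/bin/sh
# check_room SRC DST: succeeds if the filesystem holding DST has more
# free space than SRC uses (counting only SRC's own filesystem).
# Hypothetical helper for illustration.
check_room() {
    need=$(du -sxk "$1" 2>/dev/null | awk '{print $1}')
    # df -P prints one header line, then: device, size, used, available, ...
    avail=$(df -Pk "$2" | awk 'NR == 2 {print $4}')
    fs=$(df -Pk "$2" | awk 'NR == 2 {print $1}')
    echo "need ${need} KiB, ${avail} KiB free on ${fs}"
    [ "$avail" -gt "$need" ]
}

# Possible use (paths from this thread):
#   check_room / /mnt/nvme || echo "not enough room"
```

The device name in the output is worth a look: if the destination directory is not actually a mount point, the data lands on the filesystem underneath it. Whether that is what happened in this thread has not been established.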

  1. I created two partitions on the internal SSD: APP (nvme0n1p1) and APP_B (nvme0n1p2). Each was 28 GB.
  2. I created the system.img.raw on the host.
  3. I copied the system.img.raw into APP_B using scp.
  4. I ran rsync:
    sudo rsync -axHAWX --numeric-ids --info=progress2 --inplace --exclude=/proc /mnt/APP_B/ /mnt/APP

I got the error:
9,279,946,752 100% 203.55MB/s 0:00:43 (xfr#1, to-chk=0/2)
rsync: write failed on "/mnt/APP/system.img.raw": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]

I don’t have the target near me right now, but I’ll run the command from your last reply (df -H -T / /mnt/nvme) and report here.

I updated my sync fs command to use sudo, so that files/directories can be created with proper permissions:

cd / && sudo find . -xdev -print | sudo cpio -pdm /tmp/mnt

I executed the following commands:

nx1@nx1-desktop:~$ sudo mkdir /mnt/nvme
nx1@nx1-desktop:~$ sudo mount /dev/nvme0n1p1 /mnt/nvme/
nx1@nx1-desktop:~$ cd / && sudo find . -xdev -print | sudo cpio -pdm /mnt/nvme
12144229 blocks
nx1@nx1-desktop:/$ sudo umount /mnt/nvme
nx1@nx1-desktop:/$ sudo mount /dev/nvme0n1p1 /mnt/nvme/
nx1@nx1-desktop:/$ ll /mnt/nvme/
total 112
drwxr-xr-x 19 root root 4096 Sep 14 10:59 ./
drwxr-xr-x 3 root root 4096 Sep 14 10:54 ../
lrwxrwxrwx 1 root root 7 Sep 14 10:55 bin -> usr/bin/
drwxr-xr-x 4 root root 12288 Sep 14 10:59 boot/
drwxr-xr-x 2 root root 4096 Sep 14 10:53 dev/
drwxr-xr-x 140 root root 12288 Sep 14 10:55 etc/
drwxr-xr-x 3 root root 4096 Sep 14 10:55 home/
lrwxrwxrwx 1 root root 7 Sep 14 10:59 lib -> usr/lib/
drwx------ 2 root root 16384 Sep 14 10:25 lost+found/
drwxr-xr-x 2 root root 4096 May 19 2021 media/
drwxr-xr-x 3 root root 4096 Sep 14 10:59 mnt/
drwxr-xr-x 4 root root 4096 Sep 14 10:54 opt/
dr-xr-xr-x 2 root root 4096 Jan 1 1970 proc/
-rw-rw-r-- 1 nx1 nx1 62 Aug 11 06:37 README.txt
drwx------ 6 root root 4096 Sep 14 10:55 root/
drwxr-xr-x 2 root root 4096 Sep 14 10:53 run/
lrwxrwxrwx 1 root root 8 Sep 14 10:55 sbin -> usr/sbin/
drwxr-xr-x 9 root root 4096 Sep 14 10:55 snap/
drwxr-xr-x 2 root root 4096 Jul 31 2020 srv/
dr-xr-xr-x 2 root root 4096 Jan 1 1970 sys/
-rw-r--r-- 1 root root 1874 Sep 14 10:26 tegra194-mb1-bct-ratchet-p3668.cfg
drwxrwxrwt 19 root root 4096 Sep 14 10:55 tmp/
drwxr-xr-x 11 root root 4096 Sep 14 10:56 usr/
drwxr-xr-x 15 root root 4096 Sep 14 10:55 var/

nx1@nx1-desktop:/$ ll /
total 96
drwxr-xr-x 19 root root 4096 Sep 14 10:26 ./
drwxr-xr-x 19 root root 4096 Sep 14 10:26 ../
lrwxrwxrwx 1 root root 7 Jul 31 2020 bin -> usr/bin/
drwxr-xr-x 4 root root 12288 Sep 14 10:46 boot/
drwxr-xr-x 19 root root 7940 Sep 14 10:53 dev/
drwxr-xr-x 140 root root 12288 Sep 14 10:47 etc/
drwxr-xr-x 3 root root 4096 Sep 14 10:45 home/
lrwxrwxrwx 1 root root 7 Jul 31 2020 lib -> usr/lib/
drwx------ 2 root root 16384 Sep 14 10:25 lost+found/
drwxr-xr-x 2 root root 4096 May 19 2021 media/
drwxr-xr-x 3 root root 4096 Sep 14 10:54 mnt/
drwxr-xr-x 4 root root 4096 Sep 14 10:16 opt/
dr-xr-xr-x 352 root root 0 Jan 1 1970 proc/
-rw-rw-r-- 1 nx1 nx1 62 Aug 11 06:37 README.txt
drwx------ 6 root root 4096 Sep 14 10:43 root/
drwxr-xr-x 33 root root 940 Sep 14 10:53 run/
lrwxrwxrwx 1 root root 8 Jul 31 2020 sbin -> usr/sbin/
drwxr-xr-x 9 root root 4096 Sep 14 10:43 snap/
drwxr-xr-x 2 root root 4096 Jul 31 2020 srv/
dr-xr-xr-x 12 root root 0 Jan 1 1970 sys/
-rw-r--r-- 1 root root 1874 Sep 14 10:26 tegra194-mb1-bct-ratchet-p3668.cfg
drwxrwxrwt 19 root root 4096 Sep 14 10:58 tmp/
drwxr-xr-x 11 root root 4096 Mar 4 2021 usr/
drwxr-xr-x 15 root root 4096 Sep 14 10:16 var/

nx1@nx1-desktop:/$ df -H -T /dev/mmcblk0p1
Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 15G 6.7G 7.4G 48% /
nx1@nx1-desktop:/$ df -H -T /dev/nvme0n1p1
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme0n1p1 ext4 63G 6.7G 53G 12% /mnt/nvme

Is it OK that on the rootfs copy I have 112 files (more than 96 on the orig)?
Is there anything else I can do to verify the rootfs was copied perfectly?

thanks

Looks good to me. BTW, 112 is NOT the number of files. It is the total disk space (in blocks) used by the entries listed in the root directory. On the running system /proc, /dev, /sys, etc. hold virtual filesystems, while on the NVMe they are just empty directories, so their sizes differ; the NVMe drive is offline and not really the rootfs yet.
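
If the goal is to compare the number of entries rather than the “total” line, something along these lines would do it. count_entries is a made-up helper, and -mindepth, -maxdepth and -printf assume GNU find.

```shell
#!/bin/sh
# count_entries DIR: number of entries directly inside DIR, hidden ones
# included, "." and ".." excluded. Prints one dot per entry and counts
# the dots, so odd file names cannot skew the result.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -printf . | wc -c
}

echo "entries in /: $(count_entries /)"
# The first line of a long listing is "total N": disk blocks used by the
# listed entries, not a count of them.
ls -l / | head -n 1
```

Running count_entries on / and on the mounted copy should give the same number if nothing was added or removed in between.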

First of all, thank you very much for your help @user100090. Now it’s working.
Thank you very much @linuxdev and @JerryChang for your help and your detailed explanations.

@linuxdev I haven’t yet tried rsync following your last replies, because @user100090’s method worked.

Two last questions on this matter, please:

  1. Can you say whether rsync is better than the cpio command I used, or vice versa? I’m asking since I want to use one of them from now on.

  2. Is there any tool I can use to verify that the rootfs was duplicated correctly?

thanks