Jetson Nano boot lag on NVMe SSD

I am using Jetson Nano production module (P3448-0002) with custom board. There are M.2 Key-M PCIe socket on that, 250GB Western Digital SSD in it. With Xavier NX and TX2 NX modules boots very fast, less than 10 seconds, but Nano module has a big delay during u-boot process from SSD in the same carrier board. After booting, the kernel uses the SSD perfectly without any lag. If the nvme boot is the primary option (setenv boot_targets nvme0 mmc1 mmc0 usb0 pxe dhcp), the lag is there even if no SSD in it.

I am using r32.6.1 version of u-boot and kernel with custom compile, but the original binaries produces the same phenomenon.

Is this lag is normal?
What does cause it?
How can I find and/or fix this problem, if it is not normal?

Boot log with lag times:

    .....
    [0005.237] Updated memory info to DTB
    [0005.245] set vdd_core voltage to 1075 mv
    [0005.249] setting 'vdd-core' regulator to 1075000 micro volts
    [0005.255] Found secure-pmc; disable BPMP

    U-Boot 2020.04-g46e4604c78 (Jul 26 2021 - 12:09:42 -0700)

    SoC: tegra210
    Model: NVIDIA Jetson Nano Developer Kit
    Board: NVIDIA P3450-0000
    DRAM:  4 GiB
    MMC:   sdhci@700b0000: 1, sdhci@700b0600: 0
    Loading Environment from MMC... OK
    In:    serial
    Out:   serial
    Err:   serial
    Net:   No ethernet found.
    Hit any key to stop autoboot:  0

    Device 0: Vendor: 0x15b7 Rev: 211210WD Prod: 21035M465812
                Type: Hard Disk
                Capacity: 238475.1 MB = 232.8 GB (488397168 x 512)
    ... is now current device
    Scanning nvme 0:1...
    Found /boot/extlinux/extlinux.conf
    Retrieving file: /boot/extlinux/extlinux.conf
    932 bytes read in 27 ms (33.2 KiB/s)
    1:      primary kernel
    Retrieving file: /boot/initrd
    7160133 bytes read in 55 ms (124.2 MiB/s)
    Retrieving file: /boot/Image
    34615304 bytes read in 172 ms (191.9 MiB/s)
    append: tegraid=21.1.2.0.0 ddr_die=4096M@2048M section=512M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 console=ttyS0,115200n8 debug_uartport=lsport,4 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff780000 core_edp_mv=1075 core_edp_ma=4000 gpt tegra_fbmem=0x140000@0x92cb4000 is_hdmi_initialised=1  earlycon=uart8250,mmio32,0x70006000  root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1 quiet root=/dev/nvme0n1p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1
    Retrieving file: /boot/tegra210-p3448-0002-vjc-rev4-b00.dtb
    227408 bytes read in 28 ms (7.7 MiB/s)
~28secs lag here
    ## Flattened Device Tree blob at 83000000
       Booting using the fdt blob at 0x83000000
    ERROR: reserving fdt memory region failed (addr=0 size=0)
    ERROR: reserving fdt memory region failed (addr=0 size=0)
       Using Device Tree in place at 0000000083000000, end 000000008303a84f
    copying carveout for /host1x@50000000/dc@54200000...
    copying carveout for /host1x@50000000/dc@54240000...
~9secs lag here
    DT property /chosen/nvidia,bluetooth-mac missing in source; can't copy
    DT property /chosen/nvidia,wifi-mac missing in source; can't copy
    DT property /psci/nvidia,system-lp0-disable missing in source; can't copy

    Starting kernel ...

    [    0.000000] Booting Linux on physical CPU 0x0
    [    0.000000] Linux version 4.9.253-tegra (laszpar@jetsondev) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #1 SMP PREEMPT Wed Sep 29 17:51:07 CEST 2021
    [    0.000000] Boot CPU: AArch64 Processor [411fd071]
    [    0.000000] OF: fdt:memory scan node memory@80000000, reg size 32,
    [    0.000000] OF: fdt: - 80000000 ,  7ee00000
    [    0.000000] OF: fdt: - 100000000 ,  7f200000
    [    0.000000] Found tegra_fbmem: 00140000@92cb4000
    [    0.000000] earlycon: uart8250 at MMIO32 0x0000000070006000 (options '')
    [    0.000000] bootconsole [uart8250] enabled
    .....

Hi,

Which pcie lane are you using on your custom board? Do you have other NVMe drive to test?

I am using the the PCIE0_RX0…4 and PCIE0_TX0…4 as the design guide describes on the Figure 6-7.
I didn’t tested with other SSD, but if the pci switched on in uboot, after that the lag is there without SSD or not, with eMMC boot too. The problem is produceable on the original Nano devboard too. In this case the module boots from eMMC, but the lag is there.
The following commands in u-boot console produces the phenomena:
setenv boot_targets nvme0 mmc1 mmc0 usb0 pxe dhcp
boot

Here is the log on NVIDIA P3449-B01 carrier board:

U-Boot 2020.04.r32.6.1-laszpar-tegra-dirty (Oct 28 2021 - 15:10:23 +0200)

    SoC: tegra210
    Model: NVIDIA Jetson Nano Developer Kit
    Board: NVIDIA P3450-0000
    DRAM:  4 GiB
    MMC:   sdhci@700b0000: 1, sdhci@700b0600: 0
    Loading Environment from MMC... OK
    In:    serial
    Out:   serial
    Err:   serial
    Net:   No ethernet found.
    Hit any key to stop autoboot:  0
    Tegra210 (P3450-0000) #
    Tegra210 (P3450-0000) # setenv boot_targets nvme0 mmc1 mmc0 usb0 pxe dhcp
    Tegra210 (P3450-0000) # boot

    Device 0: unknown device
    Card did not respond to voltage select!
    mmc_init: -95, time 14
    switch to partitions #0, OK
    mmc0(part 0) is current device
    Scanning mmc 0:1...
    Found /boot/extlinux/extlinux.conf
    Retrieving file: /boot/extlinux/extlinux.conf
    932 bytes read in 47 ms (18.6 KiB/s)
    1:      primary kernel
    Retrieving file: /boot/initrd
    7160133 bytes read in 219 ms (31.2 MiB/s)
    Retrieving file: /boot/Image
    34615304 bytes read in 903 ms (36.6 MiB/s)
    append: tegraid=21.1.2.0.0 ddr_die=4096M@2048M section=512M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 console=ttyS0,115200n8 debug_uartport=lsport,4 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff780000 core_edp_mv=1075 core_edp_ma=4000 gpt  earlycon=uart8250,mmio32,0x70006000  root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1 quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 sdhci_tegra.en_boot_part_access=1
    Retrieving file: /boot/dtb/tegra210-p3448-0002-vjc-rev4.dtb
    227352 bytes read in 53 ms (4.1 MiB/s)
It takes ~4secs till here.
Delay here: ~28secs
    ## Flattened Device Tree blob at 83000000
       Booting using the fdt blob at 0x83000000
    ERROR: reserving fdt memory region failed (addr=0 size=0)
    ERROR: reserving fdt memory region failed (addr=0 size=0)
       Using Device Tree in place at 0000000083000000, end 000000008303a817
    copying carveout for /host1x@50000000/dc@54200000...
    copying carveout for /host1x@50000000/dc@54240000...
Delay here: ~8secs
    DT property /chosen/nvidia,bluetooth-mac missing in source; can't copy
    DT property /chosen/nvidia,wifi-mac missing in source; can't copy
    DT property /psci/nvidia,system-lp0-disable missing in source; can't copy

    Starting kernel ...

Ok, so I can just see this issue with new boot target setting in uboot even without the existence of ssd connected, right?

Yes. No SSD socket on the P3449 board. Only M.2 Key E (WIFI) socket on that, but it is empty.

I switch on the highest level of log, and captured. It is about 1MB long.
The big hang is at the “Moving Image from…”

    iterate >initrd<
     <42008704, 172, 8>
    hit: start 2810880, count 1
     <42008704, 180, 36>
    hit: start 2810880, count 1
    iterate >tegra210-p3448-0002-vjc-rev4-b00.dtb<
    ext4fs read 160 group descriptor (blkno 3 blkoff 2048)
     <28, 0, 64>
    hit: start 81c, count 1
     <41943299, 0, 128>
    hit: start 2800903, count 1
     <42219176, 0, 227408>
    miss: start 2843ea8, count 444
    miss: start 2844064, count 1
    fill: start 2844064, count 1
    227408 bytes read in 388 ms (572.3 KiB/s)
    Initial value for argc=3
    Final value for argc=3
    Initial value for argc=3
    Final value for argc=3
    *  kernel: cmdline image address = 0x84000000
    ## Current stack ends at 0xfc5f7460 *  kernel: cmdline image address = 0x84000000
    Moving Image from 0x84000000 to 0x84080000
~28secs lag here
    *  ramdisk: cmdline image address = 0x83200000
       ramdisk start = 0x83200000, ramdisk end = 0x838d4145
    *  fdt: cmdline image address = 0x83000000
    ## Checking for 'FDT'/'FDT Image' at 83000000
    *  fdt: raw FDT blob
    ## Flattened Device Tree blob at 83000000

I added some change to booti.c:

<------>/* Handle BOOTM_STATE_LOADOS */
<------>if (relocated_addr != ld) {
<------><------>printf("Moving Image from 0x%lx to 0x%lx, size=%ld\n", ld, relocated_addr, image_size);
<------><------>memmove((void *)relocated_addr, (void *)ld, image_size);
<------><------>printf("Moving done.\n");
<------>}

and the log is the following:

    227408 bytes read in 28 ms (7.7 MiB/s)
    Moving Image from 0x84000000 to 0x84080000, size=35270656
~28secs lag here
    Moving done.

So, the memmove() function is slow somehow, when the pci-e is active in u-boot on Jetson Nano.

Is there any new info related to this?

We are checking this internally. This would take time.