Jetson Orin Nano fails to mount rootfs when booting from NVMe

I used the SDK Manager to flash JetPack 6 (release version) to an NVMe SSD installed in a Jetson Orin Nano Developer Kit board. The flashing process itself worked fine (after I disabled the firewall on the host to let the NFS traffic through).

The board firmware was already updated to support JetPack 6, and JetPack 6 boots fine from SD card. When booting from NVMe, however, the boot process starts normally but fails to mount the rootfs. After investigating, the rootfs itself appears intact and the PARTUUID is correct, but the Linux nvme driver runs into I/O timeouts. The kernel log suggests the IOMMU is blocking certain writes.
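
For the record, the relevant messages can be pulled from the kernel log with something like this (the grep filter is just my guess at what's relevant):

    # Collect the NVMe and SMMU/IOMMU related kernel messages
    dmesg | grep -iE 'nvme|smmu|iommu'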

With the unmodified init script, the script gives up after 50 rootfs mount attempts and the system reboots. By spawning a shell instead, I managed to capture some more logging; see the attached screenshot. In particular, note the timestamps: a few minutes pass between the nvme driver being loaded and the partition list being logged.

Secure Boot is disabled in the UEFI configuration. I tried both the “ExtLinux” and “Kernel Partition” boot methods; both fail in the same way.

Note that the kernel and initrd are in fact loaded from the /boot directory on the NVMe rootfs, so the boot loader can access the NVMe controller just fine; something only goes wrong once the Linux kernel takes over. Moreover, when booted from SD card, I can mount the rootfs on the NVMe SSD without problems, using either /dev/nvme0n1p1 or the PARTUUID.
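
For example, booted from the SD card, both of these work (looking the PARTUUID up on the fly rather than hard-coding it here):

    # Mount the NVMe rootfs by device node...
    sudo mount /dev/nvme0n1p1 /mnt
    sudo umount /mnt
    # ...and by PARTUUID
    sudo mount PARTUUID="$(sudo blkid -s PARTUUID -o value /dev/nvme0n1p1)" /mnt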

If there is any additional information that would be useful to diagnose this problem or anything you’d like me to test, please let me know.

Kernel version: 5.15.136-tegra (2024-04-24)

After waiting 5 minutes for the NVMe partitions to show up, I could manually mount the rootfs, perform the root pivot, and resume booting from the NVMe rootfs. This brought up the Ubuntu desktop.
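
Roughly the manual steps, from the initrd shell (a sketch; the exact hand-off depends on the init script):

    # After the NVMe partitions finally appeared:
    mount /dev/nvme0n1p1 /mnt          # mount the NVMe rootfs
    exec switch_root /mnt /sbin/init   # pivot into it and resume booting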

Once on the desktop, Ubuntu’s updater started installing updated packages, including a new kernel. However, this kernel was misconfigured: the rootfs was set to /dev/mmcblk0p1 instead of the PARTUUID of the NVMe rootfs. Moreover, the generated initrd doesn’t even include the nvme driver module in /lib/modules; only the Realtek Ethernet driver is included there.
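
The missing module is easy to confirm by listing the contents of the initrd Ubuntu generated (the filename below is a placeholder; substitute whatever version the updater actually installed):

    # List modules packed into the new initrd; only the Realtek ethernet
    # driver shows up, there is no nvme module
    lsinitramfs /boot/initrd.img-<version> | grep -iE 'nvme|realtek|r816'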

Note that this happened when using the “Kernel Partition” boot method, both on the boot that reached the Ubuntu desktop and on the boot after it. The incorrect rootfs string was written into the kernel command line of partition /dev/nvme0n1p2 (slot A).

Hi,

I don’t see how your first post and your second post are connected.
So what’s the real issue now?

Of course you don’t let Ubuntu update the kernel…
Ubuntu’s stock kernels don’t contain any of NVIDIA’s customizations.
Also, why /dev/nvme0n1p2 here?

Please ignore the part about updating Ubuntu for now; if that issue persists I’ll open a separate topic for it.

My main issue is the timeouts in the NVMe driver. At the moment I can only boot the board from an SD card, or by waiting 5 minutes and performing manual steps, while I would like to boot unattended from NVMe alone.

The steps I took:

  • install the version 36 firmware using an SD card (to prepare for JetPack 6)
  • flash JetPack 6 (no customizations) to the NVMe SSD using the SDK Manager; this installation completed without problems
  • attempt to boot the installed OS using the ExtLinux boot method (example configuration below)
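
For reference, the ExtLinux entry on the NVMe rootfs looks roughly like this (paraphrased; the PARTUUID is a placeholder):

    # /boot/extlinux/extlinux.conf (sketch)
    TIMEOUT 30
    DEFAULT primary

    MENU TITLE L4T boot options

    LABEL primary
          MENU LABEL primary kernel
          LINUX /boot/Image
          INITRD /boot/initrd
          APPEND ${cbootargs} root=PARTUUID=<rootfs-partuuid> rw rootwait rootfstype=ext4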

The init script in the initrd loads the nvme Linux kernel module, which encounters six timeouts. Because of those timeouts, it takes over 5 minutes before the NVMe partition table is known, whereas this normally takes less than a second.

If left unattended, the init script tries to mount the rootfs several times a second, 50 times in total, and then the system reboots. The reboot therefore happens long before the NVMe partition table becomes available.
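
As a sketch of the behavior (this is not NVIDIA’s actual script; `rootdev` stands for the configured root device):

    # Observed retry behavior of the initrd init script, paraphrased
    retries=0
    until mount "$rootdev" /mnt; do          # fails while /dev/nvme0n1p1 is absent
        retries=$((retries + 1))
        [ "$retries" -ge 50 ] && reboot -f   # gives up long before the disk is ready
    done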

The init script has an option to drop into a shell. After doing this and waiting 5 minutes for the NVMe partition table to appear in the kernel log, booting can be resumed and the system comes up fine from NVMe. So the timeout problem seems to occur only during initialization, not afterward.
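
From that shell, the waiting itself can be automated with a simple poll (sketch):

    # Block until the kernel finally registers the NVMe partition (~5 minutes)
    while [ ! -b /dev/nvme0n1p1 ]; do sleep 5; done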

The kernel log messages (see screenshot) suggest that some I/O is being blocked by the IOMMU, which could be the cause of the timeouts.
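
If it helps with diagnosis, I can experiment with kernel command-line options that are often suggested for NVMe timeout issues; to be clear, these are guesses to narrow things down, not a known fix:

    # Candidate additions to the APPEND line in extlinux.conf:
    nvme_core.default_ps_max_latency_us=0   # disable NVMe APST power saving
    pcie_aspm=off                           # disable PCIe link power management
    iommu.passthrough=1                     # let DMA bypass IOMMU translation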

This is the L4T kernel: it has the “-tegra” suffix in the version number (5.15.136-tegra). The nvme driver is loaded as a kernel module from the same initrd that contains the init script, which has an NVIDIA copyright header at the top. So on first boot, before running any Ubuntu updates, everything is using NVIDIA’s drivers and the problem already occurs.
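
That initrd can be inspected like this (assuming it is a plain gzipped cpio archive, which it appears to be):

    # Unpack the L4T initrd from the NVMe /boot and look inside
    mkdir /tmp/ird && cd /tmp/ird
    zcat /boot/initrd | cpio -idm
    head -n 5 init              # NVIDIA copyright header at the top
    find . -name 'nvme*.ko*'    # the nvme module shipped in this initrd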

While debugging, I also tried loading the kernel from the Android-style boot image in /dev/nvme0n1p2 (slot 0 for nvbootctrl) instead of using the ExtLinux boot method, but the end result was the same. The rootfs is on /dev/nvme0n1p1, where the SDK Manager put it.
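
The slot state can be double-checked with nvbootctrl when booted from the SD card:

    # Show the A/B slot status; slot 0 corresponds to /dev/nvme0n1p2 here
    sudo nvbootctrl dump-slots-info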

Just one question here.
Does this issue only happen on this specific NVMe disk?
Or does it happen with every disk you have?

I don’t know whether it’s specific to this NVMe disk: I don’t have any spare disks lying around at the moment that I could reflash.

The disk I tested with is a PNY CS1030 (PCIe Gen3, 250GB).

If this cannot be reproduced on other disks, then I think it’s some sort of compatibility issue.