Jetson Orin Nano fails to mount rootfs when booting from NVMe

I used the SDK Manager to flash JetPack 6 (release version) to an NVMe SSD installed in a Jetson Orin Nano Developer Kit board. The flashing process itself worked fine (after I disabled the firewall on the host to let the NFS traffic through).

The board firmware was already updated to support JetPack 6, and JetPack 6 boots fine from SD card. When booting from NVMe, however, the boot process starts normally but fails to mount the rootfs. After investigating, the rootfs itself appears intact and the PARTUUID is correct, but the Linux nvme driver runs into I/O timeouts. The kernel log suggests the IOMMU is blocking certain writes.
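
For the record, the relevant messages can be pulled from the kernel log with something like this (the grep filter is just my guess at what's relevant):

    # Collect the NVMe and SMMU/IOMMU related kernel messages
    dmesg | grep -iE 'nvme|smmu|iommu'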

With the unmodified init script, the script gives up after 50 rootfs mount attempts and the system reboots. By spawning a shell instead, I managed to capture some more logging; see the attached screenshot. In particular, note the timestamps: a few minutes pass between the nvme driver being loaded and the partition list being logged.

Secure Boot is disabled in the UEFI configuration. I tried both the “ExtLinux” and “Kernel Partition” boot methods; both fail in the same way.

Note that the kernel and initrd are in fact loaded from the /boot directory on the NVMe rootfs, so the boot loader can access the NVMe controller just fine; something only goes wrong once the Linux kernel takes over. Moreover, when booted from SD card, I can mount the rootfs on the NVMe SSD without problems, using either /dev/nvme0n1p1 or the PARTUUID.
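
For example, booted from the SD card, both of these work (looking the PARTUUID up on the fly rather than hard-coding it here):

    # Mount the NVMe rootfs by device node...
    sudo mount /dev/nvme0n1p1 /mnt
    sudo umount /mnt
    # ...and by PARTUUID
    sudo mount PARTUUID="$(sudo blkid -s PARTUUID -o value /dev/nvme0n1p1)" /mnt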

If there is any additional information that would be useful to diagnose this problem or anything you’d like me to test, please let me know.

Kernel version: 5.15.136-tegra (2024-04-24)

After waiting 5 minutes for the NVMe partitions to show up, I could manually mount the rootfs, perform the root pivot, and resume booting from the NVMe rootfs. This brought up the Ubuntu desktop.
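
Roughly the manual steps, from the initrd shell (a sketch; the exact hand-off depends on the init script):

    # After the NVMe partitions finally appeared:
    mount /dev/nvme0n1p1 /mnt          # mount the NVMe rootfs
    exec switch_root /mnt /sbin/init   # pivot into it and resume booting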

Once on the desktop, Ubuntu’s updater started installing updated packages, including a new kernel. However, this kernel was misconfigured: the rootfs was set to /dev/mmcblk0p1 instead of the PARTUUID of the NVMe rootfs. Moreover, the generated initrd doesn’t even include the nvme driver module in /lib/modules; only the Realtek Ethernet driver is included there.
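
The missing module is easy to confirm by listing the contents of the initrd Ubuntu generated (the filename below is a placeholder; substitute whatever version the updater actually installed):

    # List modules packed into the new initrd; only the Realtek ethernet
    # driver shows up, there is no nvme module
    lsinitramfs /boot/initrd.img-<version> | grep -iE 'nvme|realtek|r816'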

Note that this happened when using the “Kernel Partition” boot method, both on the boot that reached the Ubuntu desktop and on the boot after it. The incorrect rootfs string was written into the kernel command line of partition /dev/nvme0n1p2 (slot A).

Hi,

I don’t see how your first post and your second post are connected.
So what’s the real issue now?

Of course you don’t let Ubuntu update the kernel…
Ubuntu’s stock kernels don’t contain any of NVIDIA’s customizations.
Also, why /dev/nvme0n1p2 here?

Please ignore the part about updating Ubuntu for now; if that issue persists I’ll open a separate topic for it.

My main issue is the timeouts in the NVMe driver. At the moment I can only boot the board from an SD card, or by waiting 5 minutes and performing manual steps, while I would like to boot unattended from NVMe alone.

The steps I took:

  • install the version 36 firmware using an SD card (to prepare for JetPack 6)
  • flash JetPack 6 (no customizations) to the NVMe SSD using the SDK Manager; this installation completed without problems
  • attempt to boot the installed OS using the ExtLinux boot method (example configuration below)
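
For reference, the ExtLinux entry on the NVMe rootfs looks roughly like this (paraphrased; the PARTUUID is a placeholder):

    # /boot/extlinux/extlinux.conf (sketch)
    TIMEOUT 30
    DEFAULT primary

    MENU TITLE L4T boot options

    LABEL primary
          MENU LABEL primary kernel
          LINUX /boot/Image
          INITRD /boot/initrd
          APPEND ${cbootargs} root=PARTUUID=<rootfs-partuuid> rw rootwait rootfstype=ext4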

The init script in the initrd loads the nvme Linux kernel module, which encounters six timeouts. Because of those timeouts, it takes over 5 minutes before the NVMe partition table is known, whereas this normally takes less than a second.

If left unattended, the init script tries to mount the rootfs several times a second, 50 times in total, and then the system reboots. The reboot therefore happens long before the NVMe partition table becomes available.
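
As a sketch of the behavior (this is not NVIDIA’s actual script; `rootdev` stands for the configured root device):

    # Observed retry behavior of the initrd init script, paraphrased
    retries=0
    until mount "$rootdev" /mnt; do          # fails while /dev/nvme0n1p1 is absent
        retries=$((retries + 1))
        [ "$retries" -ge 50 ] && reboot -f   # gives up long before the disk is ready
    done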

The init script has an option to drop into a shell. After doing this and waiting 5 minutes for the NVMe partition table to appear in the kernel log, booting can be resumed and the system comes up fine from NVMe. So the timeout problem seems to occur only during initialization, not afterward.
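
From that shell, the waiting itself can be automated with a simple poll (sketch):

    # Block until the kernel finally registers the NVMe partition (~5 minutes)
    while [ ! -b /dev/nvme0n1p1 ]; do sleep 5; done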

The kernel log messages (see screenshot) suggest that some I/O is being blocked by the IOMMU, which could be the cause of the timeouts.
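
If it helps with diagnosis, I can experiment with kernel command-line options that are often suggested for NVMe timeout issues; to be clear, these are guesses to narrow things down, not a known fix:

    # Candidate additions to the APPEND line in extlinux.conf:
    nvme_core.default_ps_max_latency_us=0   # disable NVMe APST power saving
    pcie_aspm=off                           # disable PCIe link power management
    iommu.passthrough=1                     # let DMA bypass IOMMU translation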

This is the L4T kernel: it has the “-tegra” suffix in the version number (5.15.136-tegra). The nvme driver is loaded as a kernel module from the same initrd that contains the init script, which has an NVIDIA copyright header at the top. So on first boot, before running any Ubuntu updates, everything is using NVIDIA’s drivers and the problem already occurs.
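
That initrd can be inspected like this (assuming it is a plain gzipped cpio archive, which it appears to be):

    # Unpack the L4T initrd from the NVMe /boot and look inside
    mkdir /tmp/ird && cd /tmp/ird
    zcat /boot/initrd | cpio -idm
    head -n 5 init              # NVIDIA copyright header at the top
    find . -name 'nvme*.ko*'    # the nvme module shipped in this initrd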

While debugging, I also tried loading the kernel from the Android-style boot image in /dev/nvme0n1p2 (slot 0 for nvbootctrl) instead of using the ExtLinux boot method, but the end result was the same. The rootfs is on /dev/nvme0n1p1, where the SDK Manager put it.
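
The slot state can be double-checked with nvbootctrl when booted from the SD card:

    # Show the A/B slot status; slot 0 corresponds to /dev/nvme0n1p2 here
    sudo nvbootctrl dump-slots-info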

Just one question here.
Does this issue only happen on this specific NVMe disk?
Or does it happen with every disk you have?

I don’t know whether it’s specific to this NVMe disk: I don’t have any spare disks lying around at the moment that I could reflash.

The disk I tested with is a PNY CS1030 (PCIe Gen3, 250GB).

If this cannot be reproduced on other disks, then I think it’s some sort of compatibility issue.