Jetson Nano boot from NVMe (via M.2/PCIe) fails

Hey,

I am having trouble booting my Jetson Nano from my NVMe SSD. I attached it via an M.2 B to M.2 M adapter (as recommended in other threads here - https://www.aliexpress.com/item/32995476307.html?spm=a2g0s.9042311.0.0.2d0f4c4ddhwI4F).

When the Jetson Nano is running off my SD card, I can access the SSD without problems: I can partition it with fdisk, format it and write to it. I followed the Boot from SSD step-by-step guide from Jetsonhacks (https://www.jetsonhacks.com/2019/04/25/jetson-nano-run-on-usb-drive/) to copy everything over to the SSD and to modify the boot parameter in extlinux.conf. When I reboot the Jetson, however, it does not come up. It simply reports “No root-device: Mount failed”.

I even tried to build a new kernel with USB3 support (as per the step-by-step guide, acknowledging that I am using PCIe rather than USB). I also went through the kernel config and made sure that NVM Express block device support is enabled. Still, no difference.

My question: Is there something special I have to do or consider for it to work? Does anyone have a working configuration with the root partition on the M.2 SSD? Is it possible at all (and if not, why)?

Here is some more background info:

  • I have a single partition on the SSD (type 83)
  • ext4 file system (via mkfs.ext4)
  • extlinux.conf line modified to: APPEND ${cbootargs} rootfstype=ext4 root=/dev/nvme0n1p1 rw rootwait
  • tried with the JetPack 4.2.1 stock kernel as well as a rebuilt one (USB3 enabled)
  • there seem to be some PCI problems, but I do not know what they mean (and the SSD is working when booted off the SD card, even though the errors are shown)
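
The root= setting from the list above can be sanity-checked from a working SD card boot. The following is only a sketch: the function name is made up for illustration, and the default path assumes the stock /boot/extlinux/extlinux.conf location mentioned later in this thread.

```shell
#!/bin/sh
# Print the value of root= from the first APPEND line of an extlinux.conf.
# root_from_extlinux is a hypothetical helper name, not part of any tool.
root_from_extlinux() {
    conf=${1:-/boot/extlinux/extlinux.conf}
    awk '$1 == "APPEND" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^root=/) { print substr($i, 6); exit }
         }' "$conf"
}
```

Something like `[ -b "$(root_from_extlinux)" ] && echo present` then tells whether that block device exists on the running system. This only makes sense when root= is a /dev path, as it is here.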

Side-note for the curious:
I get roughly 270 MB/s from the disk via the one-lane PCIe bus (SSD: Silicon Power PCIe M.2 NVMe SSD M.2 256GB Gen3x4). My Samsung EVO 64GB SD card gives me 60 MB/s.

sudo hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
 Timing O_DIRECT cached reads:   550 MB in  2.00 seconds = 275.01 MB/sec
 Timing O_DIRECT disk reads:     810 MB in  3.01 seconds = 269.24 MB/sec

Here is a screen capture from the boot with the “No root-device” message [ 5.940835] as well as the PCI stuff:

https://imgur.com/a/ZbXsSfA

Hi,
I am doing the same thing as you, booting the Nano from NVMe. My personal feeling is that the NVMe driver is not loaded when the kernel starts, so the NVMe device cannot be recognized. In short, it is a mystery...

You may want to check that the required drivers are built into the kernel. Having them as loadable modules is not an option for the device you boot from.

What do these give:

gunzip -c /proc/config.gz | grep CONFIG_BLK_DEV_NVME
gunzip -c /proc/config.gz | grep CONFIG_PCI_TEGRA

If one of these shows ‘m’ instead of ‘y’, then you know why it fails, and you would have to rebuild and install the kernel with these drivers built in instead of as modules.
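
The same check can be wrapped in a small helper. This is a rough sketch, not an official tool: the function name and the three result words are made up, and it assumes the kernel exposes its configuration at /proc/config.gz as in the commands above.

```shell
#!/bin/sh
# config_state OPTION [CONFIG_FILE]
# Prints "builtin" for OPTION=y, "module" for OPTION=m, "unset" otherwise.
# CONFIG_FILE defaults to /proc/config.gz; a plain-text .config also works.
config_state() {
    opt=$1
    cfg=${2:-/proc/config.gz}
    case $cfg in
        *.gz) reader="gunzip -c" ;;
        *)    reader="cat" ;;
    esac
    $reader "$cfg" | awk -F= -v opt="$opt" '
        $1 == opt { found = $2 }
        END {
            if (found == "y")      print "builtin"
            else if (found == "m") print "module"
            else                   print "unset"
        }'
}
```

For booting from NVMe, both `config_state CONFIG_BLK_DEV_NVME` and `config_state CONFIG_PCI_TEGRA` would need to print builtin.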

I got it working, but I don’t think that I really did anything special - I did recompile the kernel with some more modules enabled, but for unrelated reasons.

The disk I use is the Adata SX6000 Lite 128GB (getting between 144MB/s and 200MB/s read).

I did more or less exactly the same as in the OP, but I used dd to clone the SD card, so that I have the other partitions as well in case I need them in future.
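
A minimal sketch of such a clone with a read-back check, assuming GNU dd and a Linux system. The function name is made up for illustration, and the device names you would pass in (for example /dev/mmcblk0 as source and /dev/nvme0n1 as destination, as used elsewhere in this thread) are assumptions about your setup.

```shell
#!/bin/sh
# clone_and_verify SRC DST
# Copies SRC onto DST with dd, then reads back the first size-of-SRC bytes
# of DST and compares them with SRC. Returns non-zero on any failure.
clone_and_verify() {
    src=$1
    dst=$2
    # conv=fsync flushes data to DST before dd exits;
    # notrunc only matters when DST is a regular file (e.g. for a dry run).
    dd if="$src" of="$dst" bs=4M conv=fsync,notrunc || return 1
    if [ -b "$src" ]; then
        bytes=$(blockdev --getsize64 "$src")
    else
        bytes=$(( $(wc -c < "$src") ))
    fi
    # DST may be larger than SRC, so compare only the copied prefix.
    head -c "$bytes" "$dst" | cmp -s - "$src"
}
```

This overwrites the destination completely and needs root for real devices, so the names are worth double-checking with lsblk first.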

I experienced the same errors on the first power up, but it worked after a reboot. It seems to keep on giving the error on first power up. So it might be a signal integrity issue.

The adapter I got is this one:
https://www.aliexpress.com/item/32963836444.html
It comes with 10cm, 20cm and 30cm extensions. I was worried about signal integrity, so I have not tried the longer ones. Once it succeeds in booting, it works and I don’t see any further errors, so I think the initialization sequence is more sensitive to errors.

sudo hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
 Timing O_DIRECT cached reads:   350 MB in  2.01 seconds = 174.39 MB/sec
 Timing O_DIRECT disk reads: 532 MB in  3.01 seconds = 176.79 MB/sec

From the disk speeds you are getting, it seems that yours is running at PCIe Gen2, while I’m only getting Gen1. Although, I do vaguely remember getting better speeds at some point while booting off the SD…

lspci -vv

01:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5762 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Realtek Semiconductor Co., Ltd. Device 5762
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 84
	Region 0: Memory at 13000000 (64-bit, non-prefetchable) 
	Region 5: Memory at 13004000 (32-bit, non-prefetchable) 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message/WAKE#
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Device Serial Number 00-00-00-01-00-4c-e0-00
	Capabilities: [158 v1] #19
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=60us PortTPowerOnTime=60us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=131072ns
		L1SubCtl2: T_PwrOn=70us
	Kernel driver in use: nvme
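
To relate the LnkSta line in the output above to the hdparm numbers, here is a rough sketch that turns the negotiated speed and width into a theoretical ceiling in MB/s. The function name is made up, and the arithmetic is only valid for 2.5GT/s and 5GT/s links, which use 8b/10b encoding.

```shell
#!/bin/sh
# Reads `lspci -vv` output on stdin and prints the theoretical data ceiling
# in MB/s for the first LnkSta line found.
# 8b/10b links only (2.5GT/s and 5GT/s):
#   GT/s * 1000 MT/s * 8/10 encoding / 8 bits per byte = GT/s * 100 MB/s per lane
lnksta_max_mbs() {
    sed -n 's/.*LnkSta:.*Speed \([0-9.]*\)GT\/s, Width x\([0-9]*\).*/\1 \2/p' |
        awk 'NR == 1 { printf "%.0f\n", $1 * 100 * $2 }'
}
```

Run as `sudo lspci -vv | lnksta_max_mbs`. A 2.5GT/s x1 link comes out at 250 MB/s, so the roughly 177 MB/s measured here fits under Gen1, while the roughly 270 MB/s in the first post is above that ceiling and points to a 5GT/s link.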

@diebaas1 Can you tell me what you reconfigured (e.g., enabled in the kernel)?

I tried it again today with a plain image, checked CONFIG_BLK_DEV_NVME and CONFIG_PCI_TEGRA as per @Honey_Patouceul’s answer (both are set to ‘y’ and not ‘m’) and cloned the SD card via dd to the NVMe. But again, it resulted in “No root-device: Mount failed”. :-(

At least I got the PCI error messages fixed. I appended pci=noaer to the APPEND line in /boot/extlinux/extlinux.conf (read about that somewhere here in the forum).

I also faced AER error messages with some disks. AFAIK, the boot option pci=noaer only suppresses the error messages, not the errors themselves.
Another thing to try might be this.
Not tried myself, but if you are using such hw, you may also check this topic.

I will gladly send you my .config file as soon as I have a chance - there are a lot of unrelated items selected from debugging other things, but I will leave it up to you to figure out what is relevant.

Aside from that, I really think it is either a power-up sequence issue or a PCIe generation issue. If you have different cables to test, try the others. I suspect you might have better success with a cable whose lower signal integrity forces the link down to Gen1.

On the power issue. It currently fails on first power-up every time, and I have to remove the jumper for a short time and re-insert it for a successful boot from the nvme. On the first boot it gives a number of pci errors and then “waiting for /dev/nvme0n1p1” or something like that. But I experienced the same issue when booting from the SD card - it will only detect the nvme on the second “warm” boot.
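
To tell "never detected" apart from "detected late" when booted from the SD card, a small polling loop can help. This is only a sketch: the function name is made up, and the path you pass (for example /dev/nvme0n1p1) is an assumption about your setup.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists. Prints the number of seconds
# waited and returns 0, or returns 1 if PATH is still missing after TIMEOUT.
wait_for_path() {
    path=$1
    timeout=${2:-10}
    waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}
```

For example, `wait_for_path /dev/nvme0n1p1 30 || dmesg | grep -i -e pci -e nvme` would show the relevant kernel messages when the device does not show up after a cold boot.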

I tried all of my other cables today. The executive summary: Still not working. :-(

Initially I was using the shortest cable (10cm). It gave me LnkSta: Speed 5GT/s, Width x1, and the described boot issue. After that I went to the other extreme and used the 30cm cable. This resulted in the NVIDIA logo showing up and then: nothing. Not even the emergency bash shell which usually comes up. The monitor simply went to “No Signal”. My third try was the medium cable (20cm). It gave me LnkSta: Speed 2.5GT/s, Width x1 (so a slower link than with the 10cm cable), but still no working boot. It resulted in the old error message about no root device being present.

I did another hdparm on the 20cm cable (just for the fun):

hdparm -tT --direct /dev/nvme0n1

/dev/nvme0n1:
 Timing O_DIRECT cached reads:   446 MB in  2.01 seconds = 222.17 MB/sec
 Timing O_DIRECT disk reads: 688 MB in  3.01 seconds = 228.90 MB/sec

Looking forward to your kernel config. Maybe something else is needed which is not enabled in the default one.

Sorry, I completely forgot about this - here is my config.

https://pastebin.com/raw/5R3NHQgz

Thanks for the config but it didn’t get me any further. Meanwhile I gave up on this and got myself a USB-C SSD. :-(
(Sandisk Extreme Portable; giving me read/write values in the 350MB/s range)

In case someone else has some success with it, please let me know and I will try it again.

As mentioned by @Honey_Patouceul, it should just work if both the NVMe and the PCIe host controller drivers are built into the kernel (not built as modules) and if the NVMe SSD is cloned from the same image as the SD card (preferably using the dd command).
Since @diebaas1 got it working, I don’t see any reason why it shouldn’t work for you. Please check once again to make sure that nothing is wrong.

Hi,
I’ve been struggling to make this work for a couple of hours, but using https://github.com/JetsonHacksNano/rootOnUSB as a reference I copied the filesystem from the SD card to the NVMe disk and then did:

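# The heredoc delimiter is quoted so that "$1" is written to the hook file literally instead of being expanded (to nothing) by sh -c.
$ sudo sh -c 'cat > /etc/initramfs-tools/hooks/usb-firmware <<"EOD"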
if [ "$1" = "prereqs" ]; then exit 0; fi

. /usr/share/initramfs-tools/hook-functions

copy_file firmware /lib/firmware/tegra21x_xusb_firmware
EOD
'
$ sudo chmod +x /etc/initramfs-tools/hooks/usb-firmware
$ sudo mkinitramfs -o /boot/initrd.img-$(uname -r)

and updated /boot/extlinux/extlinux.conf:

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd.img
      APPEND ${cbootargs} pci=nomsi root=/dev/nvme0n1p1 quiet

And now it works fine. Note that INITRD must point to either the symlink (initrd.img) or the new image file (initrd.img-4.9.140-tegra). Using the original initrd (without the extension) doesn’t work.
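
Before rebooting, it may be worth confirming that the rebuilt image really contains the firmware from the hook above. A rough sketch, assuming lsinitramfs from initramfs-tools is available; the function name is made up for illustration.

```shell
#!/bin/sh
# Reads a file listing (e.g. from lsinitramfs) on stdin and prints every
# NAME given as an argument that does not appear anywhere in that listing.
missing_from_listing() {
    listing=$(cat)
    for name in "$@"; do
        case $listing in
            *"$name"*) ;;              # found, nothing to report
            *) echo "$name" ;;         # not in the listing
        esac
    done
}
```

Used as `lsinitramfs /boot/initrd.img | missing_from_listing tegra21x_xusb_firmware`, it prints nothing when the firmware made it into the image.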

Small tweaks on nvme0n1p1 mounted rootfs:

$ sudo rm -rf /boot
$ sudo ln -s /mnt/mmc/boot /boot

Update /etc/fstab:

/dev/root            /                     ext4           discard,noatime,errors=remount-ro            0 1
/dev/mmcblk0p1       /mnt/mmc              ext4           discard,noatime,errors=remount-ro            0 2

My disk: XPG GAMMIX S11 Pro 1TB with the adaptor from aliexpress mentioned earlier.

Maybe that was the part I was missing. Thanks for the hint. I will try it later this week.

I followed your procedure and it worked, but now USB devices are not recognized. Any suggestions?

sounds great

What is the data rate of the drive and the response time?