OrinNX Error: Could not stat device /dev/nvme0n1 SDK 36.4

I’ve been using a custom board with an Orin NX 16GB and JetPack 6 DP (SDK 36.2) for a few months now.
When I try to flash the latest JetPack 6.1 (SDK 36.4), the flashing process fails because it cannot detect the NVMe disk.

This is my flashing command and the log:
flash.log (278.3 KB)

I’ve SSHed into the initramfs and collected the dmesg output:
dmesg.log (34.2 KB)

I’ve disabled the eeprom as documented here: Custom carrier no eeprom

Serial logs while flashing:
serial.log (34.3 KB)

PS: Why is the SDK 36.4 kernel not printing any information while booting during the flashing process?
The same flashing command works in SDK 36.2, and the kernel prints all the boot information.

Hi EstebanBosse,

Since you are using a custom carrier board, are you flashing it with a custom BSP package for JP6.1 (R36.4.0) released by your vendor?

~/vanilla-sdk-36-4/Linux_for_Tegra$ sudo ADDITIONAL_DTB_OVERLAY_OPT="BootOrderNvme.dtbo" ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml --no-systemimg" --network usb0 jetson-orin-nano-devkit nvme0n1p1

It seems you used the flash command above.
May I know why you added --no-systemimg?
Please also try using -c tools/kernel_flash/flash_l4t_t234_nvme.xml instead of -c tools/kernel_flash/flash_l4t_external.xml as the partition layout file in your command.

Hi KevinFFF,
Thanks for the very quick answer.
I’ve also tried with:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit nvme0n1p1

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit internal

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit-nvme internal

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit-nvme external

But the issue seems to be that the NVMe disk is not detected when I SSH into the initramfs used for flashing.

We develop our own board in house, so we do not have a vendor. Since our board works very well with SDK 36.2 and the same flashing procedure, I tend to think the issue is not the custom board but rather something that has changed between SDK 36.2 and SDK 36.4.

Do you have any insight into which kernel modules or parts of the DTB we should look at to find out why the disk is not being recognized?
Kernel modules like nvme and nvme_core are loaded.
Could you please check the dmesg log I attached, especially the PCI part, and let me know if you see anything different or unexpected?
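For readers following along, here is a minimal sketch of the kind of check that can be run from the initramfs SSH session to see whether the kernel has registered any NVMe controller, and to pull the PCIe/NVMe lines out of a saved dmesg log. The function names `nvme_visible` and `pcie_lines` are made up for this example, and the optional root-prefix argument exists only so the logic can be tried against a fake directory tree. It assumes a POSIX shell and the usual sysfs layout.

```shell
#!/bin/sh
# Hypothetical helper: report whether any NVMe controller or block device
# node is visible. $1 is an optional root prefix (empty means the real "/").
nvme_visible() {
    root="${1:-}"
    found=1
    for path in "$root"/sys/class/nvme/nvme* "$root"/dev/nvme*; do
        # An unmatched glob stays literal, so -e filters it out.
        if [ -e "$path" ]; then
            echo "found: $path"
            found=0
        fi
    done
    return "$found"
}

# Hypothetical helper: keep only the PCIe/NVMe-related lines of a saved
# dmesg log (for example the output of "dmesg > dmesg.log").
pcie_lines() {
    grep -i -E 'pcie|nvme' "$1"
}

# Possible usage on the target:
#   nvme_visible || echo "no NVMe controller registered"
#   dmesg > /tmp/dmesg.log && pcie_lines /tmp/dmesg.log
```

If `nvme_visible` finds nothing while the nvme modules are loaded, the PCIe controller or link is the next place to look, which is what the filtered dmesg lines help with.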

I’ve even tried with different brands of nvme disks.

Please try using the following command to flash the board instead.

$ sudo ADDITIONAL_DTB_OVERLAY_OPT="BootOrderNvme.dtbo" ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_t234_nvme.xml -p "-c bootloader/generic/cfg/flash_t234_qspi.xml -r" --use-backup-image --showlogs --network usb0 jetson-orin-nano-devkit internal

If flashing still fails, please share the full flash log for further checking.

R36.2 is a developer preview release and is not intended for production.
As far as I remember, device tree loading is different there.
Please check your /boot/extlinux/extlinux.conf to confirm which DTB is in use (FDT entry).

You can do a quick diff of the device trees.
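A rough sketch of both checks, assuming `dtc` (the device tree compiler) is installed on the machine doing the comparison. The function names `fdt_entries` and `dtb_diff` are hypothetical, and the file paths in the usage comments are placeholders, not paths taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the path of every FDT entry in an
# extlinux.conf-style file. Prints nothing if no FDT line exists.
fdt_entries() {
    awk '$1 == "FDT" { print $2 }' "$1"
}

# Hypothetical helper: decompile two DTB files to sorted DTS text and show
# a unified diff. Requires dtc; -s sorts nodes and properties so that
# ordering differences do not clutter the diff, -q silences warnings.
dtb_diff() {
    a_dts=$(mktemp) || return 1
    b_dts=$(mktemp) || return 1
    dtc -q -I dtb -O dts -s -o "$a_dts" "$1" &&
    dtc -q -I dtb -O dts -s -o "$b_dts" "$2" &&
    diff -u "$a_dts" "$b_dts"
    status=$?
    rm -f "$a_dts" "$b_dts"
    return "$status"
}

# Possible usage (placeholder paths):
#   fdt_entries /boot/extlinux/extlinux.conf
#   dtb_diff devkit.dtb custom_board.dtb
# The live tree on a running target can be dumped with:
#   dtc -I fs -O dts /proc/device-tree > running.dts
```

`dtb_diff` returns the exit status of `diff`, so 0 means the two trees are identical after sorting and 1 means they differ.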

Here are the flashing logs from running your command:
flash_new.log (101.0 KB)

Error: Could not stat device /dev/nvme0n1 - No such file or directory.
Flash failure
Either the device cannot mount the NFS server on the host or a flash command has failed. Check your network setting (VPN, firewall,...) to make sure the device can mount NFS server. Debug log saved to /tmp/tmp.sCTWaeGaBc. You can access the target's terminal through "sshpass -p root ssh root@fc00:1:1:0::2" 
Cleaning up...

It seems you hit the flash failure shown above.

Could you refer to the method I shared in Not able to flash Jetson Orin Nano SDK manager - #15 by KevinFFF to check whether it helps?

Hey KevinFFF.

The problem is not with the NFS server. I’ve tried SSHing into the initramfs, and the NVMe disk is not being detected by the kernel.

What puzzles me is why the SDK 36.4 kernel is not printing any debug logs. I suspect that PCI is not detecting the NVMe because the right DTB is not being used when booting for flashing.
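One thing that can be checked from the SSH session is which logging-related arguments the flashing kernel was actually booted with. This is only a sketch: the function name `log_args` is invented for this example, and whether the command line is the cause of the silent console here is an assumption, not something the logs in this thread confirm.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel command-line tokens that influence
# console logging. $1 is the whole command line as a single string.
log_args() {
    printf '%s\n' "$1" |
        tr -s ' \t' '\n' |
        grep -E '^(console=|earlycon|loglevel=|quiet$|debug$|ignore_loglevel$)'
}

# Possible usage on the target:
#   log_args "$(cat /proc/cmdline)"
```

A missing `console=` entry for the serial port, or a `quiet` token, would be consistent with a kernel that boots without printing anything on the UART. Comparing the output between the 36.2 and 36.4 flashing boots would show whether the arguments changed.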

I’ve managed to dump the dtb in use in the initramfs when the flashing fails.
I attached it here converted to dts.
initramfs_dtb_extracted.txt (332.0 KB)

Yes, it seems your NVMe is not detected during flash.

Could you also try with another NVMe SSD to check if it can be flashed?

We tried different NVMe brands, but still no luck. SDK 36.4 is not able to recognize the NVMe, but SDK 36.2 is.

In another attempt to understand why the NVMe is not recognized, I dumped the DTB from the initramfs boot before flashing, both on the devkit and on our custom carrier board, and I found one interesting difference:

Our carrier board is equivalent to the devkit but without the EEPROM.
We are currently trying to understand the difference between the two DTBs.

Can you help us understand why the nvidia,sku section is different between the two?

What does M.1 or H.2 [board revision] mean for the DTB?
Does the board revision change something that might affect the recognition of nvme disks?

Thank you!

Are you using two modules in your test? That value comes with the module and is not related to your board.

Also, I don’t think it is related to NVMe at all.

You should share the device-side serial console log. I haven’t seen it so far.

Hi WaineWWW,
I attached the serial output in the first message of this thread, but here it is again:
serial.log (34.3 KB)

Are you sure this is the full log? The system is not even trying to enable the PCIe yet.

Yes, I’m sure this is a full log.
I also found it quite interesting, because the kernel is not printing debugging information at all.

Do you have any idea what could be causing this?

The differences I see between the boot process with JetPack 36.2 and 36.4 are that there is no kernel debugging information and the NVMe is not recognized.

Does your kernel log get stuck at the same location every time you flash?

If you want to see the kernel information, you can also have a look at the dmesg output I added to the original message.
The dmesg information was collected over SSH from the Orin after the flashing process failed.

Yes, it gets stuck in the same spot always.

Just to clarify: what is the exact situation on your serial console?

Did you see the UART log start to print and then suddenly stop? And after it had stopped for a while, did it give you the bash-5.1 initrd console to operate the board again?

When flashing the custom board, the serial console never reaches the bash-5.1 initrd console. That happens only when flashing the devkit.

I was working under the assumption that the lack of an EEPROM on our custom board makes the flashing process use the wrong DTB, and that the wrong DTB does not configure the UART serial port or the PCIe used by the NVMe.

But this is just an educated guess.

A missing carrier board EEPROM does not affect UART or PCIe.

You can just run the test on rel-36.2 or rel-36.3. If the serial console log does not get stuck in those versions, then it matches what I said.

Please be aware that many users post about custom boards in this forum every day. No one has ever hit this stuck point just because they don’t have an EEPROM on their board. Actually, none of the custom boards has an EEPROM on it.