Issue regarding flashing Orin NX board

Hello.

I am using the NVIDIA Jetson Linux 35.2.1 release,
and I flash with this command:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device sda1 \
  -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 p3509-a02+p3767-0000 internal

When I flash this NVMe memory:


It is OK.

But when I try to flash this memory:


It fails.

Flash script log:

Active index file is /mnt/external/flash.idx
Number of lines is 17
max_index=16
writing item=1, 9:0:primary_gpt, 512, 19968, gpt_primary_9_0.bin, 16896, fixed--0, 9013c87e9a58533ea2a0d6fbda66797b9582e0e0
Error: Could not stat device /dev/nvme0n1 - No such file or directory.
Flash failure
Cleaning up…

Board log:

[ 18.860357] LUN: removable file: (no medium)
[ 18.861551] LUN: removable file: (no medium)
[ 18.862684] LUN: removable file: (no medium)
[ 18.863843] LUN: removable file: (no medium)
Connection timeout: device /dev/nvme0n1 is still not ready.
[ 18.866426] rndis0: HOST MAC 92:c2:d4:ec:29:fa
[ 18.866536] rndis0: MAC 22:a2:e9:75:91:ed

Why does it fail, and how can I fix it?

I don’t know; maybe it’s just some compatibility issue with the second NVMe drive you are using.
Also, are you sure you are using sda1? It should have been nvme0n1p1.

Try flashing and booting from a USB drive, and keep the problematic NVMe drive plugged in;
if it is not detected under a full Linux system, then there’s little we can do.

Please also try 35.4.1 or 36.2 DP.

Hello.

  1. I’ve just connected the NVMe drive to a system booted from eMMC, and the drive is recognized correctly. Here is some information about it:
0005:01:00.0 Non-Volatile memory controller: Sandisk Corp Device 5007 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Sandisk Corp Device 5007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 35
        Region 0: Memory at 1f40000000 (64-bit, non-prefetchable) [size=16K]
        Region 4: Memory at 1f40004000 (64-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [b0] MSI-X: Enable+ Count=17 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=4 offset=00000000
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <256ns, L1 <8us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [1b8 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [300 v1] #19
        Capabilities: [900 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
                          PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=40us
        Kernel driver in use: nvme


demo@tegra-ubuntu:~$ sudo smartctl --info /dev/nvme0n1
smartctl 6.6 2016-05-31 r4324 [aarch64-linux-4.9.253-tegra] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD IX SN530 SDBPTPZ-256G-XI
Serial Number:                      23440F800424
Firmware Version:                   2412T000
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 256,060,514,304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Dec 18 08:13:51 2023 UTC

  2. I’ve tried flashing the 35.4.1 and 36.2 releases; both fail with the same error.

What is the way to fix it?
Could it be necessary to add an extra delay so that the drive is recognized reliably?
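
One way to test the extra-delay idea on a system that already boots is to poll for the device node with a timeout. This is only a minimal sketch: the helper name wait_for_path and the 30-second default are assumptions made for illustration, not part of the NVIDIA flash tools.

```shell
# Hypothetical helper (not part of l4t_initrd_flash.sh):
# poll once per second until PATH exists or TIMEOUT seconds have passed.
# Usage: wait_for_path PATH [TIMEOUT_SECONDS]
wait_for_path() {
    path="$1"
    timeout="${2:-30}"   # assumed default, adjust as needed
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timeout: $path did not appear within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$path is present after ${waited}s"
}
```

For example, `wait_for_path /dev/nvme0n1 60` returns non-zero if the node never shows up. If the node does not appear however long the wait is, a delay is probably not the cause.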

Sorry, I actually flashed the Orin NX with this command:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1   -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml"   --showlogs --network usb0 p3509-a02+p3767-0000 internal

The command in my first post contained a mistake.

Maybe you should try this:
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/FlashingSupport.html
Refer to the section "To set up a flash drive manually for booting".

Yes, I can flash the image this way.

But we need to flash the image via the carrier board.
I’ve noticed that the NVMe drive that flashes successfully reports NVMe version 1.3,
and the drive that fails reports version 1.4.

So, how can I flash an NVMe 1.4 drive via the carrier board?
Does the flash script need to be modified?
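
For reference, the NVMe version of a drive can be read from the controller's version field, which packs the major number in bits 31:16 and the minor number in bits 15:8. A minimal sketch of decoding it follows; the helper name decode_nvme_ver is hypothetical, and it assumes the value is given as a hexadecimal number such as 0x10400.

```shell
# Hypothetical helper: decode a 32-bit NVMe version value into MAJOR.MINOR.
# Bit layout: 31:16 = major, 15:8 = minor, 7:0 = tertiary (ignored here).
decode_nvme_ver() {
    ver=$(( $1 ))
    echo "$(( (ver >> 16) & 0xffff )).$(( (ver >> 8) & 0xff ))"
}

decode_nvme_ver 0x10300   # prints 1.3
decode_nvme_ver 0x10400   # prints 1.4
```

Assuming nvme-cli is installed and prints the field in this form, the raw value can be taken from the `ver` line of `sudo nvme id-ctrl /dev/nvme0`.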

Have you tried flashing NVMe storage version 1.4?

Hi, do you mean you can flash version 1.4 via carrier board? What is the “carrier board”? Are you testing on devkit or custom board?

Hello.

No.
I have a p3766 carrier board.
I can flash an image to NVMe version 1.3 storage via this carrier board.
I can’t flash an image to NVMe version 1.4 storage via this carrier board.

About your question: I can flash an image to NVMe version 1.4 storage only manually, following the section "To set up a flash drive manually for booting" that DaveYYY mentioned.

But we want to create our own carrier board, and the NVMe version 1.4 storage will be built in, so we will not be able to flash the storage manually.

So my questions are: have you tried flashing an image to NVMe version 1.4 storage via the carrier board? Is it possible?

Any updates on this?

We don’t have any NVMe drives with version 1.4 for testing, so there’s little we can do to help.
However, you should first confirm that this is really about NVMe 1.3 vs. 1.4, rather than a compatibility issue with the particular NVMe drive you are using.

We checked this issue again
and found a version 1.4 drive that works.

The drives that work:
KIOXIA KBG4AZNV512G
Micron CT20000P3SSD8

Our issue is related to Western Digital drives.
We tested these drives:
WD SDBPTPZ-256G-XI
WD SDBPNPZ-2T00-XI

Both work in Linux when we boot from eMMC,
but we cannot flash them.

Have you tried flashing an image to a Western Digital NVMe drive?
What could be the reason for our failure?

Maybe it’s just some compatibility issue.
We cannot test all kinds of NVMe drives available on the market so we cannot guarantee that every single one would work.

Could you advise how to debug it,
and how to fix it?

I don’t think you can fix it. Maybe it’s related to the firmware in the NVMe drive.
Please just keep using the ones that work.

Why does this drive work when we connect it to a Jetson booted from eMMC, but fail when we try to write the image with this command?

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1   -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml"   --showlogs --network usb0 p3509-a02+p3767-0000 internal

You are comparing the detection result on kernel 4.9/rel-32 with kernel 5.10/rel-35, which is meaningless.
Please at least find a device running 35.4.1 and check whether the NVMe drive is detected in that scenario.

Here is my sequence of actions with the NVMe drives.

In all cases I used NVIDIA Jetson Linux 35.4.1.

First, I flashed this release to a Xavier NX module attached to a reference carrier board (P3509-0000). The OS booted from eMMC. I then inserted two types of NVMe drive, one after the other:

  1. Kioxia.
  2. Western Digital.

Both drives worked.
Here are the two logs from the lspci command:
kioxia_pci.txt (4.9 KB)
WD_pci.txt (4.5 KB)

My conclusion: the kernel and its drivers work correctly with both types of NVMe drive.

Next step: I tried to flash an Orin NX module attached to the P3767 carrier board.
Command from the developer guide:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 jetson-orin-nano-devkit internal

The first drive is Kioxia.
Flashing succeeded.
Here are two logs, one from the host side and one from the target side:
kioxia_host.log (310.6 KB)
kioxia_target.log (186.0 KB)

The second drive is WD.
Flashing failed.
Here are two logs, one from the host side and one from the target side:
WD_host.log (277.3 KB)
WD_target.log (100.4 KB)

As I understand from the kernel log, the problem is that the PCIe link does not come up:

Kioxia:
3502 [ 8.062644] tegra194-pcie 14160000.pcie: Link up
3503 [ 8.068434] tegra194-pcie 14160000.pcie: PCI host bridge to bus 0004:00

WD:
1776 [ 6.702160] tegra194-pcie 14160000.pcie: Phy link never came up
1777 [ 6.702390] tegra194-pcie 14160000.pcie: PCI host bridge to bus 0004:00

How is this possible, considering that both drives worked in the Xavier NX with the same kernel version? Could it be a hardware incompatibility between the NVIDIA carrier board and the Western Digital drive? How can I debug this issue?
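
To make the comparison above repeatable, the PCIe link result can be pulled out of each target log with a small filter. This is a minimal sketch: the function names are hypothetical, and the patterns are taken only from the two messages quoted above, so other kernel versions may word them differently.

```shell
# Hypothetical helpers for comparing target-side kernel logs.

# Print every tegra194-pcie line that reports a link result.
pcie_link_lines() {
    grep -E 'tegra194-pcie .*(Link up|Phy link never came up)' "$1"
}

# Return 0 if the log contains at least one link-up failure.
pcie_link_failed() {
    grep -q 'Phy link never came up' "$1"
}
```

For example, `pcie_link_lines kioxia_target.log` and `pcie_link_lines WD_target.log` show the two results side by side, and `pcie_link_failed WD_target.log && echo "link down"` flags the failing log.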

Here is the documentation with typical debugging tips for a PCIe link that does not come up.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=universal#debug-pcie-link-up-failure
