Assert on NVME PCIE Boot

I have a custom Orin NX/Nano carrier board. On boot I get the following ASSERT before the system resets.

Jetson UEFI firmware (version 3.1-32827747 built on 2023-03-19T14:56:32+00:00)
ESC to enter Setup.
F11 to enter Boot Manager Menu.
Enter to continue boot.
** WARNING: Test Key is used. **

L4TLauncher: Attempting Direct Boot
ASSERT [NvmExpressDxe] /out/nvidia/bootloader/uefi/Jetson_RELEASE/edk2/MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressHci.c(772): (Private->Cap.Mpsmin + 12) <= 12

Resetting the system in 5 seconds.

It looks like it has something to do with the sector size. I successfully flashed and booted this SSD on the Orin Nano devkit with the Orin NX.
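
For context, the asserted expression is a check on the MPSMIN field of the NVMe controller's CAP register, not on the sector size. Below is a minimal sketch of that check, assuming the NVMe specification layout (MPSMIN in bits 51:48 of the 64-bit CAP register, minimum memory page size of 2^(12 + MPSMIN) bytes). The function names and the sample CAP value are hypothetical, and this is not the EDK2 source. A register read that returns all ones, which is a common symptom of a failed PCIe read, decodes to MPSMIN = 15 and would fail the check, so the assert may point at an unreliable link rather than at the drive's format.

```python
EFI_PAGE_SHIFT = 12  # UEFI pages are 4 KiB (2**12 bytes)


def mpsmin(cap: int) -> int:
    """Extract CAP.MPSMIN (bits 51:48) from a 64-bit CAP register value."""
    return (cap >> 48) & 0xF


def min_page_size(cap: int) -> int:
    """Minimum memory page size the controller supports, in bytes."""
    return 1 << (12 + mpsmin(cap))


def passes_uefi_check(cap: int) -> bool:
    """Mirror the asserted expression: (Mpsmin + 12) <= 12."""
    return mpsmin(cap) + 12 <= EFI_PAGE_SHIFT


# Hypothetical CAP value with MPSMIN = 0 (made up for illustration).
sample_cap = 0x0000_0030_2801_03FF
# What a failed PCIe read commonly looks like: every bit set.
all_ones = 0xFFFF_FFFF_FFFF_FFFF

for name, cap in (("sample", sample_cap), ("all ones", all_ones)):
    print(name, "MPSMIN =", mpsmin(cap), "check passes:", passes_uefi_check(cap))
```

If the real CAP value can be read from a running system, comparing it against the all-ones case is a quick way to tell a genuinely unsupported page size from a corrupted read.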

Hi AlexKlimaj,

What’s the physical size of your NVMe SSD?

Could you share the flash command you used and the flash logs for the SSD?

Please also share the XML you used to flash the board.

I used the command from the developer guide for flashing an Orin NX NVMe.

I formatted the drive as ext4 and installed it on the carrier board with an Orin Nano. When I try to format the partition in Disks, it causes a kernel panic on the Orin Nano.

I’m not sure at this point if it is hardware related or the SSD isn’t compatible with the Orin. I have ordered some other SSDs to try.

This is the drive.

When I try to format the drive.

jetson@orin-nano:~$ [  233.025685] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  233.035709] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  233.044326] pcieport 0004:00:00.0:    [ 0] RxErr
[  233.050971] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  233.060834] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[  233.069465] pcieport 0001:00:00.0:    [ 0] RxErr
[  233.075787] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
[  233.086917] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00004020/00400000
[  233.095520] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[  233.102515] pcieport 0001:00:00.0:    [14] CmpltTO
[  233.109029] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  233.120061] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  233.128666] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  233.135661] nvme nvme0: frozen state error detected, reset controller
[  234.213874] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  234.245917] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  234.254533] pcieport 0004:00:00.0:    [ 0] RxErr
[  234.260853] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  234.270760] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[  234.270764] logitech-djreceiver 0003:046D:C534.0006: logi_dj_probe: logi_dj_recv_query_paired_devices error:-71
[  234.279412] pcieport 0001:00:00.0:    [ 0] RxErr
[  234.296073] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
[  234.307248] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00004020/00400000
[  234.307252] hub 1-2:1.0: hub_ext_port_status failed (err = -71)
[  234.315868] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[  234.328951] pcieport 0001:00:00.0:    [14] CmpltTO
[  234.335402] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  234.346449] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  234.355048] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  234.362043] nvme nvme0: frozen state error detected, reset controller
[  234.369052] usb 1-2: clear tt 3 (0050) error -71
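
The `error status` words in this log can be decoded by hand. Here is a minimal sketch, assuming the standard PCIe Advanced Error Reporting bit positions and using the same short names that the kernel prints; the `decode` helper is hypothetical. SDES is the Surprise Down error, meaning the link dropped unexpectedly, and CmpltTO is a completion timeout.

```python
# Bit positions in the AER Uncorrectable Error Status register.
UNCORRECTABLE_BITS = {
    4: "DLP", 5: "SDES", 12: "TLP", 13: "FCP", 14: "CmpltTO",
    15: "CmpltAbrt", 16: "UnxCmplt", 17: "RxOF", 18: "MalfTLP",
    19: "ECRC", 20: "UnsupReq", 21: "ACSViol",
}

# Bit positions in the AER Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "RxErr", 6: "BadTLP", 7: "BadDLLP", 8: "Rollover",
    12: "Timeout", 13: "AdvNonFatalErr",
}


def decode(status: int, table: dict) -> list:
    """Return the names of all bits in `table` that are set in `status`."""
    return [name for bit, name in sorted(table.items()) if status & (1 << bit)]


# Status words taken from the log above.
print(decode(0x00000001, CORRECTABLE_BITS))
print(decode(0x00004020, UNCORRECTABLE_BITS))
print(decode(0x00000020, UNCORRECTABLE_BITS))
```

Corrected receiver errors followed by a Surprise Down on both root ports suggest the link itself is dropping, which would fit a signal or power problem better than a filesystem one.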

Please share the full commands you used including all the parameters.

Please also share the XML you are using to flash your board.

Do you mean that you are formatting an SSD that is currently mounted as the rootfs?

Just the XML for the jetson orin nano devkit. Nothing changed.

No, I also have an Orin Nano with the rootfs on the SD card. Then I’m plugging in the SSD and checking it as a second drive.

You can use the lsblk command to check which disk is currently mounted as the rootfs.
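
As a rough illustration of that check, the sketch below parses the output of `lsblk -rn -o NAME,MOUNTPOINT` (raw format, no header line) and returns the block device mounted at `/`. The helper names and the sample text are hypothetical; the sample imitates a system whose rootfs is on an SD card with an NVMe drive attached as a second disk.

```python
import subprocess


def root_device(lsblk_raw: str):
    """Return the device name mounted at '/', or None if none is listed.

    Expects the output of `lsblk -rn -o NAME,MOUNTPOINT`, where each line
    is a device name optionally followed by its mount point.
    """
    for line in lsblk_raw.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "/":
            return parts[0]
    return None


def current_root_device():
    """Run lsblk on the local machine and report the rootfs device."""
    out = subprocess.run(
        ["lsblk", "-rn", "-o", "NAME,MOUNTPOINT"],
        capture_output=True, text=True, check=True,
    ).stdout
    return root_device(out)


# Hypothetical sample output, not captured from a real board.
sample = """mmcblk1
mmcblk1p1 /
mmcblk1p2
nvme0n1
nvme0n1p1
"""
print(root_device(sample))
```

If the result is a partition on the NVMe drive, the disk being formatted is the one the system is running from.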

You could also refer to the following thread to configure sector_size/num_sectors, and use the -S parameter to specify the rootfs size in your case.
How to solve the issue that ssd are not entiely availiable after full disk encryption - #5 by KevinFFF

Flashing with the standard flashing script results in the same errors. I’m not sure if it’s a PCIe hardware issue on the carrier board or an issue with this drive.

On the host.

Waiting for target to boot-up...
Waiting for target to boot-up...
Waiting for device to expose ssh ......RTNETLINK answers: File exists
RTNETLINK answers: File exists
Waiting for device to expose ssh ...Run command: flash on fc00:1:1:0::2
SSH ready
blockdev: cannot open /dev/mmcblk0boot0: No such file or directory
[ 0]: l4t_flash_from_kernel: Starting to create gpt for emmc
Active index file is /mnt/internal/flash.idx
Number of lines is 58
max_index=57
[ 2]: l4t_flash_from_kernel: Successfully create gpt for emmc
[ 2]: l4t_flash_from_kernel: Starting to create gpt for external device
Active index file is /mnt/external/flash.idx
Number of lines is 17
max_index=16
writing item=1, 9:0:primary_gpt, 512, 19968, gpt_primary_9_0.bin, 16896, fixed-<reserved>-0, d9743d54325caa45af437efd47712df8cc58870b
Writing primary_gpt partition with gpt_primary_9_0.bin
Offset is not aligned to K Bytes, no optimization is applied
dd if=/mnt/external/gpt_primary_9_0.bin of=/dev/nvme0n1 bs=1 skip=0  seek=512 count=16896
16896+0 records in
16896+0 records out
16896 bytes (17 kB, 16 KiB) copied, 0.0200363 s, 843 kB/s

On the Jetson Orin NX while flashing.

[   46.881881] nvme nvme0: frozen state error detected, reset controller
[   47.935596] pcieport 0004:00:00.0: AER: Root Port link has been reset
[   47.935828] pcieport 0004:00:00.0: AER: device recovery successful
[   47.935999] pcieport 0004:00:00.0: AER: Multiple Corrected error received: 0004:00:00.0
[   47.936238] pcieport 0004:00:00.0: AER: can't find device of ID0000
[   47.936422] pcieport 0004:00:00.0: AER: Uncorrected (Fatal) error received: 0004:00:00.0
[   47.936674] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   47.936979] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[   47.937216] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[   47.937395] nvme nvme0: frozen state error detected, reset controller
[   48.991597] pcieport 0004:00:00.0: AER: Root Port link has been reset
[   48.991811] pcieport 0004:00:00.0: AER: device recovery successful
[   48.991997] pcieport 0004:00:00.0: AER: Multiple Corrected error received: 0004:00:00.0
[   48.992239] pcieport 0004:00:00.0: AER: can't find device of ID0000
[   48.992419] pcieport 0004:00:00.0: AER: Uncorrected (Fatal) error received: 0004:00:00.0
[   48.992647] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   48.992948] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[   48.993186] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[   48.993373] nvme nvme0: frozen state error detected, reset controller
[   50.047598] pcieport 0004:00:00.0: AER: Root Port link has been reset
[   50.047827] pcieport 0004:00:00.0: AER: device recovery successful
[   50.048023] pcieport 0004:00:00.0: AER: Multiple Corrected error received: 0004:00:00.0
[   50.048255] pcieport 0004:00:00.0: AER: can't find device of ID0000
[   50.048435] pcieport 0004:00:00.0: AER: Corrected error received: 0004:00:00.0
[   50.048662] pcieport 0004:00:00.0: AER: can't find device of ID0000
[   50.048848] pcieport 0004:00:00.0: AER: Uncorrected (Fatal) error received: 0004:00:00.0
[   50.049079] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   50.049372] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[   50.049595] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[   50.050000] nvme nvme0: frozen state error detected, reset controller
[   51.103598] pcieport 0004:00:00.0: AER: Root Port link has been reset
[   51.103828] pcieport 0004:00:00.0: AER: device recovery successful
[   51.104005] pcieport 0004:00:00.0: AER: Corrected error received: 0004:00:00.0
[   51.104229] pcieport 0004:00:00.0: AER: can't find device of ID0000
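
Logs like this are easier to compare between boards and drives when they are reduced to counts. Below is a minimal sketch that tallies `PCIe Bus Error` lines per root port and severity, and counts NVMe controller resets. The regular expressions are written against the message formats visible in this thread, and the `summarize` helper is hypothetical.

```python
import re
from collections import Counter

BUS_ERROR = re.compile(
    r"pcieport (\S+): PCIe Bus Error: "
    r"severity=(Corrected|Uncorrected \((?:Fatal|Non-Fatal)\))"
)
NVME_RESET = re.compile(r"nvme (\S+): frozen state error detected, reset controller")


def summarize(dmesg_text: str):
    """Count bus errors per (port, severity) and resets per NVMe controller."""
    errors = Counter()
    resets = Counter()
    for line in dmesg_text.splitlines():
        match = BUS_ERROR.search(line)
        if match:
            errors[(match.group(1), match.group(2))] += 1
            continue
        match = NVME_RESET.search(line)
        if match:
            resets[match.group(1)] += 1
    return errors, resets


# Four lines copied from the log above.
sample = """\
[   47.936674] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   47.937395] nvme nvme0: frozen state error detected, reset controller
[   48.992647] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   48.993373] nvme nvme0: frozen state error detected, reset controller
"""

errors, resets = summarize(sample)
for (port, severity), count in sorted(errors.items()):
    print(port, severity, count)
print("resets:", dict(resets))
```

Feeding it the full `dmesg` output from the custom carrier and from the devkit with the same SSD would show whether the errors are specific to one board.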

Could you share the full command you used to flash the board and attach the full flash log as a file here?

I’m seeing the same error on an Orin Nano with its OS running from SD Card. When I try to create a partition on the SSD from the Disks utility I see the same issue.

[  820.217954] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  820.227831] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  820.236482] pcieport 0004:00:00.0:    [ 0] RxErr
[  820.242813] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  820.253865] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  820.262465] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  820.269464] nvme nvme0: frozen state error detected, reset controller
[  821.337968] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  821.349008] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  821.357623] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  821.364648] nvme nvme0: frozen state error detected, reset controller
[  822.425915] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  822.436962] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  822.445568] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  822.452558] nvme nvme0: frozen state error detected, reset controller
[  823.545921] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  823.556995] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  823.565600] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  823.572594] nvme nvme0: frozen state error detected, reset controller

Then on reboot.

[   14.288878] pcieport 0001:00:00.0: AER: Root Port link has been reset
[   14.297299] pcieport 0001:00:00.0: AER: device recovery successful
[   14.315132] systemd[1]: Started Journal Service.
[   14.531956] nvgpu: 17000000.ga10b          nvgpu_nvhost_syncpt_init:135  [INFO]  syncpt_unit_base 60000000 syncpt_unit_size 4000000 size 10000
[   14.531956]
[   14.541683] pcieport 0001:00:00.0: AER: Multiple Corrected error received: 0001:00:00.0
[   14.554912] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   14.564758] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[   14.576097] pcieport 0001:00:00.0:    [ 0] RxErr
[   14.585154] pcieport 0001:00:00.0: AER: Uncorrected (Fatal) error received: 0001:00:00.0
[   14.594943] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   14.605953] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000
[   14.614550] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[   14.621531] pci 0001:01:00.0: AER: can't recover (no error_detected callback)
[   14.628895] usb 1-2: new high-speed USB device number 9 using tegra-xusb
[   14.796877] pcieport 0004:00:00.0: AER: Root Port link has been reset
[   14.806406] pcieport 0004:00:00.0: AER: device recovery successful
[   14.813188] pcieport 0004:00:00.0: AER: Corrected error received: 0004:00:00.0
[   14.820649] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   14.830500] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[   14.839101] pcieport 0004:00:00.0:    [ 0] RxErr
[   14.845364] pcieport 0004:00:00.0: AER: Uncorrected (Fatal) error received: 0004:00:00.0
[   14.853687] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   14.860863] systemd-journald[266]: Received client request to flush runtime journal.
[   14.864700] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[   14.881229] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[   14.881237] pci 0004:01:00.0: AER: can't recover (no error_detected callback)
[   15.176912] usb 1-2: device descriptor read/64, error -71
[   15.424914] usb 1-2: device descriptor read/64, error -71
[   15.661609] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   15.671544] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[   15.680149] pcieport 0001:00:00.0:    [ 0] RxErr
[   15.686899] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   15.697917] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000
[   15.706511] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[   15.916997] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   15.929720] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[   15.941087] pcieport 0004:00:00.0:    [ 0] RxErr
[   15.948466] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[   15.959474] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[   15.968068] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
0004:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a809 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 57
        Region 0: Memory at 2428000000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=13 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: nvme
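
One thing worth reading out of the `lspci -vv` output is whether the link trained at the speed and width the device advertises. A minimal sketch, assuming the `LnkCap:` and `LnkSta:` line formats shown above; the `link_summary` helper is hypothetical.

```python
import re


def link_summary(lspci_text: str):
    """Compare advertised (LnkCap) and negotiated (LnkSta) speed and width."""
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+GT/s), Width (x\d+)", lspci_text)
    sta = re.search(
        r"LnkSta:\s*Speed ([\d.]+GT/s)(?: \(\w+\))?, Width (x\d+)", lspci_text
    )
    if not cap or not sta:
        return None
    return {
        "capable": cap.groups(),
        "current": sta.groups(),
        "degraded": cap.groups() != sta.groups(),
    }


# Two lines copied from the lspci output above.
sample = """\
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
"""
print(link_summary(sample))
```

Here the link reports its full 8GT/s x4, so it is not visibly downgraded at the moment of capture. That alone does not rule out a marginal link or a power problem that only shows up under load.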

Attempting to mount on the Orin Nano.

NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    16M  1 loop
mmcblk1      179:0    0  59.5G  0 disk
├─mmcblk1p1  179:1    0    58G  0 part /
├─mmcblk1p2  179:2    0   128M  0 part
├─mmcblk1p3  179:3    0   768K  0 part
├─mmcblk1p4  179:4    0  31.6M  0 part
├─mmcblk1p5  179:5    0   128M  0 part
├─mmcblk1p6  179:6    0   768K  0 part
├─mmcblk1p7  179:7    0  31.6M  0 part
├─mmcblk1p8  179:8    0    80M  0 part
├─mmcblk1p9  179:9    0   512K  0 part
├─mmcblk1p10 179:10   0    64M  0 part
├─mmcblk1p11 179:11   0    80M  0 part
├─mmcblk1p12 179:12   0   512K  0 part
├─mmcblk1p13 179:13   0    64M  0 part
└─mmcblk1p14 179:14   0 879.5M  0 part
zram0        251:0    0 611.4M  0 disk [SWAP]
zram1        251:1    0 611.4M  0 disk [SWAP]
zram2        251:2    0 611.4M  0 disk [SWAP]
zram3        251:3    0 611.4M  0 disk [SWAP]
zram4        251:4    0 611.4M  0 disk [SWAP]
zram5        251:5    0 611.4M  0 disk [SWAP]
nvme0n1      259:0    0   477G  0 disk
└─nvme0n1p1  259:1    0   477G  0 part
jetson@orin-nano:~$
jetson@orin-nano:~$
jetson@orin-nano:~$ cd /media/
jetson/ ssd/
jetson@orin-nano:~$ sudo mount -t ext4 /dev/nvm
nvmap      nvme0      nvme0n1    nvme0n1p1
jetson@orin-nano:~$ sudo mount -t ext4 /dev/nvme0n1p1 /media/ssd/
jetson@orin-nano:~$ [  194.420290] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  194.430144] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  194.438749] pcieport 0004:00:00.0:    [ 0] RxErr
[  194.445047] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
[  194.456153] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00004020/00400000
[  194.464753] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  194.471758] pcieport 0004:00:00.0:    [14] CmpltTO
[  194.478044] nvme nvme0: frozen state error detected, reset controller
[  194.484715] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  194.494561] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[  194.503159] pcieport 0001:00:00.0:    [ 0] RxErr
[  194.509454] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  194.520458] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000
[  194.529053] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[  194.576465] blk_update_request: I/O error, dev nvme0n1, sector 268597696 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  195.610371] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  195.620276] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  195.628893] pcieport 0004:00:00.0:    [ 0] RxErr
[  196.227432] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  196.237404] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000001/0000e000
[  196.246005] pcieport 0004:00:00.0:    [ 0] RxErr
[  196.252705] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  196.262554] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000001/0000e000
[  196.271149] pcieport 0001:00:00.0:    [ 0] RxErr
[  196.277448] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Requester ID)
[  196.288558] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00004020/00400000
[  196.297152] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[  196.304134] pcieport 0001:00:00.0:    [14] CmpltTO
[  196.310444] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[  196.321465] pcieport 0004:00:00.0:   device [10de:229c] error status/mask=00000020/00400000
[  196.330060] pcieport 0004:00:00.0:    [ 5] SDES                   (First)
[  196.337042] nvme nvme0: frozen state error detected, reset controller
[  197.378344] pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)

Have you formatted it as ext4 before use?

Have you tried another NVMe SSD, and does it show the same behavior?

This is most likely a hardware design problem if this SSD works fine on the NVIDIA devkit.

I would suggest you give up on booting from that SSD for now. Boot from another interface and focus first on fixing the errors reported by your PCIe C4 and C1 controllers.

Yes I’ll focus on that. I’ll open a new issue to address it.

I just flashed the NVME successfully on my carrier board with an Orin NX and Western Digital SDBPMPZ-256G.

Using this command.

 sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 jetson-orin-nano-devkit internal

flash-log.txt (40.7 KB)

0004:01:00.0 Non-Volatile memory controller: Sandisk Corp Device 5008 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Sandisk Corp Device 5008
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 57
        Region 0: Memory at 2428000000 (64-bit, non-prefetchable) [size=16K]
        Region 4: Memory at 2428004000 (64-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [b0] MSI-X: Enable+ Count=17 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=4 offset=00000000
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <8us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [1b8 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [300 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [900 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
                          PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: nvme
jetson@jetson-nano:~$ sudo hdparm -Ttv /dev/nvme0n1p1

/dev/nvme0n1p1:
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 readonly      =  0 (off)
 readahead     = 256 (on)
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 geometry      = 243416/64/32, sectors = 498515968, start = 1601320
 Timing cached reads:   5810 MB in  2.00 seconds = 2907.57 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 4024 MB in  3.00 seconds = 1341.11 MB/sec
jetson@jetson-nano:~$

I let it run for a while and it hung while installing JetPack. Now it’s in a boot loop, throwing this error again.

ASSERT [NvmExpressDxe] /out/nvidia/bootloader/uefi/Jetson_RELEASE/edk2/MdeModulePkg/Bus/Pci/NvmExpressDxe/NvmExpressHci.c(772): (Private->Cap.Mpsmin + 12) <= 12

Resetting the system in 5 seconds.

Do you mean that it worked before but occasionally hits this assertion?

I think I discovered that the root cause was an inadequately sized 3.3V regulator on the carrier.
