Two external PCIe x1 Ethernet cards are not recognized when a 4TB SSD is used as the system disk

My environment is:
hardware: Orin NX 8GB + custom carrier board
software: rel 35.4.1

This is my external device framework:

The two external Ethernet cards work fine when a 2TB SSD is used as the system disk.
So I believe my hardware and software configuration is correct.

Here is the log:
2TB_PCIE_normal.log (78.3 KB)

But they are not recognized when I use a 4TB SSD instead.

Here is the log:
4TB_PCIE_error.log (75.2 KB)

I found the following messages in the log, but could not find the cause on the forums or elsewhere online:

[   10.665272] pci_bus 0007:01: busn_res: [bus 01-ff] is released
[   10.671388] pci 0007:00:00.0: Removing from iommu group 9
[   10.676952] pci_bus 0007:00: busn_res: [bus 00-ff] is released
[   11.937710] pci_bus 0009:01: busn_res: [bus 01-ff] is released
[   11.943778] pci 0009:00:00.0: Removing from iommu group 11
[   11.949429] pci_bus 0009:00: busn_res: [bus 00-ff] is released
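When comparing the two attached logs, it can help to list which PCIe domains had their buses released. The following is a minimal sketch, not an NVIDIA tool: the helper name `released_domains` and the regular expression are illustrative assumptions based only on the message format quoted above. A released bus suggests the corresponding root port was torn down, which would be consistent with a link that never came up, but the log lines alone do not prove the cause.

```python
import re

# Matches kernel lines such as:
# [   10.665272] pci_bus 0007:01: busn_res: [bus 01-ff] is released
RELEASED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+pci_bus\s+"
    r"(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"\s+busn_res:.*is released"
)


def released_domains(log_text):
    """Return {domain: timestamp of the first 'is released' line for that domain}."""
    found = {}
    for line in log_text.splitlines():
        m = RELEASED.search(line)
        if m:
            # Keep only the earliest timestamp seen for each domain.
            found.setdefault(m.group("domain"), float(m.group("ts")))
    return found


# The six lines quoted in the post above.
SAMPLE = """\
[   10.665272] pci_bus 0007:01: busn_res: [bus 01-ff] is released
[   10.671388] pci 0007:00:00.0: Removing from iommu group 9
[   10.676952] pci_bus 0007:00: busn_res: [bus 00-ff] is released
[   11.937710] pci_bus 0009:01: busn_res: [bus 01-ff] is released
[   11.943778] pci 0009:00:00.0: Removing from iommu group 11
[   11.949429] pci_bus 0009:00: busn_res: [bus 00-ff] is released
"""

if __name__ == "__main__":
    for domain, ts in released_domains(SAMPLE).items():
        print(f"PCIe domain {domain}: first bus released at {ts:.6f}s")
```

Running the same function over the full text of both log files and comparing the results would show whether domains 0007 and 0009 are released only in the 4TB case.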

Please help me check this problem, thanks!

These are the specifications of the two SSDs:

4TB, M.2 2280, NVMe (PCIe 4.0 x4), 3D TLC, Kingston

2TB, M.2 2280, NVMe (PCIe Gen4 x4), M-Key, Kingston

Hi Xzz,

Can you reproduce the same issue on the devkit, or is this issue specific to your custom carrier board?

Did you flash the 4TB SSD after connecting it to your custom carrier board?

Have you also tried other SSDs? Or is the issue specific to this 4TB SSD?

I can’t reproduce this problem on the devkit, because the two Ethernet ICs are on my custom board.

Yes, I first connected the 4TB NVMe SSD to my board, and then flashed the system onto it.

I have to ask my customer. Please wait for my reply, thanks!

Hi KevinFFF,
We have tried another 4TB SSD, and the same error happened:

Did you flash this 4TB SSD successfully?

Is this 4TB SSD working on the devkit?

Yes, the system flashed successfully and runs normally, except for the two external PCIe Ethernet cards.

It works fine on the devkit.

Make sure your pinmux and device tree are still the correct ones when you switch to a different SSD to boot.

If they are the same, try the PCIe debug tips mentioned in the L4T developer guide.

Please try a PCIe Gen 3 SSD - yes, the Gen 3 is important. I’ve got a suspicion…

lspci -vvv -s (bus):(slot):(function) will tell you if there is actually a Gen 3 connection or a Gen 4 connection.

Quickfix: Limit all PCIe busses to PCIe Gen 3 in the device tree.

Hi WayneWWW, I have tested more than 6 brands of SSD, all of which recognize the external Ethernet cards normally, so I believe my configuration is correct.

Thanks! I tried limiting the speed to Gen3, but the problem is not fixed.
The log is here:
LnkSta: Speed 8GT/s (downgraded), Width x4 (ok)

root@tegra-ubuntu:~# lspci -vvv -s 0004:01:00.0
0004:01:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5019 (prog-if 02 [NVM Express])
	Subsystem: Kingston Technology Company, Inc. Device 5019
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 57
	Region 0: Memory at 2428000000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (downgraded), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [d0] MSI-X: Enable+ Count=17 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [e0] MSI: Enable- Count=1/16 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [100 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=300us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [1e0 v1] Data Link Feature <?>
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [300 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: 0
	Capabilities: [340 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [378 v1] Lane Margining at the Receiver <?>
	Kernel driver in use: nvme
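To confirm the negotiated link generation without reading the whole dump, the `LnkCap` and `LnkSta` lines can be extracted programmatically. The sketch below is illustrative only: `link_summary`, `is_downgraded`, and `GEN_BY_RATE` are hypothetical helper names, and the regular expression assumes the `lspci -vvv` line format shown above. For this SSD it reports a Gen4-capable device running at Gen3, which is what one would expect after limiting the link speed to Gen3.

```python
import re

# Per-lane transfer rate in GT/s mapped to the PCIe generation number.
GEN_BY_RATE = {"2.5": 1, "5": 2, "8": 3, "16": 4, "32": 5}

# Matches e.g. "LnkCap: Port #1, Speed 16GT/s, Width x4, ..."
# and          "LnkSta: Speed 8GT/s (downgraded), Width x4 (ok)"
LINK = re.compile(
    r"(LnkCap|LnkSta):\s+(?:Port #\d+, )?"
    r"Speed ([\d.]+)GT/s(?: \(\w+\))?, Width x(\d+)"
)


def link_summary(lspci_text):
    """Return {'LnkCap': (gen, width), 'LnkSta': (gen, width)} for one device.

    Raises KeyError if lspci reports a rate that is not in GEN_BY_RATE.
    """
    summary = {}
    for field, rate, width in LINK.findall(lspci_text):
        # Keep the first occurrence of each field.
        summary.setdefault(field, (GEN_BY_RATE[rate], int(width)))
    return summary


def is_downgraded(summary):
    """True if the negotiated generation is below what the device supports."""
    return summary["LnkSta"][0] < summary["LnkCap"][0]


# The two relevant lines copied from the lspci output above.
SAMPLE = """\
        LnkCap: Port #1, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 unlimited
        LnkSta: Speed 8GT/s (downgraded), Width x4 (ok)
"""

if __name__ == "__main__":
    s = link_summary(SAMPLE)
    print(f"capable: Gen{s['LnkCap'][0]} x{s['LnkCap'][1]}")
    print(f"current: Gen{s['LnkSta'][0]} x{s['LnkSta'][1]}")
    print("downgraded:", is_downgraded(s))
```

The function takes the output text for a single device, for example the text produced by `lspci -vvv -s 0004:01:00.0`.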

Do you mean that the issue is specific to the custom carrier board?
Can you use a 4TB SSD and a PCIe Ethernet card on the devkit at the same time?

Have you referred to Debug PCIe Link-Up Failure?

Thank you for your help! We found that this was a mistake in our hardware.
