Jetson Orin NX, custom board, add M.2 on pcie2, pcieport error

Hello,

  • L4T 35.4.1
  • devicetree tegra234-p3767-0000-p3768-0000-a0.dtb
    • tegra234-mb2-bct-scr-p3767-0000.dts patched (Jetson_Linux_Release_Notes_r35.4.1.pdf 4.2.3) and tegra234-mb2-bct-misc-p3767-0000.dts patched (to disable EEPROM)
  • default pinmux
  • odm gbe-uphy-config-8,hsstp-lane-map-3,hsio-uphy-config-0
  • kernel 5.10 patched (from meta-tegra mickledore)

We would like to know whether it is possible, based on the p3767-0000-p3768-0000 carrier board design, to add an M.2 port on pcie2 in order to build a custom carrier board for the Orin NX 16G.

We have built a prototype and are seeing issues with the pcieport 0001:00:00.0 @14100000.

kernel boot log:

[…]
[    4.108650] tegra194-pcie 14100000.pcie: Adding to iommu group 7
[    4.121163] tegra194-pcie 14100000.pcie: Using GICv2m MSI allocator
[    4.122172] tegra194-pcie 14160000.pcie: Adding to iommu group 8
[    4.328510] tegra194-pcie 14160000.pcie: Using GICv2m MSI allocator
[    4.133510] tegra194-pcie 141e0000.pcie: Adding to iommu group 9
[    4.144811] tegra194-pcie 141e0000.pcie: Using GICv2m MSI allocator
[    4.145341] tegra194-pcie 140a0000.pcie: Adding to iommu group 10
[    4.157356] tegra194-pcie 140a0000.pcie: Using GICv2m MSI allocator
[…]
[    5.187627] tegra194-pcie 14100000.pcie: Using GICv2m MSI allocator
[    5.194287] tegra194-pcie 14100000.pcie: host bridge /pcie@14100000 ranges:
[    5.199710] tegra194-pcie 14100000.pcie:       IO 0x0030100000..0x00301fffff -> 0x0030100000
[    5.208193] tegra194-pcie 14100000.pcie:      MEM 0x20a8000000..0x20afffffff -> 0x0040000000
[    5.216769] tegra194-pcie 14100000.pcie:      MEM 0x2080000000..0x20a7ffffff -> 0x2080000000
[    5.333570] tegra194-pcie 14100000.pcie: Link up
[    5.334791] tegra194-pcie 14100000.pcie: PCI host bridge to bus 0001:00
[    5.334974] pci_bus 0001:00: root bus resource [bus 00-ff]
[    5.335118] pci_bus 0001:00: root bus resource [io  0x0000-0xfffff] (bus address [0x30100000-0x301fffff])
[    5.335365] pci_bus 0001:00: root bus resource [mem 0x20a8000000-0x20afffffff] (bus address [0x40000000-0x47ffffff])
[    5.335645] pci_bus 0001:00: root bus resource [mem 0x2080000000-0x20a7ffffff pref]
[    5.335907] pci 0001:00:00.0: [10de:229e] type 01 class 0x060400
[    5.336241] pci 0001:00:00.0: PME# supported from D0 D3hot
[    5.340418] pci 0001:01:00.0: [1055:7430] type 00 class 0x020000
[    5.340940] pci 0001:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[    5.341337] pci 0001:01:00.0: reg 0x18: [mem 0x00000000-0x000000ff 64bit]
[    5.341722] pci 0001:01:00.0: reg 0x20: [mem 0x00000000-0x000000ff 64bit]
[    5.344380] pci 0001:01:00.0: PME# supported from D0 D3hot
[    5.348473] pci 0001:00:00.0: BAR 14: assigned [mem 0x20a8000000-0x20a80fffff]
[    5.348660] pci 0001:01:00.0: BAR 0: assigned [mem 0x20a8000000-0x20a8001fff 64bit]
[    5.349031] pci 0001:01:00.0: BAR 2: assigned [mem 0x20a8002000-0x20a80020ff 64bit]
[    5.353031] pci 0001:01:00.0: BAR 4: assigned [mem 0x20a8002100-0x20a80021ff 64bit]
[    5.360712] pci 0001:00:00.0: PCI bridge to [bus 01-ff]
[    5.365788] pci 0001:00:00.0:   bridge window [mem 0x20a8000000-0x20a80fffff]
[    5.373149] pci 0001:00:00.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  512
[    5.381973] pci 0001:01:00.0: Max Payload Size set to  256/ 512 (was  128), Max Read Rq  512
[    5.390699] pcieport 0001:00:00.0: Adding to iommu group 7
[    5.396181] pcieport 0001:00:00.0: PME: Signaling with IRQ 55
[    5.402460] pcieport 0001:00:00.0: AER: enabled with IRQ 55
[    5.402475] pcieport 0001:00:00.0: AER: Multiple Corrected error received: 0001:00:00.0
[    5.415418] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[    5.424947] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000081/0000e000
[    5.433167] pcieport 0001:00:00.0:    [ 0] RxErr
[    5.439292] pcieport 0001:00:00.0:    [ 7] BadDLLP
[    5.445615] pcieport 0001:00:00.0: AER: Multiple Corrected error received: 0001:00:00.0
[    5.453560] pcieport 0001:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[    5.463094] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000081/0000e000
[    5.471516] pcieport 0001:00:00.0:    [ 0] RxErr
[    5.477617] pcieport 0001:00:00.0:    [ 7] BadDLLP
[…] #infinite loop here
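
For readers who want to check the `error status/mask` field by hand, here is a minimal sketch that decodes it for the Corrected errors above. The bit positions follow the AER Correctable Error Status register layout in the PCIe specification; the function names (`decode_bits`, `decode_status_mask`) are only illustrative and are not part of any NVIDIA or kernel tool.

```python
# Bit positions of the AER Correctable Error Status / Mask registers.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_bits(value, names):
    """Return the names of all bits set in value (unknown bits by number)."""
    return [names.get(bit, f"bit {bit}") for bit in range(32) if value & (1 << bit)]


def decode_status_mask(field, names):
    """Split a 'status/mask' hex pair as printed by the AER driver and decode both."""
    status_hex, mask_hex = field.split("/")
    return decode_bits(int(status_hex, 16), names), decode_bits(int(mask_hex, 16), names)


# Value copied from the boot log above.
reported, masked = decode_status_mask("00000081/0000e000", CORRECTABLE_BITS)
print("reported:", reported)
print("masked:  ", masked)
```

For `00000081` this gives Receiver Error and Bad DLLP, matching the `[ 0] RxErr` and `[ 7] BadDLLP` lines. Both are receive-side errors at the physical and data link layers, so on a custom board they are commonly (though not always) a hint to look at the lane routing, reference clock and connector rather than at software.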

The other PCIe controllers appear to have a clean boot log.

Sometimes the boot gets far enough to give us a shell session, after this kernel warning:

[   14.263474] ------------[ cut here ]------------
[   14.263637] WARNING: CPU: 2 PID: 114 at drivers/net/phy/phy.c:963 phy_error+0x1c/0x64
[   14.263640] Module[ l2m  OK  d ] Finished ) brcmutil(O) cfg80211 � us�net lpace_aterface to 100Mb/s_hdmi m.pat(O)
 tegra_bpmp_thermal snd_hda_tegra snd_hda_codec snd_hda_core spi_tegra114 lan743x r8168 pwm_fan nvidia_drm(O) nvidia_modeset(O) nvidia(O) nvgpu nvmap ina3221
[   14.264595] CPU: 2 PID: 114 Comm: kworker/u16:1 Tainted: G           O      5.10.120-l4t-r35.4.ga+g76678311c10b #1
[   14.264881] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin NX Developer Kit, BIOS v35.4.1 10/16/2023
[   14.265183] Workqueue: events_power_efficient phy_state_machine
[   14.265340] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
[   14.265502] pc : phy_error+0x1c/0x64
[   14.265843] lr : phy_state_machine+0xa8/0x264
[   14.266502] sp : ffff8000116f3d20
[   14.267010] x29: ffff8000116f3d20 x28: ffff1bab81c89600 
[   14.267827] x27: ffff1bab80142470 x26: 00000000fffffef7 
[   14.268646] x25: 0000000000000000 x24: ffff1bab884d54a8 
[   14.270226] x23: 00000000ffffff92 x22: ffff1bab884d54a0 
[   14.275739] x21: ffff1bab884d54f8 x20: 0000000000000003 
[   14.281252] x19: ffff1bab884d5000 x18: 0000000000000000 
[   14.286677] x17: 0000000000000000 x16: ffffd898c8e2be30 
[   14.292189] x15: 0000000000000000 x14: 0000000000000000 
[   14.297613] x13: 0000000000000000 x12: 0000000000000000 
[   14.303038] x11: 0000000000000000 x10: bf13ea9f804ed36e 
[   14.308552] x9 : ffffd898c9772338 x8 : ffffd898ca04a208 
[   14.313975] x7 : ffffd898ca04a240 x6 : 000000003d0de856 
[   14.319401] x5 : 00ffffffffffffff x4 : ffff1bab81c96900 
[   14.324827] x3 : ffff1bab884d54f8 x2 : 0000000000000000 
[   14.330164] x1 : ffff1bab81c96900 x0 : ffff1bab884d5000 
[   14.335503] Call trace:
[   14.337954]  phy_error+0x1c/0x64
[   14.341366]  phy_state_machine+0xa8/0x264
[   14.345396]  process_one_work+0x1fc/0x4bc
[   14.349416]  worker_thread+0x7c/0x460
[   14.352915]  kthread+0x160/0x16c
[   14.356331]  ret_from_fork+0x10/0x38
[   14.359827] ---[ end trace 76f27c4ebfcece07 ]---

After that, the PCIe errors change to:

[   15.437998] pcieport 0001:00:00.0: AER: Root Port link has been reset
[   15.540596] pcieport 0001:00:00.0: AER: device recovery successful
[   15.540784] pcieport 0001:00:00.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0001:00:00.0
[   15.541072] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[   15.541398] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000
[   15.541645] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[   15.541839] lan743x 0001:01:00.0: AER: can't recover (no error_detected callback)
[   15.542051] pcieport 0001:00:00.0: AER: device recovery failed
[   15.542205] pcieport 0001:00:00.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0001:00:00.0
[   15.542465] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[   15.543767] pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000
[   15.545039] pcieport 0001:00:00.0:    [ 5] SDES                   (First)
[   15.546026] lan743x 0001:01:00.0: AER: can't recover (no error_detected callback)
[   15.552307] pcieport 0001:00:00.0: AER: device recovery failed
[…] #infinite loop here
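
The Uncorrected (Non-Fatal) lines can be decoded the same way. The sketch below pulls the `status/mask` pair out of a log line with a regular expression and maps it onto the AER Uncorrectable Error Status bit positions from the PCIe specification. `decode_line` and the sample string are only illustrative; the sample is one line copied from the log above.

```python
import re

# Bit positions of the AER Uncorrectable Error Status / Mask registers.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def names_for(value):
    """Return the names of all bits set in value (unknown bits by number)."""
    return [UNCORRECTABLE_BITS.get(b, f"bit {b}") for b in range(32) if (value >> b) & 1]


def decode_line(line):
    """Return (reported, masked) error names for one AER log line, or None if no match."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return names_for(status), names_for(mask)


sample = "pcieport 0001:00:00.0:   device [10de:229e] error status/mask=00000020/00400000"
print(decode_line(sample))
```

Status `00000020` is bit 5, Surprise Down Error, which the log prints as `SDES`. That error means the root port saw the link go down unexpectedly.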

From that shell we were able to collect the following information:

sudo lspci -vvv
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 55
	IOMMU group: 7
	Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
	I/O behind bridge: f000-0fff [disabled] [16-bit]
	Memory behind bridge: a8000000-a80fffff [size=1M] [32-bit]
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled] [64-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr+ FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x1, ASPM not supported
			ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		RootCap: CRSVisible+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP+ LTR+
			 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: Downstream Port
	Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
		Vector table: BAR=0 offset=00000000
		PBA: BAR=0 offset=00000000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES+ TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 05, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn+ NFERptEn+ FERptEn+
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			 FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
	Capabilities: [148 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [158 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [17c v1] Lane Margining at the Receiver <?>
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=10us
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [1a0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
	Capabilities: [2a0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
	Capabilities: [2d8 v1] Data Link Feature <?>
	Capabilities: [2e4 v1] Precision Time Measurement
		PTMCap: Requester:- Responder:+ Root:+
		PTMClockGranularity: 16ns
		PTMControl: Enabled:- RootSelected:-
		PTMEffectiveGranularity: Unknown
	Capabilities: [2f0 v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
	Capabilities: [358 v1] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
	Kernel driver in use: pcieport

0001:01:00.0 Ethernet controller: Microchip Technology / SMSC LAN7430 (rev 11)
	Subsystem: Microchip Technology / SMSC LAN7430
	!!! Unknown header type 7f
	Interrupt: pin ? routed to IRQ 55
	IOMMU group: 7
	Region 0: Memory at 20a8000000 (64-bit, non-prefetchable) [size=8K]
	Region 2: Memory at 20a8002000 (64-bit, non-prefetchable) [size=256]
	Region 4: Memory at 20a8002100 (64-bit, non-prefetchable) [size=256]
	Kernel driver in use: lan743x
	Kernel modules: lan743x

The information reported for the 0001:01:00.0 Ethernet controller looks incorrect. The other PCIe devices report correct data.
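
A note on `Unknown header type 7f`: lspci masks off bit 7 (the multi-function flag) of the Header Type byte at config offset 0x0E, so a byte of 0xff shows up as 7f. Config reads that get no completion from the device normally come back as all ones, so this line usually means the endpoint has stopped answering, not that it has an unusual header. Here is a minimal sketch to check this from the shell session. The sysfs `config` file is standard Linux; the helper names are made up for illustration, and the script probably needs root.

```python
import struct
from pathlib import Path


def header_type(cfg: bytes) -> int:
    """Header Type field at offset 0x0E with the multi-function bit (bit 7) masked off."""
    return cfg[0x0E] & 0x7F


def config_reads_all_ones(cfg: bytes) -> bool:
    """True if Vendor ID and Device ID both read 0xFFFF, i.e. the device did not respond."""
    vendor, device = struct.unpack_from("<HH", cfg, 0)
    return vendor == 0xFFFF and device == 0xFFFF


def read_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space through sysfs."""
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()[:64]


# Synthetic example: what a non-responding device looks like.
dead = bytes([0xFF] * 64)
print(hex(header_type(dead)), config_reads_all_ones(dead))

# On the target, check the real endpoint (address taken from the lspci output above).
bdf = "0001:01:00.0"
if Path(f"/sys/bus/pci/devices/{bdf}/config").exists():
    cfg = read_config(bdf)
    print(bdf, "header type", hex(header_type(cfg)), "all ones:", config_reads_all_ones(cfg))
```

If the real device reads all ones here while the kernel log earlier showed a valid `[1055:7430]` ID, the endpoint was reachable at enumeration and dropped off the bus afterwards.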

Could you help us to solve this issue? Thank you in advance.

C1 is already enabled as the M.2 Key E slot on the Orin Nano devkit carrier board. Could you put the same Ethernet card in that slot and check whether you still see the same issue?

I don’t think it is a configuration issue. In fact you should not need any configuration at all, since the default BSP already enables it.
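
When comparing the same card on the devkit slot and on the custom board, it may help to record the negotiated link parameters on both. A rough sketch follows, assuming the kernel exposes the standard sysfs attributes `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width` for the device; the helper names are illustrative only.

```python
from pathlib import Path

LINK_ATTRS = ("current_link_speed", "max_link_speed",
              "current_link_width", "max_link_width")


def read_link(dev_dir):
    """Read the link attributes from a PCI device directory; missing files give None."""
    base = Path(dev_dir)
    return {name: (base / name).read_text().strip() if (base / name).exists() else None
            for name in LINK_ATTRS}


def speed_gts(text):
    """Parse the leading number of a string such as '2.5 GT/s PCIe'; None if unparsable."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None


def below_max_speed(link):
    """True only if both speeds parse and the current speed is lower than the maximum."""
    cur = speed_gts(link["current_link_speed"])
    top = speed_gts(link["max_link_speed"])
    return cur is not None and top is not None and cur < top


# Root port and endpoint addresses taken from the lspci output above.
for bdf in ("0001:00:00.0", "0001:01:00.0"):
    dev = Path("/sys/bus/pci/devices") / bdf
    if dev.exists():
        link = read_link(dev)
        print(bdf, link, "below max:", below_max_speed(link))
```

A link running below the root port's 16GT/s maximum is not a fault by itself, because the link trains to what both ends support. The useful comparison is the endpoint's current speed against its own maximum, and the same readings taken on the devkit slot.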