R24.2 (TX1) Kernel hangs during bootup on PCIe (PLX Switch) - R23.2 was ok.

Hi all,

We have a custom PCIe switch based on PEX8613. It has been working ok with R23.2.0 with TX1 on host port of the PCIe switch. However, with R24.1 and R24.2, kernel hangs during bootup. uboot is not affected. uboot shows following for “pci” command:

<b>Tegra210 (P2371-2180) # pci</b>
Scanning PCI devices on bus 0
BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
_____________________________________________________________
00.01.00   0x10de     0x0fae     Bridge device           0x04

R23 use to see PEX8613 as follows:

<b>ubuntu@tegra-ubuntu:~$ lspci</b>
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:01.0 Bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
02:02.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)

Occasional pci device not vga error on R23:

ubuntu@tegra-ubuntu:~$ [   29.294637] pgd = ffffffc0f5260000
[   29.298036] [0000000c] *pgd=000000015b6f7003, *pmd=000000015bf95003, *pte=0000000000000000
[   29.306472] Library at 0xf26e1e44: 0xf26c3000 /usr/lib/arm-linux-gnueabihf/qt5/plugins/platforms/libqxcb.so
[   29.316211] Library at 0x2: 0x8000 /usr/bin/signon-ui
[   29.321254] vdso base = 0xf72ab000
[  325.476343] <b>vgaarb: this pci device is not a vga device</b>

Could it be that TX1’s video driver keeps hitting PCIe and doesn’t recognize bridge type device. And in R24 its really broken and causing failures almost 100% of the time?

I will attach full boot logs for R23 and R24.

One last thing, on occasion when R24 does boot up, it doesn’t last for very long. it crashes as soon as lscpi is executed or in most instances, just starts dumping registers and causes CPU0 to hardlock.
TX1-R24-2-PCIe-Failure.txt (4.55 KB)
TX1-R24-2-PCIe-OK-WithErrors.txt (3.68 KB)

Is that PCIe switch something which can be plugged in to a different desktop PCIe slot, or is it hard wired into a custom board? If it can be plugged into another Linux desktop computer, then the output of “sudo lspci -vvv” for that device might be useful (since it causes hang in the Jetson you can’t do it from Jetson, but the PCIe information from another computer would still be good for showing the device features). Also, are you using a custom driver?

Thats a good point. It does work with linux PC. Will attach that output here shortly.

thank you.

@linuxdev,

while i arrange to borrow additional bridge card for PC, here is the lspci -vvv
output from TX1 with bridge card running R23.2. I’ll examine and post lscpi output from PC ASAP.

00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: 13000000-131fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: 00000000fee00000
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr+ UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v0] #00
        Capabilities: [140 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=30us PortTPowerOnTime=70us
        Kernel driver in use: pcieport

01:00.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba) (prog-if 00 [Nor)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: Memory at 13100000 (32-bit, non-prefetchable) [disabled] 
        Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: fff00000-000fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/4 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [68] Express (v2) Upstream Port, MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr+ UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch
        Capabilities: [100 v1] Device Serial Number ba-86-01-10-b5-df-0e-00
        Capabilities: [fb4 v1] Advanced Error Reporting
                UESta:  DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 04, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [138 v1] Power Budgeting <?>
        Capabilities: [148 v1] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=4
                Arb:    Fixed+ WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=06 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
                        Port Arbitration Table <?>
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable- ID=1 ArbSelect=Fixed TC/VC=00
                        Status: NegoPending+ InProgress-
        Capabilities: [448 v1] Vendor Specific Information: ID=0000 Rev=0 Len=0cc <?>
        Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
        Kernel driver in use: pcieport

02:01.0 Bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ff) (prog-if ff)
        !!! Unknown header type 7f

02:02.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: pcieport

Under “01:00.0 PCI bridge: PLX Technology” this draws my curiosity:

AERCap: First Error Pointer: 04, GenCap+ CGenEn- ChkCap+ ChkEn-

I’m not 100% positive about the meaning of this, but if there were no PHY layer error, I believe this would be NULL or 0. I also cannot verify that this was a correctable error, but I’m thinking this was in fact correctable. If the issue were more severe it would have backed off to gen. 1 speeds, yet the bridge runs at full gen. 2 speeds without any other noteworthy issues. I’m thinking the signal quality is right at the edge of functioning versus becoming non-functional (if something were to make the signal slightly lower quality this might fail).

Also, whether or not DMA is involved might have a bearing on it when comparing errors which do not occur under 32-bit L4T versus errors which start only under 64-bit L4T. I recall seeing a requirement in the driver translating DMA offsets for 64-bit differently than for 32-bit (I do not know the details of that). Is any kind of DMA involved?

Thanks @linuxdev. There shouldn’t be any DMA at this stage since there aren’t even default drivers for this bridge in L4T.

Also, just got another bridge card for the Linux Box (AMD machine with 64bit UBUNTU 14.04). Here are the lscpi -vvv output from this Linux Box:

Other PCIe Endpoint Descriptions Removed

01:00.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Region 0: Memory at ff700000 (32-bit, non-prefetchable) 
	Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: ff600000-ff6fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000001  Pending: 00000000
	Capabilities: [68] Express (v2) Upstream Port, MSI 00
		DevCap:	MaxPayload 2048 bytes, PhantFunc 0
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 512 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch
	Capabilities: [100 v1] Device Serial Number ba-86-01-10-b5-df-0e-00
	Capabilities: [fb4 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover+ Timeout+ NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [138 v1] Power Budgeting <?>
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=1 RefClk=100ns PATEntryBits=4
		Arb:	Fixed+ WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=06 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32+ WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=WRR32 TC/VC=01
			Status:	NegoPending- InProgress-
			Port Arbitration Table <?>
		VC1:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable- ID=1 ArbSelect=Fixed TC/VC=00
			Status:	NegoPending+ InProgress-
	Capabilities: [448 v1] Vendor Specific Information: ID=0000 Rev=0 Len=0cc <?>
	Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
	Kernel driver in use: pcieport

02:01.0 Bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba)
	Subsystem: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at ff600000 (32-bit, non-prefetchable) 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [68] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 2048 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 512 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch
	Capabilities: [c8] Vendor Specific Information: Len=38 <?>
	Capabilities: [100 v1] Device Serial Number ba-86-01-10-b5-df-0e-00
	Capabilities: [fb4 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=1 RefClk=100ns PATEntryBits=1
		Arb:	Fixed+ WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
		VC1:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable- ID=1 ArbSelect=Fixed TC/VC=00
			Status:	NegoPending- InProgress-
	Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
	Capabilities: [c34 v1] Vendor Specific Information: ID=0003 Rev=0 Len=078 <?>

02:02.0 PCI bridge: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ba) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: fff00000-000fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable+ Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000001  Pending: 00000000
	Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
		DevCap:	MaxPayload 2048 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 512 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
			ClockPM- Surprise+ LLActRep+ BwNot+
		LnkCtl:	ASPM Disabled; Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #2, PowerLimit 25.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd Off, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a4] Subsystem: PLX Technology, Inc. PEX 8613 12-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch
	Capabilities: [100 v1] Device Serial Number ba-86-01-10-b5-df-0e-00
	Capabilities: [fb4 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=1 RefClk=100ns PATEntryBits=1
		Arb:	Fixed+ WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending+ InProgress-
		VC1:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable- ID=1 ArbSelect=Fixed TC/VC=00
			Status:	NegoPending+ InProgress-
	Capabilities: [520 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [950 v1] Vendor Specific Information: ID=0001 Rev=0 Len=010 <?>
	Kernel driver in use: pcieport

Other PCIe Endpoint Descriptions Removed

I see three listings for the bridge on this latest lspci.

The second and third listings show capability to run at gen. 2 speeds, but have had to cut back to gen. 1 speeds…apparently somewhere in the system signal quality was insufficient to run at gen. 2:

LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

Again, I am not positive if this means a correctable error was found (I see “0” or NULL on any other cards I’ve looked at), but all three show this:

AERCap:	First Error Pointer: 1f, GenCap+ CGenEn- ChkCap+ ChkEn-

Can anyone verify if “First Error Pointer” having a non-zero value implies AER found and corrected an error?

Obviously there could be more going on than signal quality since things work on R23.2 and on the desktop, but signal quality seems to be at least part of the issue. Do note that if signal quality is part of the issue, then devices attached can influence results. What I’m wondering is if on R24.2 there might be a way to force generation 1 speeds and see R24.2 starts working when backing off on transfer rates (I am unsure of how to force gen. 1). If the case is that gen. 2 testing is sufficiently bad it might never get the chance to back off to gen. 1.

I would guess that the reason the first listing on x86_64 did not back off to gen. 1 might be because of its physical location being closer to the root. Possibly you could verify location being closer to root as helping by swapping the gen. 2 speed card with one of the gen. 1 speed cards and see if the same slot which stayed at gen. 2 previously still stays at gen. 2, versus following the card.

@linuxdev,

you are correct. clearly the host side link tends to get steady at Gen2 and thats the link to TX1 (only on R23.2) or x86. Other two ports on the switch tend to pare down to Gen1 x0 on x86 and completely fall apart on TX1. I do want to say that there are no devices on down ports at the moment. I will need to source some gender changer and attach a known PCI card on one of the down ports to see if that helps setup a down port at Gen2 x4

If you have a gen. 1 card and a gen. 2 card I’d suggest trying both. Gen. 1 to force lower speeds and see what happens.

It isn’t easy to disable spread spectrum on the Jetson, but on the x86_64 I’d suggest seeing if spread spectrum is enabled in BIOS or not. See how things change with and without SS since this should be relatively easy on a desktop motherboard.

on x86 the our bridge works very well. i.e. its stable, it doesn’t die just from doing lspci -vvv
:) also like you said disabling of SSC is in BIOS.

on TX1 its a different story. TX1 freezes and locks up all cores even under r23.2. I also don’t know what registers are available with what bits so to disable SSC i am doing trial and error based on someones work on TK1 :) Perhaps, there is a proper datasheet for the TX1 SOC (not the abstracted SOM datasheet which has no details on internal registers). If such a beast exists, please drop me a hint.

I do not know of more specific documents. It does look like signal related errors, but unless you can connect a high end scope and view the eye pattern, I don’t know what to test next. Disabling spread spectrum is always helpful, but this is not a simple feature to control on systems without a BIOS.