We are using the following hardware setup:
- NVIDIA Jetson TX2
- Elroy Carrier Board from Connect Tech Industries: http://connecttech.com/product/asg002-elroy-carrier-for-nvidia-jetson-tx2-tx1/
- MiniPCIe to Quad USB3.0 adapter from Innodisk: http://www.dpie.com/mini-pcie/mini-pci-express-usb-modules/innodisk-empu-3401
95% of the time everything works as expected. Occasionally, however, the pcie-controller fails to find any endpoints after a power cycle. When that happens, the only fix is a hard power cycle of the board. During development this isn’t a major concern, but we are building a product that will be installed in remote locations, which makes it a serious issue. We are looking for either a way to cut power and restore it or, better still, a way to prevent the failure in the first place.
Any help is greatly appreciated.
The details:
Failed system:
dmesg | grep pci
[ 0.236038] iommu: Adding device 10003000.pcie-controller to group 50
[ 6.668552] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[ 6.683442] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[ 6.692225] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[ 6.693840] tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
[ 7.171963] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 7.637430] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 8.093036] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[ 8.100608] tegra-pcie 10003000.pcie-controller: link 0 down, ignoring
[ 8.511641] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 8.931451] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 9.353417] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 9.361426] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[ 9.367970] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
sudo lspci -vv -> No output
echo 1 > /sys/bus/pci/rescan -> No output, no change to detected cards
dmesg | grep dts
[ 0.045342] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[ 0.157402] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[ 0.215559] tegra-pmc c360000.pmc: scratch reg offset dts data not present
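For an unattended unit, the failed state can at least be recognized from the kernel log. Below is a minimal sketch, assuming the two message strings shown in these logs ("PCIE: no end points detected" and "PCI host bridge to bus") stay the same across kernel versions. The function name classify_pcie_log is hypothetical, and what to do once a failure is detected is left open.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel log read from stdin.
# Prints "failed" if the tegra-pcie driver reported no endpoints,
# "ok" if a PCI host bridge was registered, "unknown" otherwise.
# Assumption: the message texts match the ones in the logs above.
classify_pcie_log() {
    log=$(cat)
    case "$log" in
        *"PCIE: no end points detected"*) echo failed ;;
        *"PCI host bridge to bus"*)       echo ok ;;
        *)                                echo unknown ;;
    esac
}
```

On the target it would be used as `dmesg | classify_pcie_log`, for example from a boot-time service that logs the result.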
After power cycle (working system):
dmesg | grep pci
[ 0.236166] iommu: Adding device 10003000.pcie-controller to group 50
[ 6.499507] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[ 6.525286] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[ 6.534578] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[ 6.546824] tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
[ 7.001881] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 7.433082] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 7.851464] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 7.861097] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[ 7.870112] tegra-pcie 10003000.pcie-controller: PCI host bridge to bus 0000:00
[ 7.870116] pci_bus 0000:00: root bus resource [mem 0x50100000-0x57ffffff]
[ 7.870118] pci_bus 0000:00: root bus resource [mem 0x58000000-0x7fffffff pref]
[ 7.870123] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 7.870125] pci_bus 0000:00: root bus resource [io 0x1000-0xffff]
[ 7.870148] pci 0000:00:01.0: [10de:10e5] type 01 class 0x060400
[ 7.870236] pci 0000:00:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 7.870464] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 7.870675] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
[ 7.870809] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[ 7.870978] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[ 7.877445] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 7.877522] pci 0000:00:01.0: BAR 8: assigned [mem 0x50100000-0x501fffff]
[ 7.877526] pci 0000:01:00.0: BAR 0: assigned [mem 0x50100000-0x50101fff 64bit]
[ 7.877593] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 7.877599] pci 0000:00:01.0: bridge window [mem 0x50100000-0x501fffff]
[ 7.877674] pcieport 0000:00:01.0: enabling device (0000 -> 0002)
[ 7.877769] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[ 7.877771] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[ 7.877776] pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
[ 7.877848] aer 0000:00:01.0:pcie02: service driver aer loaded
[ 7.877970] pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.889068] tegra-pcie 10003000.pcie-controller: speed change : Gen-1 -> Gen-2
sudo lspci -vv
00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 388
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: 50100000-501fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
Mapping Address Base: 00000000fee00000
Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag+ RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd On, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Kernel driver in use: pcieport
01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 128 bytes
Interrupt: pin A routed to IRQ 388
Region 0: Memory at 50100000 (64-bit, non-prefetchable)
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=0 offset=00001000
PBA: BAR=0 offset=00001080
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [150 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: xhci_hcd
dmesg | grep dts
[ 0.045327] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[ 0.157428] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[ 0.215592] tegra-pmc c360000.pmc: scratch reg offset dts data not present
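Comparing the two logs, link 2 is reported down on both boots, while link 0 (the port the USB controller enumerates behind on the working boot) is given up only on the failed boot. A small sketch for pulling that difference out of a log follows. The function name ignored_links is hypothetical, and it assumes the "link N down, ignoring" wording seen above.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print the port
# numbers whose link training was abandoned, one per line.
# Assumption: the driver logs "link N down, ignoring" as in the logs above.
ignored_links() {
    sed -n 's/.*pcie-controller: link \([0-9][0-9]*\) down, ignoring.*/\1/p'
}
```

Running `dmesg | ignored_links` on the failed boot shown here would list ports 0 and 2, and on the working boot only port 2.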