USB 3.0 MiniPCIe not detected

We are using the following hardware setup:

About 95% of the time everything works as expected. Occasionally, however, after a power cycle the pcie-controller fails to find any end points. When that happens, the only fix is a hard power cycle of the board. For development this isn’t a major concern, but we are developing a product that will be installed in remote locations, which makes it a serious issue. We are looking either for a way to kill power and bring it back or, even better, a way to prevent the failure in the first place.

Any help is greatly appreciated.

The details:

Failed system:

dmesg | grep pci
[    0.236038] iommu: Adding device 10003000.pcie-controller to group 50
[    6.668552] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[    6.683442] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[    6.692225] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[    6.693840] tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
[    7.171963] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    7.637430] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    8.093036] tegra-pcie 10003000.pcie-controller: link 0 down, retrying
[    8.100608] tegra-pcie 10003000.pcie-controller: link 0 down, ignoring
[    8.511641] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    8.931451] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    9.353417] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    9.361426] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[    9.367970] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
sudo lspci -vv -> No output
echo 1 > /sys/bus/pci/rescan -> No output, no change to detected cards
dmesg | grep dts
[    0.045342] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[    0.157402] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[    0.215559] tegra-pmc c360000.pmc: scratch reg offset dts data not present

After power cycle (working system):

dmesg | grep pci
[    0.236166] iommu: Adding device 10003000.pcie-controller to group 50
[    6.499507] tegra-pcie 10003000.pcie-controller: wrong configuration updated in DT, switching to default 2x1, 1x1, 1x1 configuration
[    6.525286] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[    6.534578] tegra-pcie 10003000.pcie-controller: probing port 0, using 2 lanes
[    6.546824] tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
[    7.001881] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    7.433082] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    7.851464] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[    7.861097] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[    7.870112] tegra-pcie 10003000.pcie-controller: PCI host bridge to bus 0000:00
[    7.870116] pci_bus 0000:00: root bus resource [mem 0x50100000-0x57ffffff]
[    7.870118] pci_bus 0000:00: root bus resource [mem 0x58000000-0x7fffffff pref]
[    7.870123] pci_bus 0000:00: root bus resource [bus 00-ff]
[    7.870125] pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
[    7.870148] pci 0000:00:01.0: [10de:10e5] type 01 class 0x060400
[    7.870236] pci 0000:00:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[    7.870464] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    7.870675] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
[    7.870809] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[    7.870978] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    7.877445] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    7.877522] pci 0000:00:01.0: BAR 8: assigned [mem 0x50100000-0x501fffff]
[    7.877526] pci 0000:01:00.0: BAR 0: assigned [mem 0x50100000-0x50101fff 64bit]
[    7.877593] pci 0000:00:01.0: PCI bridge to [bus 01]
[    7.877599] pci 0000:00:01.0:   bridge window [mem 0x50100000-0x501fffff]
[    7.877674] pcieport 0000:00:01.0: enabling device (0000 -> 0002)
[    7.877769] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[    7.877771] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[    7.877776] pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
[    7.877848] aer 0000:00:01.0:pcie02: service driver aer loaded
[    7.877970] pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.889068] tegra-pcie 10003000.pcie-controller: speed change : Gen-1 -> Gen-2
sudo lspci -vv
00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 388
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: 50100000-501fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: pcieport

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 128 bytes
	Interrupt: pin A routed to IRQ 388
	Region 0: Memory at 50100000 (64-bit, non-prefetchable) 
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00001000
		PBA: BAR=0 offset=00001080
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [150 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: xhci_hcd
dmesg | grep dts
[    0.045327] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[    0.157428] DTS File Name: tegra186-tx2-cti-ASG002-USB3.dts
[    0.215592] tegra-pmc c360000.pmc: scratch reg offset dts data not present
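
For unattended deployments, the failed boot can at least be recognized automatically, since `PCIE: no end points detected` appears only in the failing log above. The following is a minimal sketch, assuming the message text is the same on your kernel build. The function name `pcie_enum_failed` is hypothetical, and what to do on failure (log it, trip a watchdog, toggle an external power switch) is left open.

```shell
# Returns 0 (true) if the kernel log text on stdin contains the tegra-pcie
# "no end points detected" message seen in the failed boot above.
pcie_enum_failed() {
    grep -q 'tegra-pcie .*PCIE: no end points detected'
}

# Example use on a live system (dmesg may require root on some setups):
#   if dmesg | pcie_enum_failed; then
#       echo "PCIe enumeration failed on this boot" >&2
#   fi
```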

Hello,

I will be able to help with your issue, please email support@connecttech.com for further assistance.

Carter - Connect Tech Technical Support

I can think of a couple of reasons for this behavior:
→ It is possible that the USB 3.0 card/setup needs some time after it is powered up before it can start communicating with Tegra, and that the current setup does not give it that time. Can you try making PCIe host controller driver as a module (pci-tegra.ko) and try to insmod it at a later time and see if the issue is still reproducible?
→ It is possible that it needs more wait time for the link to come up. Can you try with the following patch and see if it makes any difference?

diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
index 35d63342cabd..a790476bee6f 100644
--- a/drivers/pci/host/pci-tegra.c
+++ b/drivers/pci/host/pci-tegra.c
@@ -2009,6 +2009,7 @@ static void tegra_pcie_port_reset(struct tegra_pcie_port *port)
         value |= AFI_PEX_CTRL_RST;
         afi_writel(port->pcie, value, ctrl);
     }
+    msleep(100);
 }

 static void tegra_pcie_port_enable(struct tegra_pcie_port *port)
@@ -2142,7 +2143,7 @@ EXPORT_SYMBOL(tegra_pcie_port_disable_per_pdev);
  * can result in the increase of the bootup time as there are big timeout
  * loops.
  */
-#define TEGRA_PCIE_LINKUP_TIMEOUT   200 /* up to 1.2 seconds */
+#define TEGRA_PCIE_LINKUP_TIMEOUT   500 /* up to 1.2 seconds */
 static bool tegra_pcie_port_check_link(struct tegra_pcie_port *port)
 {
     struct device *dev = port->pcie->dev;

→ Also, please try both together: making the PCIe host controller driver a module and applying the above patch.
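
The "load it later" experiment could be scripted roughly as follows. This is only a sketch: it assumes the host controller driver has been built as a module named `pci-tegra` and is not loaded automatically at boot. The 10-second default delay and the helper name `load_pcie_late` are illustrative choices, not values taken from the driver.

```shell
# Load the PCIe host controller module after a delay, then report whether
# any PCI devices were enumerated. Must be run as root.
# $1: seconds to wait before loading (default 10, an arbitrary choice)
# $2: module name (default pci-tegra, matching pci-tegra.ko)
load_pcie_late() {
    delay="${1:-10}"
    module="${2:-pci-tegra}"
    sleep "$delay"
    modprobe "$module" || return 1
    # lspci prints nothing when no devices were found (as in the failed boot)
    if [ -n "$(lspci)" ]; then
        echo "PCIe devices detected"
    else
        echo "no PCIe devices detected" >&2
        return 1
    fi
}

# Example: load_pcie_late 30
```

Running this over many power cycles with different delays would show whether extra settling time changes the failure rate.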

Thanks for the comment, vidyas. I’ve been in touch with Connect Tech support directly; it seems the issue may be specific to the carrier board, and they are resolving it. If it still persists, we will look into the custom PCIe driver.

Cheers,
Ian

Hi vidyas,

Could you please be more explicit about your fix?

  1. “making PCIe host controller driver as a module (pci-tegra.ko) and try to insmod it at a later time”

Could you suggest a link that tells me how to do this?

  2. “Can you try with the following patch and see if it makes any difference?”

I can’t find the file “pci-tegra.c”. I can only find “64_TX2/Linux_for_Tegra/rootfs/usr/src/linux-headers-4.4.38-tegra/include/linux/pci-tegra.h” and “64_TX2/Linux_for_Tegra/rootfs/lib/modules/4.4.38-tegra/kernel/drivers/pci/host/pci-tegra.ko”.

Could you give me more explicit instructions?

You might find this of interest:
https://devtalk.nvidia.com/default/topic/1038175/jetson-tx2/tx2i-wifi-support/post/5274619/#5274619

FYI, the header include directory is not the full kernel source. You’ll see source_sync.sh listed at that URL; start with it to get the full source.

When you get to the configure stage (e.g., “make nconfig” is one way) you will be able to select a config as integrated via the ‘y’ key, or as a module via the ‘m’ key (not all features can be a module, but most can, and if ‘m’ or ‘y’ key doesn’t work, then you probably found one way it can’t be configured).
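
Before rebuilding, it can help to check how an option is set in an existing configuration. Here is a minimal sketch that reads a kernel `.config`-style file. The helper name `config_state` is hypothetical, and the option name for the Tegra PCIe host driver (assumed here to be `CONFIG_PCI_TEGRA`) should be confirmed in your own kernel source.

```shell
# Print the value of a kernel config symbol from a .config-style file
# (y or m for tristate options), or n if the symbol is not set.
# $1: path to the config file, $2: symbol name, e.g. CONFIG_FOO
config_state() {
    value=$(sed -n "s/^$2=\(.*\)\$/\1/p" "$1")
    # An absent or commented-out ("# CONFIG_FOO is not set") symbol is off
    echo "${value:-n}"
}

# Example, if the running kernel exposes /proc/config.gz:
#   zcat /proc/config.gz > /tmp/running.config
#   config_state /tmp/running.config CONFIG_PCI_TEGRA
```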

Figure out how to build a kernel which is an exact replica of the current kernel first, and then worry about the patch. The short version is that if you don’t use the patch tool, then you more or less go to the line mentioned, and any “-” line is a line to remove, and any “+” is a line to add.
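
If you do use the patch tool, the usual workflow is to save the diff to a file and apply it from the top of the source tree, checking first with a dry run. The sketch below demonstrates this on a throwaway file rather than the real kernel tree; `demo.c` and `DEMO_TIMEOUT` are made-up stand-ins.

```shell
# Build a tiny before/after pair and a unified diff between them.
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b"
printf '#define DEMO_TIMEOUT 200\n' > "$work/a/demo.c"
printf '#define DEMO_TIMEOUT 500\n' > "$work/b/demo.c"
# diff exits with status 1 when the files differ, which is expected here
(cd "$work" && diff -u a/demo.c b/demo.c > demo.patch) || true

# Apply from inside the tree to be patched. -p1 strips the leading "a/" or
# "b/" path component, the same convention used by the diff quoted earlier.
(
    cd "$work/a" || exit 1
    patch -p1 --dry-run < ../demo.patch   # check only, nothing is modified
    patch -p1 < ../demo.patch             # actually apply the change
)
cat "$work/a/demo.c"
```

For the real patch, the equivalent would be to run `patch -p1` from the kernel source root with the saved diff on standard input.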

Thanks for all the information.

I have been following this tutorial on cross compiling.

https://developer.ridgerun.com/wiki/index.php?title=Compiling_Tegra_X1/X2_source_code#Build_Kernel

I have successfully compiled my own kernel with the changes to the pci-tegra module but I now want to compile the version for 4.4.38-tegra.

When I cross compile, I keep getting 4.4.38-tegra+. I have tried many things to get rid of the + at the end of the version, but I can’t shake it. As a result, I get this error in dmesg:

pci_tegra: version magic ‘4.4.38-tegra+ SMP preempt mod_unload aarch64’ should be ‘4.4.38-tegra SMP preempt mod_unload aarch64’

Thanks,
Glen

I found a solution to the version problem here: “git - Don't add "+" to linux kernel version” (Stack Overflow)
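
For anyone hitting the same mismatch: the trailing `+` is normally appended by the kernel's `scripts/setlocalversion` when the source is in a source-control tree that is not at a clean tagged state and `LOCALVERSION` is not set. The commonly cited workarounds are passing `LOCALVERSION` explicitly to `make` or creating an empty `.scmversion` file in the source root; check these against your own tree. To confirm the result before copying a module to the target, something like the following sketch can compare the release string. The helper names are hypothetical, and it checks only the release word, not the rest of the version magic.

```shell
# Extract the kernel release (first word) from a vermagic string such as
# "4.4.38-tegra+ SMP preempt mod_unload aarch64".
vermagic_release() {
    echo "${1%% *}"
}

# Return 0 if the module file's vermagic release matches the running kernel.
# $1: path to a .ko file
module_matches_kernel() {
    [ "$(vermagic_release "$(modinfo -F vermagic "$1")")" = "$(uname -r)" ]
}

# Example, run on the target:
#   module_matches_kernel ./pci-tegra.ko || echo "version mismatch" >&2
```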

Hi linuxdev,

We have tried the solution you suggested but it did not solve our problem.

Any other ideas?

So long as the kernel is basically the same configuration for non-module features you can probably just do a recursive copy of “/lib/modules/4.4.38-tegra/” to “/lib/modules/4.4.38-tegra+/” when replacing the Image file with a new Image, but you’ll need to test. Can you give the exact information on whether you are adding a new Image file, or just modules? What combination of things are you changing (this can change what might work or not)?
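
The recursive copy could be done along these lines. This is a sketch only: the helper name `clone_module_tree` is hypothetical, and whether the copied modules then load still depends on the kernel's version-magic check, so it needs testing as noted above.

```shell
# Copy an installed module tree to a second release name, preserving
# permissions and symlinks. Refuses to overwrite an existing destination.
# $1: modules base directory (normally /lib/modules)
# $2: existing release name, $3: new release name
clone_module_tree() {
    if [ ! -d "$1/$2" ]; then
        echo "source tree $1/$2 not found" >&2
        return 1
    fi
    if [ -e "$1/$3" ]; then
        echo "destination $1/$3 already exists" >&2
        return 1
    fi
    cp -a "$1/$2" "$1/$3"
}

# Example (as root): clone_module_tree /lib/modules 4.4.38-tegra 4.4.38-tegra+
```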