Trouble getting any PCIE devices to show

Running a brand new flash of JetPack 4.4.1 on the devkit. I’ve tried a number of PCIe devices and cannot get any of them to show up in lspci.

~$ lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)

I assume this is the eSATA controller and the controller for the main device storage.

Result of lspci -vv

0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 35
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: 40000000-400fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootCap: CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] #19
Capabilities: [158 v1] #26
Capabilities: [17c v1] #27
Capabilities: [190 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=60us
L1SubCtl2: T_PwrOn=40us
Capabilities: [1a0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2a0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2d8 v1] #25
Capabilities: [2e4 v1] Precision Time Measurement
PTMCap: Requester:- Responder:+ Root:+
PTMClockGranularity: 16ns
PTMControl: Enabled:- RootSelected:-
PTMEffectiveGranularity: Unknown
Capabilities: [2f0 v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device 9171
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 564
Region 0: I/O ports at 100010 [size=8]
Region 1: I/O ports at 100020 [size=4]
Region 2: I/O ports at 100018 [size=8]
Region 3: I/O ports at 100024 [size=4]
Region 4: I/O ports at 100000 [size=16]
Region 5: Memory at 1230010000 (32-bit, non-prefetchable) [size=512]
Expansion ROM at 1230000000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fffff000 Data: 0000
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: ahci
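
When reading a dump like the one above, the `LnkSta` line is usually the quickest indicator of whether a link trained and at what speed and width. The following is a minimal sketch, not an official tool, that pulls those fields out of one `LnkSta` line; the function name `parse_lnksta` is made up for illustration, and the two sample lines are copied from the output above.

```python
import re

# Matches lines such as:
#   LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
# "LnkSta2:" lines do not match because the colon must directly follow "LnkSta".
LNKSTA_RE = re.compile(
    r"LnkSta:\s+Speed\s+(?P<speed>[\d.]+GT/s)[^,]*,\s+Width\s+x(?P<width>\d+)"
    r".*?DLActive(?P<dl>[+-])"
)


def parse_lnksta(line):
    """Return (speed, width, dl_active) from one LnkSta line, or None."""
    m = LNKSTA_RE.search(line)
    if m is None:
        return None
    return m.group("speed"), int(m.group("width")), m.group("dl") == "+"


# Sample lines taken from the root port and the SATA endpoint in the dump.
sample_root = "LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+"
sample_ep = "LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-"

print(parse_lnksta(sample_root))  # ('5GT/s', 1, True)
print(parse_lnksta(sample_ep))    # ('5GT/s', 1, False)
```

Note that the endpoint reports `LLActRep-` in its `LnkCap`, so its `DLActive-` flag is probably not meaningful on its own; the root port's `DLActive+` is the more useful value here.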

dmesg log with a PCIe device plugged in: dmesg_20201203.txt (74.3 KB)

Am I missing any obvious steps here? What is the next step to debug my PCIe devices?

Hi @t.dale,
From the dmesg log, all root ports are enabled.

  1. Can you provide a list of the devices that you have tried? Are you using any interposer card?
  2. I assume you are using the standard x8 PCIe port to connect the PCIe device. Can you try the M.2 port as well?

Thanks,
Om

Hi there, thanks for your reply.

We have tried:

  • Quadro P2000 PCIe x8
  • Asus ROG 10g network card PCIe x8
  • Wifi module mPCIe > PCIe x1 via adapter

Worth noting that all of these cards work in other machines (including the one with the adapter). We don’t intend to use any of these cards with the Xavier apart from the Wi-Fi module; we are just hoping to get a vendor ID and device ID out of one of them so we know the PCIe lane is working.

Will try M.2 and see what I get.

Thanks

Can confirm the M.2 drive works:

0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0000:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)

Worth noting too:

There are now 2 NVIDIA Corp devices, whereas before there was only one for some reason.

The device we have currently plugged in is a Qualcomm wifi adapter, reverse-lookup at pcilookup.com:
Qualcomm Atheros 168c Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter 11ad08a6

I have also attached the new dmesg output: dmesg_20201203_r2.txt (73.0 KB)

When the NVMe card was connected, it was connected to the M.2 Key-M slot (owned by the C0 controller) and it worked. The reason you see two NVIDIA Corp devices is that both are root ports, each with an endpoint connected to it. For every new device connected and enumerated, a pair is added, i.e. one new root port and one new endpoint.
Now, you must have connected the Qualcomm Atheros card to the M.2 Key-E slot, right?
Or is it connected directly to the CEM form-factor slot?
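
The root port / endpoint pairing described here can be checked mechanically by grouping `lspci` lines by PCI domain. The sketch below is only an illustration under the assumption that `lspci` prints the domain prefix (as it does in the output posted in this thread); the helper names `group_by_domain` and `domains_without_endpoint` are hypothetical, and the sample text is the four-line listing from the M.2 test.

```python
import re
from collections import defaultdict

# domain:bus:device.function, then the device class, then the description
LSPCI_RE = re.compile(
    r"^(?P<domain>[0-9a-f]{4}):(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\s+"
    r"(?P<cls>[^:]+):\s+(?P<desc>.*)$"
)


def group_by_domain(lspci_text):
    """Map each PCI domain to a list of (bus, class, description) tuples."""
    groups = defaultdict(list)
    for line in lspci_text.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m:
            groups[m.group("domain")].append(
                (m.group("bus"), m.group("cls"), m.group("desc"))
            )
    return dict(groups)


def domains_without_endpoint(groups):
    """Return domains where only bus 00 (the root port) is populated."""
    return sorted(
        domain
        for domain, devices in groups.items()
        if all(bus == "00" for bus, _, _ in devices)
    )


sample = """\
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0000:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
"""

groups = group_by_domain(sample)
for domain, devices in sorted(groups.items()):
    print(domain, [cls for _, cls, _ in devices])
print("root port only:", domains_without_endpoint(groups))
```

In this listing every domain has both a bridge on bus 00 and a device on bus 01, so no domain is reported as root port only. A domain for the x8 slot's controller is absent altogether, which matches the symptom being discussed.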

Thanks for that - I understand now re Nvidia corp root devices.

The Atheros card is mPCIe, and is plugged in via a PCIe x1 adaptor to the x8 PCIe slot on the development board (is that what you are referring to by CEM?). All of our other test cards, which also didn’t work, were plugged into the x16 slot.

So still no luck getting any of our PCIe devices to even register. Do you have any recommendations from here? Thanks

To start with, could you please set the following in the “pcie@141a0000” node?
nvidia,max-speed = 2;
num-lanes = 1;
If possible, please add debug prints to the driver’s pcie-tegra.c file to confirm that the above settings are taking effect.
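
As an alternative to driver debug prints, the values the running kernel actually received can be read back from the live device tree. This is a rough sketch rather than an NVIDIA-provided procedure: it assumes the tree is exposed under `/proc/device-tree` and that the `pcie@141a0000` node sits directly under the root, both of which should be verified on the target. The helper names are hypothetical. Integer properties are stored as big-endian 32-bit cells, which is what `decode_cells` unpacks.

```python
import struct
from pathlib import Path


def decode_cells(raw):
    """Decode a device tree property made of big-endian 32-bit cells."""
    if len(raw) % 4 != 0:
        raise ValueError("property length is not a multiple of 4 bytes")
    return list(struct.unpack(">%dI" % (len(raw) // 4), raw))


def read_dt_cells(node, prop, base="/proc/device-tree"):
    """Read one property from the live device tree; None if it is absent."""
    path = Path(base) / node / prop
    if not path.exists():
        return None
    return decode_cells(path.read_bytes())


if __name__ == "__main__":
    # Assumption: the node is located directly under the device tree root.
    node = "pcie@141a0000"
    for prop in ("nvidia,max-speed", "num-lanes"):
        print(prop, read_dt_cells(node, prop))
```

If the edits were picked up, one would expect `[2]` and `[1]` for the two properties. A result of `None` means the property or node was not found at the assumed path, which is also what this prints on a machine that is not the Jetson.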

I’ve set those settings and recompiled the kernel - not sure where to find the output from pcie-tegra.c but I get the following:

Which I assume means that the settings were applied correctly?

DMESG attached. Thanks.

dmesg.202012128.txt (74.1 KB)

I was asking you to make the changes in ‘pcie@141a0000’, since this is the controller that owns the x8 slot. I see that, in the last post, you were checking the ‘pcie@14100000’ node.
Since these are DT changes, you have to compile and update the DT.

Ah, I don’t know what went on there; I must have had a long day. I’ve made the changes, hopefully correctly this time:

dmesg.20210112.log (71.9 KB)

Cheers

Since none of the devices connected to the x16 slot are working (i.e. getting enumerated), I am starting to suspect that the slot has gone bad. Could you please remove the “nvidia,enable-power-down;” entry from the respective PCIe node in the DT? This makes sure that the controller is not powered down even when no endpoint device is detected, so the root port alone should still appear in the lspci output. At this point, could you please check the following?

  • Are the +12V and +3V3 supplies present? (Refer to the PCI Express article on Wikipedia for the pins to probe.)

  • Please probe PERST# (A-11) and see whether a de-assertion (i.e. a low->high transition, as it is an active-low signal) is observed.

  • Please probe the REFCLK lines to make sure that a 100 MHz clock is present.

  • As a last resort, you can loop back the interface (i.e. short Tx to Rx) to observe link up. If the link does come up, it confirms that the Tx/Rx lanes are fine. This particular experiment can be done on its own, i.e. without probing the aforementioned signals.

Hope this helps.

Thanks, it turned out to be a faulty slot, which we have since had fixed.

Cheers