Pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)

Hi,

I am seeing a pcieport error on AGX Xavier with JetPack 4.5.1 (kernel 4.9.201).
The error appears under heavy wireless traffic, and the wireless communication then stops.
It happens with a Qualcomm wireless module.

What I have tried:

  1. Disabled ASPM.
  2. Fixed the clocks at maximum using jetson_clocks.

Do you have any workaround or can you tell me how to debug PCIe?

Error print:

pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
pcieport 0003:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
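
For reference, the status/mask pair in the second log line can be decoded by hand. The sketch below is a minimal Python decoder, assuming the standard AER Correctable Error Status bit layout (the same bit positions as the PCI_ERR_COR_* constants in the Linux kernel headers). The names COR_BITS, decode_cor and decode_status_mask are made up for illustration.

```python
# Bit positions of the AER Correctable Error Status/Mask registers
# (assumed to follow the standard layout, as in the kernel's PCI_ERR_COR_* constants).
COR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_cor(value):
    """Return the names of all bits set in a correctable status or mask value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(COR_BITS.get(bit, f"reserved/unknown bit {bit}"))
    return names


def decode_status_mask(field):
    """Split a 'status/mask' pair as printed in the log, e.g. '00000001/0000e000'."""
    status_hex, mask_hex = field.split("/")
    return decode_cor(int(status_hex, 16)), decode_cor(int(mask_hex, 16))


if __name__ == "__main__":
    status, mask = decode_status_mask("00000001/0000e000")
    print("status:", status)
    print("mask:  ", mask)
```

For the values in this log, the only status bit set is Receiver Error, which is the bit the kernel reports as type=Physical Layer. The mask bits decode to Advisory Non-Fatal Error, Corrected Internal Error and Header Log Overflow.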

lspci:

0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 37
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        Memory behind bridge: 40000000-401fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
                LnkSta: Speed 8GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
                RootCap: CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] #19
        Capabilities: [158 v1] #26
        Capabilities: [17c v1] #27
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
                          PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=70us
                L1SubCtl2: T_PwrOn=40us
        Capabilities: [1a0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
        Capabilities: [2a0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [2d8 v1] #25
        Capabilities: [2e4 v1] Precision Time Measurement
                PTMCap: Requester:- Responder:+ Root:+
                PTMClockGranularity: 16ns
                PTMControl: Enabled:- RootSelected:-
                PTMEffectiveGranularity: Unknown
        Capabilities: [2f0 v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
        Kernel driver in use: pcieport

0003:01:00.0 Network controller: Qualcomm Device 1103 (rev 01)
        Subsystem: Qualcomm Device 3374
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 821
        Region 0: Memory at 12b0000000 (64-bit, non-prefetchable) [size=2M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=32/32 Maskable+ 64bit-
                Address: 91000000  Data: 0000
                Masking: ffe03a40  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] #19
        Capabilities: [158 v1] Transaction Processing Hints
                No steering table available
        Capabilities: [1e4 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [1ec v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=70us PortTPowerOnTime=0us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=40us
        Kernel driver in use: cnss_pci
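
The link fields in the dump above can also be compared with a small script instead of by eye. This is only a sketch, assuming `lspci -vv` output in the format shown in this thread; link_summary and GEN_BY_SPEED are illustrative names, and SAMPLE is trimmed from the dump above.

```python
import re

# Per-lane link speed as printed by lspci, mapped to the PCIe generation.
GEN_BY_SPEED = {"2.5GT/s": 1, "5GT/s": 2, "8GT/s": 3, "16GT/s": 4}

# Matches "LnkCap:" and "LnkSta:" lines only (not LnkCap2/LnkSta2/LnkCtl2).
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+GT/s), Width (x\d+)")


def link_summary(lspci_text):
    """Return {device: {"LnkCap": (speed, width), "LnkSta": (speed, width)}}."""
    devices = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device block.
            current = line.split()[0]
            devices[current] = {}
            continue
        m = LINK_RE.search(line)
        if m and current is not None:
            devices[current][m.group(1)] = (m.group(2), m.group(3))
    return devices


SAMPLE = """\
0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
                LnkCap: Port #0, Speed 16GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
                LnkSta: Speed 8GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
0003:01:00.0 Network controller: Qualcomm Device 1103 (rev 01)
                LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
                LnkSta: Speed 8GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

if __name__ == "__main__":
    for dev, links in link_summary(SAMPLE).items():
        cap_speed, cap_width = links["LnkCap"]
        sta_speed, sta_width = links["LnkSta"]
        print(f"{dev}: capable Gen {GEN_BY_SPEED[cap_speed]} {cap_width}, "
              f"running Gen {GEN_BY_SPEED[sta_speed]} {sta_width}")
```

On the dump in this thread, this shows the root port as Gen 4 capable but running at Gen 3, which is the maximum the Qualcomm endpoint advertises.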

Given the type of AER error being observed here (i.e. Physical Layer), and since ASPM is already disabled, I think the issue may genuinely be due to an improper physical connection. Can you remove and reconnect the card?

The AER log shows that the type is Physical Layer.
The error still reproduces after reconnecting the card.

The issue seems to go away if the PCIe link speed is fixed to Gen 2 in the dtb.
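
To judge whether a change such as limiting the link to Gen 2 really reduces the errors, it can help to count the AER lines in a captured kernel log before and after the change. A minimal sketch, assuming the log line format quoted in this thread; count_aer and AER_LINE are illustrative names, and the sample text simply repeats the line from this thread rather than being a real capture.

```python
import re
from collections import Counter

# Matches e.g. "pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, ..."
AER_LINE = re.compile(
    r"(?P<dev>\S+): PCIe Bus Error: severity=(?P<sev>[^,]+), type=(?P<type>[^,]+),"
)


def count_aer(log_text):
    """Count AER messages per (device, severity, type) in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = AER_LINE.search(line)
        if m:
            counts[(m.group("dev"), m.group("sev"), m.group("type"))] += 1
    return counts


# Illustrative input only: the line from this thread, repeated.
SAMPLE_LOG = """\
pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
pcieport 0003:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
pcieport 0003:00:00.0: device [10de:1ad2] error status/mask=00000001/0000e000
"""

if __name__ == "__main__":
    for (dev, sev, typ), n in count_aer(SAMPLE_LOG).items():
        print(f"{dev}: {n} x severity={sev}, type={typ}")
```

Running the same traffic test at Gen 3 and at Gen 2 and comparing the counts gives a rough error rate for each setting.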

Do you have a list of PCIe cards that have been validated at PCIe Gen 3?

Device Type   Model                                             Speed  POR Card  VID   PID   Form Factor  Lane  ASPM Support      Software Support  Notes

PCIe switch   RCS_r2UG-A2E16-A                                  Gen 3  POR       10b5  8747  Desktop      x16   No                Yes
GPU           GM107                                             Gen 3            10de  13ae  Desktop      x16   ASPM L0s/L1/L1SS  No
NIC           Intel82572EI                                      Gen 1            8086  10b9  Desktop      x1    ASPM L0s          Yes
USB3.1 Gen 1  NEC uPD720200                                     Gen 2            1033  194   Desktop      x1    ASPM L0s and L1   Yes
USB3.1 Gen 2  ASUS                                              Gen 2  POR       1b21  1242  Desktop      x1    ASPM L0s          Yes
USB 2.0       Moschip                                                  POR       9710  9990  Desktop      x1
SSD           OCZ                                                      POR       1b85  1041  Desktop      x4    ASPM L0s/L1       Yes
SSD           OCZ RVD400-22280-512G-A                           Gen 3                                     x4                                        New device procured for Xavier
SATA          Silicon Image Corp. 3132                          Gen 2            1095  1095  Desktop      x1    ASPM L0s          Yes
NIC           Intel E10G42BT X520-T2 10Gigabit                  Gen 2  POR                                                                          New device procured for Xavier
NIC           StarTech 1G Card                                  Gen 1                                                                               New device procured for Xavier
NIC           TP-Link TG-3468 1G Card                           Gen 1                                                                               New device procured for Xavier
NIC           ASUS ROG AREION 10G Express 10Gbps                Gen 3                                     x4                                        New device procured for Xavier
NIC           ASUS XG-C100C 10G                                 Gen 3                                     x4                                        New device procured for Xavier
NIC           Realtek RTL8111/8168 Gigabit Ethernet Controller  Gen 1            10ec  8168  Desktop      x1    ASPM L0s/L1       Yes
NIC           Intel CT Desktop Ethernet Controller (82574L)     Gen 1            8086  1533  Desktop      x1    ASPM L1           Yes
NIC           Tehuti 10G Network Card                           Gen 2                        Desktop      x4                      No
NIC           Intel Ethernet Service Adaptor I350-T2            Gen 2            8086  1521  Desktop      x4    ASPM L0s/L1       Yes
NIC           Intel Ethernet I210-T1                                   POR       8086  1533  Mobile       x1    ASPM L0s/L1       Yes
NVME          Toshiba NVME (New)                                Gen 3            1179  010F  Mobile       x4    ASPM L0s/L1/L1SS  Yes               L1.1 is not supported and L1.2 has an issue
NVME          Toshiba NVME (Old)                                Gen 3  POR       1179  010F  Mobile       x4    ASPM L0s/L1/L1SS  Yes               L1.1 is supported
NVME          Intel 750 Series SSDPEDMW400G4                    Gen 3            8086  953   Desktop      x4    ASPM L0s/L1       Yes
NVME          Samsung NVME                                      Gen 3            144d  A802  Mobile       x4    ASPM L0s/L1       Yes
NVME          WD NVME WDS256G1X0C                               Gen 3  POR       15B7  5001  Mobile       x4    ASPM L0s/L1/L1SS  Yes
NVME          Plextor PX-128M8PeGN                              Gen 3                                     x4                                        New device procured for Xavier
NVME          Intel SSDPEKKW512G7X1                             Gen 3                                     x4                                        New device procured for Xavier
NVME          Kingston SKC1000/240G                             Gen 3                                     x4                                        New device procured for Xavier
WIFI          Realtek RTL8188CE                                 Gen 1            10e         Desktop      x1    ASPM L0s/L1       -
WIFI          Realtek RTL8822BE                                 Gen 1  POR                   M.2                                                    NVIDIA-designed module
WIFI          Broadcom 4356                                     Gen 1            1400  43ec  Mobile       x1    ASPM L0s/L1/L1SS  -
WIFI          TPLINK N900                                       Gen 1                                                             No                New card, but not enumerating on Tegra (Bug ID 200338507)
WIFI          DLINK N300                                        Gen 1                                                             No                New card, but not enumerating on Tegra (Bug ID 200338507)

These are the various cards we have tested (please note that only some of them are Gen-3 capable).
Also, all the latest NVIDIA dGPUs (which are all Gen-3/4 capable) are tested.

Thank you. It might be an issue with the wireless card.
