Xilinx PCIe issue on Jetson TX2

Hi,
I am trying a Xilinx PCIe card on a Jetson TX2. When I load xdma.ko, the official Xilinx driver compiled under JetPack 4.3, I get these errors:

[   87.375267] xdma v2017.0.45
[   87.378641] xdma 0000:01:00.0: enabling device (0000 -> 0002)
[   87.481765] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf40c000
[   87.481767] CPU3: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000101, esr=bf40c000
[   87.481781] CPU5: SError detected, daif=1c0, spsr=0x40000045, mpidr=80000103, esr=bf40c000
[   87.481795] CPU0: SError detected, daif=1c0, spsr=0x40000045, mpidr=80000100, esr=bf40c000
[   87.481939] CPU1: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000000, esr=be000000
[   87.481970] CPU2: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000001, esr=be000000
[   87.481974] ROC:IOB Machine Check Error:
[   87.481978]  Address Type = Secure DRAM
[   87.481988]  Address = 0x0 (Unknown Device)
[   87.675903] CPU3: SError detected, daif=1c0, spsr=0x60000045, mpidr=80000101, esr=bf00c002
[   87.676594] ROC:CCE Machine Check Error:
[   87.676621]  Address Type = Secure DRAM
[   87.676687]  Address = 0x0 (Unknown Device)
[   87.676860] ROC:IOB Machine Check Error:
[   87.676877]  Address Type = Secure DRAM
[   87.676899]  Address = 0x0 (Unknown Device)
[   87.772729] tegra-pcie 10003000.pcie-controller: PCIE: Transcation timeout, signature: dead2009
[   87.772853] CPU1: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000000, esr=be000000
[   87.870200] ROC:IOB Machine Check Error:
[   87.870225]  Address Type = Secure DRAM
[   87.870261]  Address = 0x0 (Unknown Device)
[   87.966540] CPU2: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000001, esr=be000000
[   88.063237] CPU5: SError detected, daif=1c0, spsr=0x60000045, mpidr=80000103, esr=bf000002
[   88.063330] ROC:IOB Machine Check Error:
[   88.063332]  Address Type = Secure DRAM
[   88.063339]  Address = 0x0 (Unknown Device)
[   88.063360] CPU3: SError detected, daif=1c0, spsr=0x60000045, mpidr=80000101, esr=bf000002
[   88.063404] **************************************
[   88.063407] Machine check error in DCC:1:
[   88.063419]  Status = 0xf400000100000405
[   88.063428]  Bank does not have any known errors
[   88.063437]  Overflow (there may be more errors)
[   88.063445]  Uncorrected (this is fatal)
[   88.063453]  Error reporting enabled when error arrived
[   88.063465]  ADDR = 0x134
[   88.257172] **************************************
[   88.257175] ROC:CCE Machine Check Error:
[   88.257193]  Address Type = Secure DRAM
[   88.257218]  Address = 0x0 (Unknown Device)
[   88.257382] ROC:IOB Machine Check Error:
[   88.257397]  Address Type = Secure DRAM
[   88.257418]  Address = 0x0 (Unknown Device)
[   88.353905] CPU1: SError detected, daif=1c0, spsr=0x400000c5, mpidr=80000000, esr=be000000
[   88.450889] ROC:CCE Machine Check Error:
[   88.450911]  Address Type = Secure DRAM
[   88.450941]  Address = 0x0 (Unknown Device)
[   88.451110] ROC:IOB Machine Check Error:
[   88.451126]  Address Type = Secure DRAM
[   88.451147]  Address = 0x0 (Unknown Device)
[   88.547419] tegra-pcie 10003000.pcie-controller: PCIE: Transcation timeout, signature: dead2009
[   88.547548] CPU2: SError detected, daif=1c0, spsr=0x400000c5, mpidr=80000001, esr=be000000
[   88.547552] ROC:IOB Machine Check Error:
[   88.547555]  Address Type = Secure DRAM
[   88.547560]  Address = 0x0 (Unknown Device)
[   88.547581] CPU5: SError detected, daif=1c0, spsr=0x60000045, mpidr=80000103, esr=bf000002
[   88.837918] **************************************
[   88.837920] CPU3 Machine check error in AXIP2P@0x2110000:

After this the system restarts.
My PCIe card's I/O and memory resources are exposed to the system, and the lspci output is:

jetson@linux:~$ sudo lspci -vv
[sudo] password for jetson: 
00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 382
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: 40100000-403fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: 00000000fee00000
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: pcieport

01:00.0 Memory controller: Xilinx Corporation Device 7024
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 382
        Region 0: Memory at 40100000 (32-bit, non-prefetchable) [disabled] [size=1M]
        Region 1: Memory at 40300000 (32-bit, non-prefetchable) [disabled] [size=64K]
        Region 2: Memory at 40200000 (32-bit, non-prefetchable) [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00

It looks like the FPGA endpoint device in this scenario stops responding as soon as it is enabled by the Linux kernel’s PCIe subsystem.
Do you see this behavior when the same setup is connected to an x86 system?
I think this needs to be checked with the FPGA vendor.
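
To put a number on "as soon as it is enabled", here is a minimal Python sketch that measures the gap between the "enabling device" message and the first SError in a dmesg capture. The helper name `first_timestamp` and the `SAMPLE_LOG` variable are made up for illustration; the log lines are copied from the post above.

```python
import re

# Matches a dmesg line of the form "[   87.375267] message text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(log, needle):
    """Return the timestamp (seconds) of the first line containing needle."""
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


# Lines copied from the dmesg output in the original post.
SAMPLE_LOG = """\
[   87.375267] xdma v2017.0.45
[   87.378641] xdma 0000:01:00.0: enabling device (0000 -> 0002)
[   87.481765] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf40c000
[   87.772729] tegra-pcie 10003000.pcie-controller: PCIE: Transcation timeout, signature: dead2009
"""

if __name__ == "__main__":
    enabled = first_timestamp(SAMPLE_LOG, "enabling device")
    serror = first_timestamp(SAMPLE_LOG, "SError detected")
    if enabled is not None and serror is not None:
        print(f"first SError {1000 * (serror - enabled):.1f} ms after enable")
```

On the log above, the first SError arrives roughly 103 ms after the driver enables the device. This only shows the timing; it does not by itself identify the cause.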

Not yet.
I will connect the card to an x86 system soon and report the result.

I changed the size of regions 0 and 2 from 1M to 64K, and the kernel error no longer occurs.
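
For anyone comparing BAR layouts before and after a change like this, here is a minimal Python sketch that pulls the region sizes and the bridge memory window out of saved `lspci -vv` text and checks that every region fits inside the window. The function names and the `SAMPLE` text are made up for illustration (the lines are copied from the lspci output above). It is only a sanity check on the layout and does not explain why the smaller BARs avoid the error.

```python
import re

SIZE_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# "Region 0: Memory at 40100000 (32-bit, non-prefetchable) [disabled] [size=1M]"
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \(.*?\).*?\[size=(\d+)([KMG]?)\]"
)
# "Memory behind bridge: 40100000-403fffff"
WINDOW_RE = re.compile(r"Memory behind bridge: ([0-9a-f]+)-([0-9a-f]+)")


def parse_regions(text):
    """Return a list of (index, base_address, size_in_bytes) tuples."""
    regions = []
    for m in REGION_RE.finditer(text):
        size = int(m.group(3)) * SIZE_UNITS[m.group(4)]
        regions.append((int(m.group(1)), int(m.group(2), 16), size))
    return regions


def parse_window(text):
    """Return (first_address, last_address) of the bridge window, or None."""
    m = WINDOW_RE.search(text)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def regions_outside_window(regions, window):
    """Return the regions that do not lie fully inside the bridge window."""
    lo, hi = window
    return [r for r in regions if r[1] < lo or r[1] + r[2] - 1 > hi]


# Lines copied from the lspci output in the original post.
SAMPLE = """\
        Memory behind bridge: 40100000-403fffff
        Region 0: Memory at 40100000 (32-bit, non-prefetchable) [disabled] [size=1M]
        Region 1: Memory at 40300000 (32-bit, non-prefetchable) [disabled] [size=64K]
        Region 2: Memory at 40200000 (32-bit, non-prefetchable) [disabled] [size=1M]
"""

if __name__ == "__main__":
    regions = parse_regions(SAMPLE)
    for index, base, size in regions:
        print(f"region {index}: base=0x{base:08x} size={size // 1024} KiB")
    print("outside window:", regions_outside_window(regions, parse_window(SAMPLE)))
```

With the original 1M/64K/1M layout, all three regions already fit inside the bridge window, so the address assignment itself looks consistent.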
