PCIe IOMMU Error

[1] https://devtalk.nvidia.com/default/topic/1026815/jetson-tx2/nvme-triggers-iommu-faults-on-tx2/3

Similar to Ref 1, it appears that we are having an issue with the PCIe IOMMU failing. In our case, we are using an FPGA for frontend data acquisition. Right now we have an easy case, in that we buffer 256kB of data and send it over the PCIe (via Xilinx DMA/Bridge Subsystem IP). We can successfully send 255 x 256kB packets; however, even if we complete that transfer (by this I mean asserting axis_tlast on the FPGA) the very next 256kB packet that is sent will cause the PCIe to error (dmesg error attached). It so happens, that this 256th packet is also the 256th interrupt event (see attached /proc/interrupt).

As such, it is unclear to me if this is an interrupt constraint and our interrupts are not getting serviced (we are using MSI interrupts), if it is an issue with the IOMMU similar to Ref. 1, or if the reference driver we are using is not set-up properly to handle more than 64MB of data before we need to close the char device and reopen. If we decide to disable the IOMMU, will this cause issues for other I/O on the TX2?

Our desired end goal is to stream/transfer ~1.5GB of data from the FPGA to the TX2 as quickly as possible. For now, we are saving the data as binary file chunks on the TX2. We are using the Xilinx AR#65444 as our guide (https://www.xilinx.com/support/answers/65444.html) with the following changes:

xdma-core.h

RX_BUF_BLOCK 262144
RX_BUF_PAGES 1024

We also used their dma_from_device.c application as a reference to design our own, condensed application.

Any help at debugging and resolving our issue would be much appreciated!

Please let me know if you desire more information.

R/
Nick

=========================================== ATTACHMENTS ========================================
Attachment 1: PCIe error snippet (dmesg)

[   89.638972] arm-smmu 12000000.iommu: Unhandled context fault: iova=0xa0030000, fsynr=0x240013, cb=22, sid=17(0x11 - AFI), pgd=22fb6f003, pud=22fb6f003, pmd=22ed5f003, pte=0
[   89.654539] (255) csw_afiw: MC request violates VPR requirements
[   89.660651]   status = 0x00337031; addr = 0x3ffffffc0
[   89.665746]   secure: yes, access-type: write
[   89.670146] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[   89.680008] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[   89.689867] unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000
[   89.699716] mc-err: Too many MC errors; throttling prints

Attachment 2: cat /proc/interrupt | grep xdma @ time of PCIe error (w/ 255 and there is no error)

452:        256          0          0          0          0          0  Tegra PCIe MSI   0 Edge      xdma

Attachment 3: lspci -vvvv output

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 388
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: 50100000-502fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: pcieport

01:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
	Subsystem: Xilinx Corporation Device 0007
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 452
	Region 0: Memory at 50100000 (32-bit, non-prefetchable) 
	Region 1: Memory at 50200000 (32-bit, non-prefetchable) 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fb225000  Data: 0000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: xdma

What is the jetpack version you are using?
I don’t expect anything specific to happen on 256th interrupt as such. BTW, what is the bandwidth utilization when this issue is observed? Also, possible to share your end point driver ‘xdma’ offline?

@vidyas,

  1. jetpack-l4t-3.1

  2. Not sure what the bandwidth utilization was and I have no way to measure it at the moment. It isn’t useful I know, but we had our hardware configured for 5.0GT/s.

  3. I can share the driver we are using. Please tell me where/how you would like it sent.

R/
Nick

Hi chirstnp_work,
Have you fix this problem?