PCIe crashes if DMA size % 256 is 4 or less

Dear Nvidia Guys

We have an FPGA connected to one PCIe port, but if we set the DMA size so that size % 256 <= 4, PCIe crashes.
(For example, a DMA size of 0x1000 works, but 0x1004 fails.)

What could be the problem?

Looking forward to your reply.
Thanks a lot!

CPU: TK1
L4T VERSION: 21.7

Kernel dmesg info:

[   57.434685] pcieport 0000:00:00.0: AER: Uncorrected (Fatal) error received: id=0010
[   57.434781] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0000(Receiver ID)
[   57.446450] pcieport 0000:00:00.0:   device [10de:0e12] error status/mask=00040000/00000000
[   57.454973] pcieport 0000:00:00.0:    [18] Malformed TLP          (First)
[   57.461786] pcieport 0000:00:00.0:   TLP Header: 00000001 010000ff ae900000 00000000
[   57.469532] pcieport 0000:00:00.0: broadcast error_detected message
[   57.469544] hwctrlpcie 0000:01:00.0: device has no AER-aware driver
[   57.672987] pcieport 0000:00:00.0: Root Port link has been reset
[   57.673044] pcieport 0000:00:00.0: AER: Device recovery failed
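The `TLP Header` line above can be decoded by hand. Below is a minimal sketch in C, assuming the standard 3DW memory-request header layout from the PCIe Base Specification; the struct and function names are hypothetical. Applied to `00000001 010000ff ae900000`, it reads as a 1 DW Memory Read from requester 01:00.0 (the FPGA) at address 0xae900000, with both First DW BE and Last DW BE set to 0xF. The spec requires Last DW BE to be 0000b when Length is 1 DW, and a receiver that checks this rule reports the TLP as Malformed. That would fit a DMA engine emitting a single-DW tail read for a remainder of 4 bytes or less, but it is an inference from the header, not a confirmed root cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 3DW memory-request TLP header (PCIe Base Spec layout).
 * Struct and function names are hypothetical, for illustration only. */
struct tlp_mem_req {
    unsigned fmt;       /* DW0 bits 31:29; 000b = 3DW header, no data */
    unsigned type;      /* DW0 bits 28:24; 00000b with fmt 000b = MRd */
    unsigned length_dw; /* DW0 bits 9:0; an encoding of 0 means 1024 DW */
    unsigned req_bus;   /* DW1 bits 31:24, Requester ID bus */
    unsigned req_dev;   /* DW1 bits 23:19, Requester ID device */
    unsigned req_fn;    /* DW1 bits 18:16, Requester ID function */
    unsigned tag;       /* DW1 bits 15:8 */
    unsigned last_be;   /* DW1 bits 7:4, Last DW Byte Enables */
    unsigned first_be;  /* DW1 bits 3:0, First DW Byte Enables */
    uint32_t addr;      /* DW2 bits 31:2; bits 1:0 are reserved */
};

/* Split the three header DWs (as printed by the AER driver) into fields. */
static struct tlp_mem_req tlp_decode_mem_req(const uint32_t dw[3])
{
    struct tlp_mem_req t;

    t.fmt = (dw[0] >> 29) & 0x7u;
    t.type = (dw[0] >> 24) & 0x1fu;
    t.length_dw = dw[0] & 0x3ffu;
    if (t.length_dw == 0)
        t.length_dw = 1024;
    t.req_bus = (dw[1] >> 24) & 0xffu;
    t.req_dev = (dw[1] >> 19) & 0x1fu;
    t.req_fn = (dw[1] >> 16) & 0x7u;
    t.tag = (dw[1] >> 8) & 0xffu;
    t.last_be = (dw[1] >> 4) & 0xfu;
    t.first_be = dw[1] & 0xfu;
    t.addr = dw[2] & ~UINT32_C(0x3);
    return t;
}

/* True for a 32-bit-address Memory Read request (fmt 000b, type 00000b). */
static bool tlp_is_mrd32(const struct tlp_mem_req *t)
{
    return t->fmt == 0u && t->type == 0u;
}

/* Spec rule: a 1 DW request must carry Last DW BE == 0000b.
 * A receiver that checks this treats a violation as a Malformed TLP. */
static bool tlp_violates_1dw_last_be(const struct tlp_mem_req *t)
{
    return t->length_dw == 1u && t->last_be != 0u;
}
```

For comparison, a well-formed 1 DW read of the same address would carry `0100000f` in DW1 (Last DW BE = 0000b, First DW BE = 1111b).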

Is this a duplicate of:
https://devtalk.nvidia.com/default/topic/1066981/jetson-tk1/endless-quot-pcie-response-decoding-error-quot-after-wake-up-from-standby/

I believe the debug information is slightly different, but both posts are from around the same time. Is this from the same device, after adding the wake-up delay? If so, you should probably continue in the original thread, since we don’t yet know for sure whether the sleep/wake-up delay is valid.

Dear linuxdev

Thanks for your reply.
This problem also existed before I added the delay for waking up from standby,
so it’s not caused by the delay.

Below are more detailed messages from after the PCIe crash.

[   37.018206] pcieport 0000:00:00.0: AER: Uncorrected (Fatal) error received: id=0010
[   37.018303] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0000(Receiver ID)
[   37.029974] pcieport 0000:00:00.0:   device [10de:0e12] error status/mask=00040000/00000000
[   37.038471] pcieport 0000:00:00.0:    [18] Malformed TLP          (First)
[   37.045282] pcieport 0000:00:00.0:   TLP Header: 00000001 010000ff ae900000 00000000
[   37.053026] pcieport 0000:00:00.0: broadcast error_detected message
[   37.053040] hwctrlpcie 0000:01:00.0: device has no AER-aware driver
[   37.257065] pcieport 0000:00:00.0: Root Port link has been reset
[   37.257123] pcieport 0000:00:00.0: AER: Device recovery failed
[   45.094583] [hwctrl-pcie] > device release.
root@tegra-ubuntu:/home/ubuntu/damon# lspci
00:00.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x4 Bridge (rev a1)
01:00.0 Memory controller: Device 87cd:3211
02:00.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
root@tegra-ubuntu:~# lspci -t -v
-+-[0000:02]---00.0-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
 \-[0000:00]---00.0-[01]----00.0  Device 87cd:3211
root@tegra-ubuntu:/home/ubuntu/damon# lspci -s 00:00.0 -vvv
00:00.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x4 Bridge (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: 32200000-322fffff
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
		Address: 00000000ad740000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 12, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: pcieport

root@tegra-ubuntu:/home/ubuntu/damon# lspci -s 01:00.0 -vvv
01:00.0 Memory controller: Device 87cd:3211
	Subsystem: Xilinx Corporation Device 0007
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 130
	Region 0: [virtual] Memory at 32200000 (32-bit, non-prefetchable) 
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Device Serial Number 00-00-00-01-01-00-0a-35
	Kernel driver in use: hwctrlpcie

root@tegra-ubuntu:/home/ubuntu/damon#

I can’t answer the DMA size question. However, from the original thread, is it correct to assume that, with the added delay before PCIe enumeration at boot, the device works correctly until standby is used? Knowing whether the device works before any kind of power-saving mode is entered would be very useful, especially if DMA ran successfully before standby/resume.

Incidentally, this device does not support Advanced Error Reporting (AER), so debugging information is minimal. What we do know:

  • The bridge is functioning correctly.
  • The device is only capable of PCIe rev. 1 (2.5GT/s).
  • The bridge is capable of PCIe rev. 2 (5GT/s).
  • The bridge correctly throttled back to rev. 1 since the device can only handle rev. 1.
  • Assuming operation is good until some sort of standby/resume, we know signal quality is good.
  • Assuming DMA succeeded prior to standby/resume, we know the DMA setup is correct (at least prior to standby).

So one question is whether the driver itself handles standby/resume correctly. I have no way of debugging that. However, if you know DMA worked before standby/resume but fails afterward, you can at least narrow it down to that part of the driver and/or the hardware’s behavior after resume. The next person looking at this will probably need to know which driver you are using, along with details of how DMA behaves before and after standby/resume. Unfortunately, the driver itself won’t say much about the error, so someone may need to suggest debug printk statements to add to it.

Dear linuxdev

Thanks for your reply!

This DMA size question has nothing to do with standby/resume.
I didn’t put the TK1 into standby mode during DMA testing,
and I didn’t apply the delay code mentioned in the original thread during DMA testing either.

In our PCIe driver:
1. In the probe function, we use ioremap_nocache to map BAR0 for reading/writing the FPGA registers.
2. In the probe function, we use dma_alloc_coherent to allocate the DMA buffer.
3. In the mmap function, we use pgprot_noncached + remap_pfn_range to map the buffer’s physical address into user space.
4. In the ioctl function, we implement some sub-functions that start a DMA with a given size and physical address.

I can’t personally say what the requirements are for data size or alignment, but DMA (physical/virtual) addressing differs in some ways between the older arm32 systems and arm64 systems, and I suspect you may be running into one of those differences. Someone else will need to comment on the details of DMA addressing on arm32, but was this driver originally written for arm64 or another 64-bit platform?

Dear linuxdev

We didn’t find the root cause of this problem.
As a temporary workaround, we limit the DMA size in the PCIe device driver.
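Limiting the size could look like the check below. This is a minimal sketch in plain C, assuming the failing condition is exactly a 1 to 4 byte remainder after whole 256-byte chunks (0x1000 works, so a remainder of 0 is treated as safe). The names `DMA_CHUNK`, `DMA_BAD_TAIL_MAX`, `dma_size_has_bad_tail()` and `dma_size_pad()` are hypothetical; in the driver they would be called from the ioctl path before starting the DMA. Padding only works if the coherent buffer is at least the padded size and the FPGA side tolerates the extra bytes; otherwise rejecting the size (for example with -EINVAL) is the safer choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants taken from the reported symptom. */
#define DMA_CHUNK        256u /* the report says size % 256 is what matters */
#define DMA_BAD_TAIL_MAX 4u   /* tails of 1..4 bytes were reported to fail */

/* True if size leaves a 1..4 byte tail after whole 256-byte chunks. */
static bool dma_size_has_bad_tail(size_t size)
{
    size_t tail = size % DMA_CHUNK;

    return tail != 0 && tail <= DMA_BAD_TAIL_MAX;
}

/* Round a size with a bad tail up to the next 256-byte boundary;
 * sizes without a bad tail are returned unchanged. */
static size_t dma_size_pad(size_t size)
{
    if (!dma_size_has_bad_tail(size))
        return size;
    return size - size % DMA_CHUNK + DMA_CHUNK;
}
```

For example, 0x1004 is padded to 0x1100, while 0x1000 and 0x1008 pass through unchanged.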

Thanks a lot for your advice.

Just a thought on this topic: unlike virtual memory, DMA has limitations, because it requires non-fragmented, physically contiguous memory. Perhaps there simply wasn’t a large enough contiguous region available for the larger sizes.
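To put the contiguous-memory idea in numbers: a buffer served by the kernel page allocator is handed out as a power-of-two run of pages, which is what the kernel’s `get_order()` computes. Below is a user-space sketch of that rounding, assuming 4 KiB pages and assuming the buffer actually comes from the page allocator; `SKETCH_PAGE_SIZE` and the function names are hypothetical. A 0x1000-byte buffer needs one page (order 0), while 0x1004 needs an order-1, 8 KiB contiguous block. Whether the allocation size follows the DMA size at all depends on the driver, since here the buffer is allocated once in probe.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size,
 * mirroring what the kernel's get_order() returns for size > 0. */
static unsigned sketch_get_order(size_t size)
{
    unsigned order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Bytes of physically contiguous memory a page-allocator request of `size` needs. */
static size_t sketch_contiguous_bytes(size_t size)
{
    return (size_t)SKETCH_PAGE_SIZE << sketch_get_order(size);
}
```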