PCIe Bus Error message

I’m not clear on the kind of board we are talking about here.
Jetson-tk1 (as shown in Embedded Systems Developer Kits & Modules from NVIDIA Jetson ) comes with a pre-populated Realtek NIC. Is this what you are referring to as the ‘customized board’ in comment #19? If not, what exactly is this ‘customized board’, and how is the Realtek NIC chip being soldered onto it? It clearly looks like there is some issue (physical connectivity) with the way the chip is placed on the board.

Hi, Vidyas,
If you think this issue may be caused by physical connectivity, is there any way we can adjust the signal/phase of the PCIe signals?
Or do you have any idea how we can tell whether the failure is caused by incorrect soldering of the Realtek NIC?

Please advise.

Well, the fact that we have AER errors itself tells us that there is something wrong at physical level and we can’t know beyond that.

Hi vidyas,
I traced kernel/drivers/pci/host/pci-tegra.c and found the following code in the function tegra_pcie_enable_pads:

/* WAR for Eye diagram failure on lanes for T124 platforms */
pads_writel(0x44ac44ac, PADS_REFCLK_CFG0);
pads_writel(0x00000028, PADS_REFCLK_BIAS);

I can’t find the detailed definition of the register PADS_REFCLK_CFG0 in the TRM.
Does the value of PADS_REFCLK_CFG0 depend on the hardware layout?

Thanks.

No. These are not platform dependent.

Hello,
We flashed L4T R21.6 on a Jetson TK1.
It doesn’t show “PCIe Bus Error” when no PCIe wifi module is installed.
But after we install an Intel Wireless-AC 3160, it also shows “PCIe Bus Error” once (please refer to the attached file “PM375 DEV KIT with Intel AC3160.txt”).

Thanks!

PM375 DEV KIT with Intel AC3160.txt (57.2 KB)

Hello,
Could you look at the situation described in comment #26?
It seems that not only the “Realtek RTL8111GS-CG” but also the “Intel Wireless-AC 3160” can cause the PCIe Bus Error.
Do you have any ideas about this situation?

Thanks!

Do the wireless devices require firmware? If they do, then the firmware must be loaded prior to loading the driver. I think that at least the 3160 probably requires firmware. Also, if the firmware used does not match what the driver assumes it will also fail.

NOTE: Under Fedora I see package iwl3160-firmware, but I don’t see it under the Ubuntu package manager.

Hello linuxdev,
The “Intel Wireless-AC 3160” needs iwlwifi-3160-12.ucode, and we already have it in /lib/firmware.
We can also connect to the internet through the 3160.

About our PCIe Bus Error message:
[ 18.819032] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[ 18.837793] pcieport 0000:00:00.0: device [10de:0e13] error status/mask=00000001/00002000
[ 18.846854] pcieport 0000:00:00.0: [ 0] Receiver Error (First)
I read lines 98-101 of Documentation/PCI/pcieaer-howto.txt in the kernel source code:

"Correctable errors pose no impacts on the functionality of the interface. The PCI Express protocol can recover without any software intervention or any loss of data. These errors are detected and corrected by hardware."

Our AER error is a correctable error, so it seems fine when the message is printed only once. Our problem is that some of our boards, which have only one PCIe Ethernet chip (“Realtek RTL8111GS-CG”) on them, keep printing this message, to the point that we may not even be able to boot into the system.
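As a rough illustration of how the status/mask value in the log above maps to named errors, here is a minimal Python sketch. The bit positions follow the AER Correctable Error Status register layout from the PCI Express specification; the helper names `decode_correctable` and `parse_status_mask` are made up for this example and are not part of any kernel or NVIDIA tool.

```python
import re

# Bit positions in the AER Correctable Error Status / Mask registers
# (per the PCI Express Base Specification).
CORRECTABLE_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_correctable(value):
    """Return the names of the known bits set in a status or mask value."""
    return [name for bit, name in sorted(CORRECTABLE_ERROR_BITS.items())
            if value & (1 << bit)]


def parse_status_mask(log_line):
    """Extract (status, mask) as integers from an 'error status/mask=' log line.

    Returns None if the line does not contain that field.
    """
    match = re.search(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})",
                      log_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = ("[ 18.837793] pcieport 0000:00:00.0: device [10de:0e13] "
        "error status/mask=00000001/00002000")
status, mask = parse_status_mask(line)
print("reported:", decode_correctable(status))
print("masked:  ", decode_correctable(mask))
```

For the line quoted above, the status decodes to a Receiver Error, which matches the “[ 0] Receiver Error” line in the log, and the mask only hides Advisory Non-Fatal errors.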

Thanks!

Does it show a pointer to first error other than NULL?

Hello linuxdev,
From the PCIe Bus Error message:
[ 18.819032] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
Since type=Physical Layer, it should be a hardware error.
The value of the Tegra K1 register PCIE2_RP_ERPTCAP_ERR_STS (0x01001130) is 0x3 while this board keeps printing the PCIe Bus Error message.
It seems the signal that the Tegra K1 receives has a problem.

Thanks!

I don’t know if there is any kind of setting for drive strength…but a related setting would be pre-emphasis and de-emphasis. If you are able to get output from “sudo lspci -vvv” (use “-s …slot…” to limit to one device if there are many) you might see what it says about emphasis…this can probably be switched (with some difficulty) between -3.5dB and -6dB.

In PCIe v1 only -3.5dB is used…when PCIe v2 came out, and when traces started getting longer for the full sized ATX motherboards, -6dB was added as an option (which was used in v2 devices with longer traces…all v1 devices were left at -3.5dB…only v2 devices closer to the controller were left at -3.5dB while v2 devices with longer traces got -6dB). This can open or close the eye diagram slightly for the better or worse depending on traces.
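To put the two levels in perspective, the dB figures can be converted to amplitude ratios with the usual 10^(dB/20) relation for voltage. This is only a small arithmetic sketch; the function name is made up for illustration, and it says nothing about how the Tegra pads are actually configured.

```python
def db_to_voltage_ratio(db):
    """Convert a level in dB to a voltage (amplitude) ratio: 10 ** (dB / 20)."""
    return 10 ** (db / 20)


# Amplitude of the de-emphasized bits relative to the full-swing bit
# for the two PCIe de-emphasis levels discussed above.
for level in (-3.5, -6.0):
    print(f"{level:+.1f} dB -> {db_to_voltage_ratio(level):.3f} of full swing")
# -3.5 dB -> 0.668 of full swing
# -6.0 dB -> 0.501 of full swing
```

So -3.5dB leaves roughly two thirds of the amplitude, while -6dB cuts it to about half.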

Hello linuxdev,
This is the result of using lspci command:
$ lspci -vvv
00:00.0 PCI bridge: NVIDIA Corporation Device 0e13 (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00001000-00001fff
Memory behind bridge: 32100000-321fffff
Prefetchable memory behind bridge: 0000000012100000-00000000121fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 641
Region 0: I/O ports at 1000
Region 2: Memory at 32100000 (64-bit, non-prefetchable)
Region 4: Memory at 12100000 (64-bit, prefetchable)
Capabilities: <access denied>
Kernel driver in use: r8169

How can I get the emphasis information from this?
Thanks!

The “lspci -vvv” will give you that information, but you have to use “sudo” or you can’t see everything.

Hello linuxdev,
This is the result of using lspci command:
$ sudo lspci -vvv -s 01:00.0
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 641
Region 0: I/O ports at 1000
Region 2: Memory at 32100000 (64-bit, non-prefetchable)
Region 4: Memory at 12100000 (64-bit, prefetchable)
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000ad768000 Data: 0001
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Unknown small resource type 00, will not decode more.
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
Capabilities: [170 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: r8169

De-emphasis is -6dB.

Thanks!

Just a comment ahead of time…PCIe v1 hardware is not capable of negotiating -6dB pre-emphasis…it can only do -3.5dB pre/de-emphasis. PCIe v2 hardware can do both levels and is programmable. PCIe v3 is much more sophisticated in how it chooses emphasis, but that’s another story since the Jetson is PCIe v2.

The standards say to match both the end point and the bridge/controller talking to the end point to the same value regardless of which value is chosen, but mixing them does not necessarily break anything…it does alter the eye diagram though, and the theory is that the pre-emphasis and de-emphasis work together to compensate for different lengths of copper run. When you boost and then reduce just the leading edge of the square wave, you end up with a net distortion of zero. Longer copper lengths will alter an eye diagram and it will start closing…boosting then reducing that leading edge partially compensates for that trace loss and ends up as a better (more open) eye diagram with the correct pre/de-emphasis versus just sending without any boost/reduction of that leading edge. But the two must emphasize or de-emphasize by the same amount in order to remain symmetric.

If you look at the “GT/s” listings under “capability” (LnkCap) you’ll see what speed ratings the card is capable of achieving. “2.5GT/s” is PCIe v1, “5GT/s” is PCIe v2. On a PCIe v2 card an inability to run at the full 5GT/s (when due to signal quality problems) would result in throttling back to 2.5GT/s (it isn’t possible to throttle back to slower than v1 speeds). I don’t know for certain, but I’d think whenever throttling back to 2.5GT/s a PCIe v2 card could do either -3.5dB or -6dB…but as I said, I’m not sure…though I suspect the intent is throttling back to v1 would warrant also using the v1 -3.5dB (it is probably a bug to mix -3.5dB and -6dB even if hardware can do it).

If you look at your particular card it is v1 (both LnkCap…capability…and LnkSta…current setting status…are v1) with no ability to reach v2 speeds. Your hardware is not capable of understanding -6dB pre/de-emphasis. It would probably be proper for the host side to also use -3.5dB, else the pre/de-emphasis is not symmetric. You’d have to put a fairly expensive analyzer on the bus and watch it with both the mixed -6dB and fixed -3.5dB to see if there is actually any improvement one way or the other…it wouldn’t be unusual for it to not matter that -3.5dB and -6dB are being mixed. On the other hand, it would be “technically correct” for pre/de-emphasis to match before determining if the eye diagram is valid or not.
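For anyone wanting to pull these fields out of a saved “sudo lspci -vvv” dump rather than reading them by eye, here is a minimal Python sketch. It assumes the text layout shown in the output posted above; the function name `summarize_link` and the speed-to-generation table are just illustrative, and the table only covers the two speeds mentioned in this thread.

```python
import re

# Link speed to PCIe generation, limited to the speeds discussed above.
SPEED_TO_GEN = {"2.5GT/s": "PCIe v1", "5GT/s": "PCIe v2"}

# One regular expression per field of interest in the lspci -vvv text.
PATTERNS = {
    "LnkCap": r"LnkCap:.*?Speed ([\d.]+GT/s)",
    "LnkSta": r"LnkSta:.*?Speed ([\d.]+GT/s)",
    "de_emphasis": r"Current De-emphasis Level: (-[\d.]+dB)",
}


def summarize_link(lspci_text):
    """Return capability speed, current speed and de-emphasis from lspci text.

    Missing fields are reported as None (for example when lspci was run
    without sudo and the capabilities are hidden).
    """
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, lspci_text)
        summary[key] = match.group(1) if match else None
    summary["capable_gen"] = SPEED_TO_GEN.get(summary["LnkCap"], "unknown")
    return summary


sample = """\
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-
"""
print(summarize_link(sample))
```

With the three lines taken from the Realtek output above, this reports a 2.5GT/s (v1) capability together with a -6dB de-emphasis level, which is the mismatch being discussed.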

I’ll suggest that the host side needs to be set to -3.5dB for your PCIe v1 case, but I’m not sure what the change is for this (long ago I remember some conversations on the topic, but I don’t think the R28.1 release was out back then). Someone will need to provide details on getting host side to -3.5dB. This won’t necessarily fix your eye diagram, but it is something worth trying.

Hello linuxdev,
The LAN IC FAE wants us to write 0x40 to offset 0x80 of the LAN chip’s PCI config space.
Do you know how to do it?
I found through Google that there is a Linux tool, pcitweak, that can read/write PCI config space.
But I can’t install it on Ubuntu.

Thanks!

Probably the proper method would be via device tree…but someone else who knows the details would have to answer. In part I don’t know if the Jetson’s different architecture might change things compared to what you see on a PC.
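For a quick experiment from user space, Linux also exposes each device’s config space as a binary file under sysfs, which root can write to. The sketch below is one possible way to do such a write, not a confirmed procedure for this board: it assumes the FAE means a single-byte write (the width was not stated), that the NIC is at 0000:01:00.0 as in the lspci output above, and that the change does not need to persist across resets. The function name is made up for illustration. The setpci tool from pciutils (the same package that provides lspci) is intended for the same kind of access.

```python
def write_config_byte(config_path, offset, value):
    """Write one byte at `offset` in a PCI config file; return the old byte.

    `config_path` is expected to be a sysfs config file such as
    /sys/bus/pci/devices/0000:01:00.0/config (writing requires root).
    Returns None if nothing could be read at that offset.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # Unbuffered binary access so each read/write hits the file directly.
    with open(config_path, "r+b", buffering=0) as cfg:
        cfg.seek(offset)
        old = cfg.read(1)
        cfg.seek(offset)
        cfg.write(bytes([value]))
    return old[0] if old else None


# Example (assumed device address, run as root on the target):
#   old = write_config_byte("/sys/bus/pci/devices/0000:01:00.0/config", 0x80, 0x40)
#   print(f"offset 0x80 was 0x{old:02x}, now 0x40")
```

Reading the byte back first makes it easy to restore the original value if the change makes things worse.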

I was seeing the same problem on our custom board and it turned out to be missing DC blocking caps. If you are going to attach a PCIe device directly to the Jetson TX1/TX2, you need to remember to put 0.1uF DC blocking caps on BOTH the TX and RX signal pairs. We put the ones on the TX pairs coming from the Jetson (as shown in the OEM Product Design Guide) but forgot about the other pair being driven by our PCIe device. It’s easy to forget that the caps are normally placed on the add-in card TX lines, so when you put the device directly on the carrier board, they can be missed.

Hi Dfcbrad,
Do you mean the PEX_RX4 pair of the Jetson TK1? It is for the Mini-PCIe slot, and the specification for the PM375 Jetson TK1 Development Kit says “AC coupling on PERp0/PERn0 on PCIE Mini Card.”