We have a PCBA with a Xilinx Kintex-7 FPGA that has a PCIe link intended to operate at PCIe Gen 2 speeds. The interface is 4 lanes wide. A TX2 module is mounted on the PCBA via a Samtec SEAM-50-02.0-S-08-2-A-K connector.
The TX2 module is reporting correctable AER errors across a large sample of PCBs (i.e. 50) after building with a different supplier. The Xilinx device, acting as the PCIe endpoint, is apparently not reporting any errors.
The PCBs had a target impedance of 85 Ohms +/-10%, and a single sample measured 92 Ohms. Previous builds, where the impedance measured 81 Ohms, exhibited either no errors or a very low rate of errors. NOTE: Both impedances meet NVIDIA’s requirement of 85 Ohms +/-15%, and the traces have no vias between the Xilinx device and the Samtec connector.
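To put the two measurements side by side, here is a minimal sketch that checks each value against the +/-10% and +/-15% bands quoted above and estimates the reflection coefficient of a single impedance step relative to an ideal 85 Ohm section. The function names and the single-step model are assumptions for illustration only; a real channel has several discontinuities (package, connector, module) that this does not capture.

```python
# Compare the measured impedances with the tolerance bands quoted in the post.
# The single-step reflection model below is a simplification (assumption).

TARGET_OHMS = 85.0


def within_tolerance(z_ohms, target=TARGET_OHMS, tol_pct=10.0):
    """Return True if z_ohms lies within target +/- tol_pct percent."""
    delta = target * tol_pct / 100.0
    return target - delta <= z_ohms <= target + delta


def reflection_coefficient(z_load, z_ref=TARGET_OHMS):
    """Gamma = (Z_load - Z_ref) / (Z_load + Z_ref) for one impedance step."""
    return (z_load - z_ref) / (z_load + z_ref)


if __name__ == "__main__":
    for label, z in (("new supplier", 92.0), ("previous builds", 81.0)):
        print(
            f"{label}: {z:.0f} Ohm, "
            f"within 10%: {within_tolerance(z, tol_pct=10.0)}, "
            f"within 15%: {within_tolerance(z, tol_pct=15.0)}, "
            f"gamma vs 85 Ohm: {reflection_coefficient(z):+.3f}"
        )
```

Both values sit inside both bands, so under this simple model the difference between the builds is a reflection of roughly 4% versus 2.4% of the incident wave at one step, which on its own may or may not explain the errors.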
Hence we feel this merits a review to determine the root cause of the AER errors.
Full Gerbers, schematics and PCB files can be provided as necessary.
Could you check if ASPM (Active State Power Management) is disabled for the PCIe link?
We ran lspci and it appears that ASPM is not supported.
We recommend checking for signal integrity and power integrity (power supply noise) issues, if any, on your board, comparing boards from before and after the PCB vendor change.
What type of ‘Corrected’ errors are observed here? Are they of type ‘Receiver Error’? Please capture the error logs and attach them here for review.
Please provide complete boot logs on a good/passing system and failing/PCIe AER error system for review.
We’re in the process of getting SI results. We’re awaiting procurement of the necessary high-bandwidth VNA. This will likely take another week (from March 18th). We will post the results back here.
Here are logs for a failing and non-failing camera.
FAILING CAMERA:
[ 1215.376065] pcieport 0000:00:01.0: can’t find device of ID0020
[ 1215.554115] pcieport 0000:00:01.0: AER: Corrected error received: id=0020
[ 1215.554130] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[ 1215.564343] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00000001/00002000
[ 1215.572710] pcieport 0000:00:01.0: [ 0] Receiver Error (First)
[ 1216.353015] pcieport 0000:00:01.0: AER: Corrected error received: id=0020
[ 1216.353029] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[ 1216.363255] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00000001/00002000
[ 1216.371627] pcieport 0000:00:01.0: [ 0] Receiver Error (First)
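For anyone triaging similar logs, the `error status/mask` pair can be decoded against the bit layout of the AER Correctable Error Status and Mask registers from the PCIe Base Specification. The following is a minimal sketch; the helper names (`decode`, `parse_aer`) are hypothetical, and the embedded sample is just the two status lines copied from the failing log above.

```python
import re

# Bit positions in the AER Correctable Error Status / Mask registers.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Matches the kernel timestamp and the "error status/mask=XXXXXXXX/XXXXXXXX" field.
STATUS_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})"
)


def decode(value):
    """Return the names of all correctable-error bits set in value."""
    return [name for bit, name in CORRECTABLE_BITS.items() if value & (1 << bit)]


def parse_aer(lines):
    """Return (timestamp, status_names, mask_names) for each status/mask line."""
    events = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            timestamp = float(match.group(1))
            status = decode(int(match.group(2), 16))
            mask = decode(int(match.group(3), 16))
            events.append((timestamp, status, mask))
    return events


# The two status lines from the failing log above.
LOG = """\
[ 1215.564343] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00000001/00002000
[ 1216.363255] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00000001/00002000
"""

if __name__ == "__main__":
    events = parse_aer(LOG.splitlines())
    for timestamp, status, mask in events:
        print(f"{timestamp:.6f}s status={status} masked={mask}")
    if len(events) > 1:
        span = events[-1][0] - events[0][0]
        print(f"{len(events)} events over {span:.3f}s")
```

Here status `00000001` has only bit 0 set (Receiver Error), which matches the decoded line the kernel prints, and mask `00002000` has only bit 13 set (Advisory Non-Fatal Error), so Receiver Errors are not masked.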
NON-FAILING CAMERA:
[ 2.453230] pci 0000:00:01.0: [10de:10e5] type 01 class 0x060400
[ 2.453337] pci 0000:00:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 2.453471] iommu: Adding device 0000:00:01.0 to group 55
[ 2.453575] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 2.453718] pci 0000:01:00.0: [10ee:7024] type 00 class 0x058000
[ 2.453763] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0007ffff]
[ 2.454062] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot
[ 2.454219] iommu: Adding device 0000:01:00.0 to group 56
[ 2.463842] pci 0000:00:01.0: BAR 14: assigned [mem 0x40100000-0x401fffff]
[ 2.463848] pci 0000:01:00.0: BAR 0: assigned [mem 0x40100000-0x4017ffff]
[ 2.463860] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 2.463869] pci 0000:00:01.0: bridge window [mem 0x40100000-0x401fffff]
[ 2.464067] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[ 2.464071] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[ 2.464078] pcie_pme 0000:00:01.0:pcie001: service driver pcie_pme loaded
[ 2.464158] aer 0000:00:01.0:pcie002: service driver aer loaded
[ 13.114678] incagrab 0000:01:00.0: enabling device (0000 -> 0002)
Here’s a summary of the Xilinx end power rail measurements:
| Signal | Pins | Capacitor or Resistor | Nominal Voltage |
|---|---|---|---|
| MGTAVV_G | F7 | C420 | 1.0V |
| MGTAVT_G | A4 | C418 | 1.2V |
| MGTVCC_AUX_G | N6 | C415 | 1.8V |
| MGTAVTTRCAL | M6 | R473 to GND at C404 | 1.0V |
| MGTRREF | M5 | R473 to GND at C404 | 1.0V |
NOTE: All measurements were taken with 1GHz probe, 2GHz bandwidth scope. All pk-pk noise less than 10mV.
All Xilinx transceivers in the range A1-A7, G1-G7.
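As a quick sanity check on the noise figure, the reported 10mV pk-pk upper bound can be expressed as a percentage of each nominal rail voltage. This is only a sketch of the arithmetic, using the three supply rails and nominal voltages from the summary above; `ripple_percent` is a hypothetical helper, and 10mV is the reported bound rather than a per-rail measurement.

```python
# Express the reported pk-pk noise bound as a percentage of each nominal rail.
# Rail names and nominal voltages are taken from the summary in the post.
RAILS_V = {
    "MGTAVV_G": 1.0,
    "MGTAVT_G": 1.2,
    "MGTVCC_AUX_G": 1.8,
}

NOISE_BOUND_MV = 10.0  # "All pk-pk noise less than 10mV"


def ripple_percent(noise_mv, nominal_v):
    """Return pk-pk noise as a percentage of the nominal rail voltage."""
    return noise_mv / (nominal_v * 1000.0) * 100.0


if __name__ == "__main__":
    for rail, nominal in RAILS_V.items():
        pct = ripple_percent(NOISE_BOUND_MV, nominal)
        print(f"{rail}: below {pct:.2f}% pk-pk of {nominal:.1f}V")
```

The result should be compared against the ripple limits in the transceiver data sheet for each rail, keeping in mind that a 1GHz probe will not show noise content above its bandwidth.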
There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks ~0409
Do you still have PCB stock from the original vendor? Has anything else changed, such as components, apart from the PCB vendor?