PCIE-HUB chip occasionally not detected on boot

A problem similar to the one in the following post is occurring in my environment.

We are using a XavierNX production module on a custom board.
A PCIE-HUB chip (PI7C9X2G608GP) is mounted on the custom board.

However, the device behind the downstream port of the PCIE-HUB is sometimes not recognized.

○ Recognized

# lspci
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0005:01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
0005:02:01.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
0005:02:02.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
0005:02:03.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
0005:02:04.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
0005:05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

○ Not Recognized

# lspci
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0005:01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G608GP PCIe2 6-Port/8-Lane Packet Switch
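
To check for this condition automatically after each boot, the two listings above can be told apart by counting how many PI7C9X2G608GP functions lspci reports. Below is a minimal Python sketch of that idea. The function names are hypothetical, and it assumes the plain `lspci` text format shown above.

```python
import re

# Matches the start of an lspci line: domain:bus:device.function description
BDF_RE = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\s+(.*)$")

# Name of the switch as it appears in the lspci listings above.
SWITCH_NAME = "PI7C9X2G608GP"


def parse_lspci(text):
    """Return a list of (bdf, description) tuples from plain `lspci` output."""
    entries = []
    for line in text.splitlines():
        m = BDF_RE.match(line.strip())
        if m:
            bdf = "{}:{}:{}.{}".format(*m.groups()[:4])
            entries.append((bdf, m.group(5)))
    return entries


def downstream_enumerated(text, switch_name=SWITCH_NAME):
    """True if more than one function of the switch is listed.

    A single entry (the upstream port alone) corresponds to the
    "Not Recognized" listing, where no downstream port was enumerated.
    """
    ports = [bdf for bdf, desc in parse_lspci(text) if switch_name in desc]
    return len(ports) > 1
```

On the target, the input text could be obtained with `subprocess.run(["lspci"], capture_output=True, text=True).stdout` and the result logged together with the boot count.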

Below is the dmesg output when the PCIe device is successfully detected:

[    6.090146] tegra-pcie-dw 14160000.pcie: PCIe link is not up...!
[    6.090970] tegra-pcie-dw 141a0000.pcie: Setting init speed to max speed
[    6.092118] OF: PCI: host bridge /pcie@141a0000 ranges:
[    6.092221] OF: PCI:    IO 0x3a100000..0x3a1fffff -> 0x3a100000
[    6.095296] OF: PCI:   MEM 0x1f40000000..0x1fffffffff -> 0x40000000
[    6.101447] OF: PCI:   MEM 0x1c00000000..0x1f3fffffff -> 0x1c00000000
[    6.119790] usb 1-3: usb_suspend_both: status 0
[    6.119956] usb usb1: usb_suspend_both: status 0
[    6.219448] tegra-pcie-dw 141a0000.pcie: link is up
[    6.219854] tegra-pcie-dw 141a0000.pcie: PCI host bridge to bus 0005:00
[    6.219976] pci_bus 0005:00: root bus resource [bus 00-ff]
[    6.220091] pci_bus 0005:00: root bus resource [io  0x100000-0x1fffff] (bus address [0x3a100000-0x3a1fffff])
[    6.220253] pci_bus 0005:00: root bus resource [mem 0x1f40000000-0x1fffffffff] (bus address [0x40000000-0xffffffff])
[    6.220433] pci_bus 0005:00: root bus resource [mem 0x1c00000000-0x1f3fffffff pref]
[    6.220561] pci_bus 0005:00: scanning bus
[    6.220587] pci 0005:00:00.0: [10de:1ad0] type 01 class 0x060400
[    6.220732] pci 0005:00:00.0: PME# supported from D0 D3hot D3cold
[    6.220739] pci 0005:00:00.0: PME# disabled
[    6.220933] iommu: Adding device 0005:00:00.0 to group 58
[    6.221198] pci_bus 0005:00: fixups for bus
[    6.221207] pci 0005:00:00.0: scanning [bus 01-ff] behind bridge, pass 0
[    6.221290] pci_bus 0005:01: scanning bus
[    6.221407] pci 0005:01:00.0: [12d8:2608] type 01 class 0x060400
[    6.222131] pci 0005:01:00.0: supports D1 D2
[    6.222135] pci 0005:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    6.222157] pci 0005:01:00.0: PME# disabled
[    6.222422] iommu: Adding device 0005:01:00.0 to group 59
[    6.231639] pci_bus 0005:01: fixups for bus
[    6.231654] pci 0005:01:00.0: scanning [bus 00-00] behind bridge, pass 0
[    6.231660] pci 0005:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    6.231856] pci 0005:01:00.0: scanning [bus 00-00] behind bridge, pass 1
[    6.232136] pci_bus 0005:02: scanning bus
[    6.232251] pci 0005:02:01.0: [12d8:2608] type 01 class 0x060400
[    6.233011] pci 0005:02:01.0: supports D1 D2
[    6.233015] pci 0005:02:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[    6.233036] pci 0005:02:01.0: PME# disabled
[    6.233321] iommu: Adding device 0005:02:01.0 to group 60
[    6.233638] pci 0005:02:02.0: [12d8:2608] type 01 class 0x060400
[    6.234400] pci 0005:02:02.0: supports D1 D2
[    6.234406] pci 0005:02:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[    6.234426] pci 0005:02:02.0: PME# disabled
[    6.234708] iommu: Adding device 0005:02:02.0 to group 61
[    6.234998] pci 0005:02:03.0: [12d8:2608] type 01 class 0x060400
[    6.235764] pci 0005:02:03.0: supports D1 D2
[    6.235769] pci 0005:02:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[    6.235789] pci 0005:02:03.0: PME# disabled
[    6.236081] iommu: Adding device 0005:02:03.0 to group 62
[    6.236378] pci 0005:02:04.0: [12d8:2608] type 01 class 0x060400
[    6.237115] pci 0005:02:04.0: supports D1 D2
[    6.237128] pci 0005:02:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[    6.237148] pci 0005:02:04.0: PME# disabled

Here is the dmesg output when the PCIe device is NOT detected:

[    2.592212] tegra-pcie-dw 14160000.pcie: PCIe link is not up...!
[    2.593193] tegra-pcie-dw 141a0000.pcie: Setting init speed to max speed
[    2.594350] OF: PCI: host bridge /pcie@141a0000 ranges:
[    2.594366] OF: PCI:    IO 0x3a100000..0x3a1fffff -> 0x3a100000
[    2.594375] OF: PCI:   MEM 0x1f40000000..0x1fffffffff -> 0x40000000
[    2.594380] OF: PCI:   MEM 0x1c00000000..0x1f3fffffff -> 0x1c00000000
[    2.706353] tegra-pcie-dw 141a0000.pcie: link is up
[    2.706698] tegra-pcie-dw 141a0000.pcie: PCI host bridge to bus 0005:00
[    2.706709] pci_bus 0005:00: root bus resource [bus 00-ff]
[    2.706719] pci_bus 0005:00: root bus resource [io  0x100000-0x1fffff] (bus address [0x3a100000-0x3a1fffff])
[    2.706727] pci_bus 0005:00: root bus resource [mem 0x1f40000000-0x1fffffffff] (bus address [0x40000000-0xffffffff])
[    2.706733] pci_bus 0005:00: root bus resource [mem 0x1c00000000-0x1f3fffffff pref]
[    2.706761] pci 0005:00:00.0: [10de:1ad0] type 01 class 0x060400
[    2.706927] pci 0005:00:00.0: PME# supported from D0 D3hot D3cold
[    2.707163] iommu: Adding device 0005:00:00.0 to group 59
[    2.707568] pci 0005:01:00.0: [12d8:2608] type 01 class 0x060400
[    2.708301] pci 0005:01:00.0: supports D1 D2
[    2.708307] pci 0005:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    2.708628] iommu: Adding device 0005:01:00.0 to group 60
[    2.790389] pci 0005:01:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    2.790434] pci 0005:00:00.0: PCI bridge to [bus 01-ff]
[    2.790457] pci 0005:00:00.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  512
[    2.790464] pci 0005:01:00.0: Failed attempting to set the MPS
[    2.790660] pci 0005:01:00.0: Max Payload Size set to  128/ 256 (was  128), Max Read Rq  128
[    2.790997] pcieport 0005:00:00.0: Signaling PME through PCIe PME interrupt
[    2.791001] pci 0005:01:00.0: Signaling PME through PCIe PME interrupt
[    2.791008] pcie_pme 0005:00:00.0:pcie001: service driver pcie_pme loaded
[    2.791175] aer 0005:00:00.0:pcie002: service driver aer loaded
[    2.791322] pcieport 0005:01:00.0: of_irq_parse_pci() failed with rc=134
[    2.821673] tegra-cbb 14040000.cv-noc: noc_secure_irq = 90, noc_nonsecure_irq = 89>

The error occurs roughly once in every 30 boots.

As in the post referenced above, the kernel logs the following error when the problem occurs.

[    2.791322] pcieport 0005:01:00.0: of_irq_parse_pci() failed with rc=134
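
Since the failure is intermittent, it can help to scan each boot's dmesg for the lines that appear only in the failing log. The sketch below is a rough Python helper for that. The signature strings are copied from the failing log above, but treating them as markers of this fault is my assumption, and the function name is hypothetical. Note that the successful boot reports `[bus 00-00]` while the failing boot reports `[bus ff-ff]`, which would be consistent with configuration reads returning all ones while the link is down.

```python
import re

# Splits "[    2.791322] message" into timestamp and message.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Substrings taken from the failing log above; using them as the
# signature of this fault is an assumption, not a documented rule.
SIGNATURES = (
    "of_irq_parse_pci() failed",
    "bridge configuration invalid ([bus ff-ff])",
    "Failed attempting to set the MPS",
)


def find_signatures(dmesg_text):
    """Return (timestamp, message) pairs for lines matching a signature."""
    hits = []
    for line in dmesg_text.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if any(sig in msg for sig in SIGNATURES):
            hits.append((ts, msg))
    return hits
```

An empty result on a given boot does not prove the link was stable; it only means none of these three messages were logged.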

According to the PCIE-HUB vendor, the downstream ports are not recognized because the upstream port link goes down after enumeration, which resets the register values in the PCIE-HUB to their defaults.

The sequence of events when the problem occurs is as follows.

1. The XavierNX PCIe root port and the PCIE-HUB link up.

2. The link between the XavierNX PCIe root port and the PCIE-HUB goes down for some reason.

3. The XavierNX PCIe driver tries to read the configuration space,
   but the following error occurs because the link is down:

[    2.791322] pcieport 0005:01:00.0: of_irq_parse_pci() failed with rc=134

4. The PCIE-HUB links up again, but its downstream ports are not enumerated.

We use the same custom board for both JetsonNano and XavierNX,
and the problem occurs only with XavierNX.

What could be the cause of the temporary link down?

Is there a workaround on the XavierNX side?

After a few days of investigation, I found the following post.

After changing max-speed from 4 (Gen-4) to 2 (Gen-2) as described in that post,
the problem of the downstream device not being recognized was resolved.
The of_irq_parse_pci() error no longer occurs either.

	pcie@141a0000 {
		status = "okay";

		vddio-pex-ctl-supply = <&p3668_spmic_sd3>;
		nvidia,disable-aspm-states = <0xf>;
		nvidia,enable-power-down;
-		nvidia,max-speed = <4>;
+		nvidia,max-speed = <2>;
...
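
To confirm that the running kernel actually picked up the changed property, the live device tree can be read back from /proc/device-tree, where each property is exposed as raw bytes and a single cell is a 32-bit big-endian integer. Below is a minimal Python sketch. The property path is my assumption, derived from the node name in the log line "host bridge /pcie@141a0000", and the function names are hypothetical.

```python
import struct

# Assumed path: the node name comes from the log line
# "host bridge /pcie@141a0000"; adjust if your tree differs.
PROP_PATH = "/proc/device-tree/pcie@141a0000/nvidia,max-speed"

# PCIe generation -> per-lane signalling rate in GT/s.
GEN_TO_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}


def decode_u32_cell(raw):
    """Decode a single device tree cell (32-bit big-endian)."""
    if len(raw) != 4:
        raise ValueError("expected 4 bytes, got %d" % len(raw))
    return struct.unpack(">I", raw)[0]


def read_max_speed(path=PROP_PATH):
    """Return the max-speed value from the live device tree."""
    with open(path, "rb") as f:
        return decode_u32_cell(f.read())
```

With the change above applied, `read_max_speed()` should return 2 on the target, which corresponds to 5.0 GT/s per lane in `GEN_TO_GTS`.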

Since the maximum speed supported by the PCIE-HUB chip (PI7C9X2G608GP) is Gen-2,
and our custom board always connects devices through the PCIE-HUB chip,
we will fix this by setting max-speed=2.
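
Because the failure showed up only about once in 30 boots, a handful of clean boots is not enough to confirm the fix. As a rough estimate, assuming boots are independent and the failure rate is a constant 1/30, the number of consecutive clean boots needed before the result is unlikely to be luck can be computed as below. The function name is hypothetical.

```python
import math


def clean_boots_needed(failure_rate, confidence):
    """Smallest n such that n consecutive clean boots would occur with
    probability <= (1 - confidence) if the fault were still present.

    Assumes boots are independent with a constant failure rate.
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


print(clean_boots_needed(1 / 30, 0.95))  # 89
print(clean_boots_needed(1 / 30, 0.99))  # 136
```

So roughly 90 clean boots in a row would be needed for 95% confidence, and about 136 for 99%.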

On JetsonNano, the PCIe port supports Gen-2 speed by default,
which may be why the problem does not occur there.

However, I suspect there may be a bug in the speed negotiation process
on Jetson XavierNX when starting from max-speed=4.
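
For anyone who wants to look into the negotiation further, one way to see what the link actually trained to is to compare the LnkCap and LnkSta lines in the `lspci -vv` output for the root port and the switch upstream port. Below is a small Python sketch that extracts the two values. It assumes the usual `lspci -vv` line format ("LnkCap: ... Speed 16GT/s, Width x4" and "LnkSta: Speed 5GT/s ..., Width x1"), and the function name is hypothetical. I have not captured this output on the failing boots.

```python
import re

# Matches "LnkCap: ... Speed 16GT/s, Width x4" and
# "LnkSta: Speed 5GT/s (downgraded), Width x1"; LnkCap2/LnkSta2 are skipped
# because the colon must follow the name directly.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def link_info(lspci_vv_text):
    """Return {"LnkCap": (speed_gts, width), "LnkSta": (speed_gts, width)}
    for the first LnkCap/LnkSta lines found in the text of one device."""
    info = {}
    for line in lspci_vv_text.splitlines():
        m = LINK_RE.search(line)
        if m and m.group(1) not in info:
            info[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return info
```

The text for a single device could be collected with `lspci -vv -s 0005:01:00.0` (run as root so the capability list is readable). If LnkSta on a failing boot showed a speed other than 5GT/s for the switch upstream port, that would support the negotiation theory.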

Glad to know the issue is fixed.