We face a problem with a custom carrier board and the Jetson AGX Xavier module. When we connect an expansion board carrying two cascaded PCIe switches (PI7C9X2G404SV from Diodes) to our carrier board, the system gets stuck during boot and then restarts again and again. The log file is attached. We see this behavior with an unmodified JetPack 4.2.2. Other PCIe devices work on the same expansion interface, and the expansion board with the two PCIe switches works fine in an x86 system. The PCIe controller used is C5, and we only use it as an x1 PCIe interface.
Thank you for your help.
I suspect the switch is not pulling the CLKREQ signal low properly.
Can you please apply the following change and see if the issue disappears?
diff --git a/drivers/pci/dwc/pcie-tegra.c b/drivers/pci/dwc/pcie-tegra.c
index ee5fbfa2f7c9..3980c771455c 100644
--- a/drivers/pci/dwc/pcie-tegra.c
+++ b/drivers/pci/dwc/pcie-tegra.c
@@ -4535,6 +4535,11 @@ static int tegra_pcie_dw_runtime_resume(struct device *dev)
 	val |= (APPL_CFG_MISC_ARCACHE_VAL << APPL_CFG_MISC_ARCACHE_SHIFT);
 	writel(val, pcie->appl_base + APPL_CFG_MISC);
 
+	val = readl(pcie->appl_base + APPL_PINMUX);
+	val |= APPL_PINMUX_CLKREQ_OVERRIDE_EN;
+	val &= ~APPL_PINMUX_CLKREQ_OVERRIDE;
+	writel(val, pcie->appl_base + APPL_PINMUX);
+
 	if (pcie->disable_clock_request) {
 		val = readl(pcie->appl_base + APPL_PINMUX);
 		val |= APPL_PINMUX_CLKREQ_OUT_OVRD_EN;
@@ -4718,6 +4723,11 @@ static int tegra_pcie_dw_resume_noirq(struct device *dev)
 	val |= (APPL_CFG_MISC_ARCACHE_VAL << APPL_CFG_MISC_ARCACHE_SHIFT);
 	writel(val, pcie->appl_base + APPL_CFG_MISC);
 
+	val = readl(pcie->appl_base + APPL_PINMUX);
+	val |= APPL_PINMUX_CLKREQ_OVERRIDE_EN;
+	val &= ~APPL_PINMUX_CLKREQ_OVERRIDE;
+	writel(val, pcie->appl_base + APPL_PINMUX);
+
 	if (pcie->disable_clock_request) {
 		val = readl(pcie->appl_base + APPL_PINMUX);
 		val |= APPL_PINMUX_CLKREQ_OUT_OVRD_EN;
If your release uses pcie-tegra-dw.c instead, please make the same change in that file (older releases used pcie-tegra-dw.c, hence this suggestion).
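Both hunks perform the same read-modify-write on APPL_PINMUX: set the override-enable bit and clear the override-value bit, presumably so the controller treats CLKREQ as asserted (low) regardless of what the switch drives. The C sketch below isolates that bit manipulation on a plain integer. The bit positions and the helper names (rmw, clkreq_force_low, clkreq_force_low_without_complement) are hypothetical, chosen only for illustration. It also shows why the ~ matters when clearing a bit: ANDing with a mask does not clear that bit, it clears every other bit.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions, for illustration only; use the APPL_PINMUX_*
 * definitions from your own pcie-tegra.c. */
#define BIT(n)                         (1u << (n))
#define APPL_PINMUX_CLKREQ_OVERRIDE_EN BIT(2)
#define APPL_PINMUX_CLKREQ_OVERRIDE    BIT(3)

/* Hypothetical generic read-modify-write step: set the bits in 'set',
 * clear the bits in 'clear', keep every other bit of 'val' unchanged. */
static uint32_t rmw(uint32_t val, uint32_t set, uint32_t clear)
{
	assert((set & clear) == 0); /* a bit cannot be both set and cleared */
	return (val | set) & ~clear;
}

/* Intended step: enable the CLKREQ override and drive the override
 * value to 0, leaving the rest of the register alone. */
static uint32_t clkreq_force_low(uint32_t val)
{
	return rmw(val, APPL_PINMUX_CLKREQ_OVERRIDE_EN,
		   APPL_PINMUX_CLKREQ_OVERRIDE);
}

/* Same step written without '~': ANDing with the mask keeps ONLY the
 * override bit, so every other field, including the enable bit that
 * was just set, ends up cleared. */
static uint32_t clkreq_force_low_without_complement(uint32_t val)
{
	val |= APPL_PINMUX_CLKREQ_OVERRIDE_EN;
	val &= APPL_PINMUX_CLKREQ_OVERRIDE;
	return val;
}
```

With these assumed bit positions, clkreq_force_low(0xFFFFFFFF) returns 0xFFFFFFF7 (only bit 3 cleared), whereas clkreq_force_low_without_complement(0xFFFFFFFF) returns 0x00000008, wiping the rest of the register.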
Thank you for the patch. Unfortunately, with it applied our system does not even boot without the expansion board. It gets stuck and starts rebooting after a while.
We came across another strange behavior with the PCIe switches:
When the first PCIe switch has two NICs connected to it in addition to the cascaded second switch (which also has two NICs), everything works fine: the system boots without problems and all PCIe devices appear to work.
If nothing but the cascaded second switch with its two NICs is connected to the first switch, we see the behavior described above.
It is strange that the system no longer boots at all with this patch. Are you able to update the kernel otherwise? That is, leave out the above change and make only a trivial one (say, adding an extra print message): are you able to update the kernel then? I am certain the above change cannot cause a system lockup.
We are now one step further: a configuration setting of the PCIe switch appears to have been the problem.
The system now boots every time. Unfortunately, a different error occurs now: after a random amount of runtime, we get the message “PCIe link lost, device now detached”.
What should we do with the PEX_L5_RST_N signal if it is unused? On the Jetson AGX Xavier module it is pulled up to 3.3 V. On our carrier board we simply leave the signal unconnected, and in our pinmux we set it to unused. Is that correct?
No. It cannot be left unused: it must be connected to the downstream device’s PERST# input. This is a spec-defined signal that applies reset to the downstream device.
We finally found a solution. The problem appears to be Active State Power Management (ASPM) on the PCIe switch. With the kernel argument “pcie_aspm=off”, we have not seen any further boot or disconnection problems with our expansion board.
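Per the kernel's parameter documentation, pcie_aspm=off tells Linux not to touch the ASPM configuration at all and to leave whatever the firmware set up unchanged, so it can be worth confirming per link that ASPM really is disabled, for example on the ports of both switches. The ASPM Control field is bits 1:0 of the Link Control register in each device's PCI Express capability (lspci -vv reports it on the LnkCtl line). The C sketch below walks the capability list of a config-space dump and decodes that field. The register offsets are the ones from linux/pci_regs.h, redefined locally so the example stands alone; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets and masks as in linux/pci_regs.h, redefined for a standalone build. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10   /* capability list present */
#define PCI_CAPABILITY_LIST  0x34   /* pointer to first capability */
#define PCI_CAP_ID_EXP       0x10   /* PCI Express capability ID */
#define PCI_EXP_LNKCTL       0x10   /* Link Control, relative to PCIe cap */
#define PCI_EXP_LNKCTL_ASPMC 0x0003 /* ASPM Control, bits 1:0 */

/* Little-endian 16-bit read; both bytes must lie inside the buffer. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t len, size_t off)
{
	assert(off + 1 < len);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the standard capability list; return the offset of the PCIe
 * capability, or 0 if it is absent or the buffer is too short. */
static size_t find_pcie_cap(const uint8_t *cfg, size_t len)
{
	if (len < 0x40 ||
	    !(cfg_read16(cfg, len, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	size_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
	int guard = 48; /* bound the walk in case the list loops */

	while (pos >= 0x40 && pos + 1 < len && guard-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_EXP)
			return pos;
		pos = cfg[pos + 1] & 0xFC; /* next-capability pointer */
	}
	return 0;
}

/* ASPM Control: 0 = disabled, 1 = L0s, 2 = L1, 3 = L0s and L1;
 * -1 if the PCIe capability or its Link Control register is missing. */
static int aspm_control(const uint8_t *cfg, size_t len)
{
	size_t cap = find_pcie_cap(cfg, len);

	if (cap == 0 || cap + PCI_EXP_LNKCTL + 1 >= len)
		return -1;
	return cfg_read16(cfg, len, cap + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPMC;
}

/* Read a config-space dump such as /sys/bus/pci/devices/<BDF>/config.
 * Without root, sysfs typically returns only the first 64 bytes, which
 * is not enough to reach the capability list. */
static int aspm_control_from_sysfs(const char *path)
{
	uint8_t cfg[256];
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	size_t n = fread(cfg, 1, sizeof(cfg), f);
	fclose(f);
	return aspm_control(cfg, n);
}
```

Run as root against each switch port's config file, aspm_control_from_sysfs would return 0 for a link whose ASPM is disabled; any nonzero value means L0s and/or L1 is still enabled on that link.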
Thank you for your help.