PCIe Device - No Link

c_seymour · June 20, 2019, 10:30am

I am running JetPack 4.1 on the Xavier and having trouble with PCIe.

The device I am using shows link activation for only a second or two on the initial boot of the Xavier, then it goes down. The boot log doesn’t seem to have anything unusual as far as I can tell. I have tried the Xavier with some more common PCIe devices (Intel X520), which work as expected. The device in question works fine on x86 Ubuntu Linux.

What are the next steps to take to debug why this device does not stay up?

c_seymour · June 20, 2019, 1:59pm

Here is the dmesg output for the bus with an Intel NIC installed:

$ dmesg | grep 0005:00
[    7.981355] tegra-pcie-dw 141a0000.pcie: PCI host bridge to bus 0005:00
[    7.981361] pci_bus 0005:00: root bus resource [bus 00-ff]
[    7.981365] pci_bus 0005:00: root bus resource [io  0x300000-0x3fffff] (bus address [0x3a100000-0x3a1fffff])
[    7.981388] pci_bus 0005:00: root bus resource [mem 0x3a200000-0x3bffffff]
[    7.981391] pci_bus 0005:00: root bus resource [mem 0x1c00000000-0x1fffffffff pref]
[    7.981410] pci 0005:00:00.0: [10de:1ad0] type 01 class 0x060400
[    7.981560] pci 0005:00:00.0: PME# supported from D0 D3hot D3cold
[    7.981728] iommu: Adding device 0005:00:00.0 to group 63
[    8.035431] pci 0005:00:00.0: BAR 14: assigned [mem 0x3a200000-0x3a7fffff]
[    8.035434] pci 0005:00:00.0: BAR 13: assigned [io  0x300000-0x300fff]
[    8.035843] pci 0005:00:00.0: PCI bridge to [bus 01-ff]
[    8.035847] pci 0005:00:00.0:   bridge window [io  0x300000-0x300fff]
[    8.035852] pci 0005:00:00.0:   bridge window [mem 0x3a200000-0x3a7fffff]
[    8.035868] pci 0005:00:00.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  512
[    8.036252] pcieport 0005:00:00.0: Signaling PME through PCIe PME interrupt
[    8.036260] pcie_pme 0005:00:00.0:pcie001: service driver pcie_pme loaded
[    8.036418] aer 0005:00:00.0:pcie002: service driver aer loaded

And here is the output with my device installed:

$ dmesg | grep 0005:00
[    8.539247] tegra-pcie-dw 141a0000.pcie: PCI host bridge to bus 0005:00
[    8.540642] pci_bus 0005:00: root bus resource [bus 00-ff]
[    8.542074] pci_bus 0005:00: root bus resource [io  0x300000-0x3fffff] (bus address [0x3a100000-0x3a1fffff])
[    8.543549] pci_bus 0005:00: root bus resource [mem 0x3a200000-0x3bffffff]
[    8.544928] pci_bus 0005:00: root bus resource [mem 0x1c00000000-0x1fffffffff pref]
[    8.546414] pci 0005:00:00.0: [10de:1ad0] type 01 class 0x060400
[    8.546600] pci 0005:00:00.0: PME# supported from D0 D3hot D3cold
[    8.546812] iommu: Adding device 0005:00:00.0 to group 63
[    8.548389] pci 0005:00:00.0: PCI bridge to [bus 01-ff]
[    8.549792] pci 0005:00:00.0: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  512
[    8.551485] pcieport 0005:00:00.0: Signaling PME through PCIe PME interrupt
[    8.552896] pcie_pme 0005:00:00.0:pcie001: service driver pcie_pme loaded
[    8.553006] aer 0005:00:00.0:pcie002: service driver aer loaded
[    8.553178] pcie_pme 0005:00:00.0:pcie001: unloading service driver pcie_pme
[    8.553225] aer 0005:00:00.0:pcie002: unloading service driver aer
[    8.553386] iommu: Removing device 0005:00:00.0 from group 63
[    8.554841] pci_bus 0005:00: busn_res: [bus 00-ff] is released

Attachments: dmesg.log, lspci.log

linuxdev · June 20, 2019, 9:22pm

I can’t answer, but you will want to include a verbose lspci. If you run “sudo lspci -vvv 2>&1 | tee log_lspci.txt”, you can attach the resulting file to your thread (hover your mouse over the quote icon in the upper right, and the paper clip icon will show up for attaching files). Ideally, capture it both before and after the failure; after alone is probably fine if that is all you can log.

vidyas · June 21, 2019, 4:55am

Also, can you please share all PCIe-related lines from the log (“dmesg | grep -i pci”)?

c_seymour · June 21, 2019, 7:28am

Both logs are attached to my previous post, but they don’t seem to shed any more light on what is going on.

linuxdev · June 21, 2019, 7:32pm

The PCIe error reporting (AER) in lspci does not show any errors. Was this lspci taken before or after the failure? If after, then the cause isn’t PCIe, but something further down the chain of drivers.

On the other hand, the end of dmesg shows the AER mechanism is shutting down the bus:

[    9.421326] aer 0005:00:00.0:pcie002: unloading service driver aer
[    9.421386] pci_bus 0005:01: busn_res: [bus 01-ff] is released
[    9.423582] pci_bus 0005:00: busn_res: [bus 00-ff] is released
[    9.423873] tegra-pcie-dw 141a0000.pcie: PCIe link is not up...!

Someone else may know why the lspci AER shows no error, and then dmesg claims AER as a reason for shutdown. Or maybe I’m just interpreting “unloading service driver aer” incorrectly.

vidyas · June 24, 2019, 9:33am

Or maybe I’m just interpreting “unloading service driver aer” incorrectly.
That is actually the wrong interpretation. Since no PCIe device was found, the host controller driver shuts the controller down, and the AER service driver that was loaded for the root port is unloaded as part of that teardown. So this print is expected.
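
To make that reading of the log concrete, here is a small, hypothetical C helper that sorts dmesg lines into "root cause" versus "expected teardown". The matched strings are taken only from the logs posted in this thread; they are an assumption, not a complete list of tegra-pcie-dw messages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one dmesg line, based on the messages
 * seen in this thread. "unloading service driver" is a side effect of the
 * root port being torn down, not the cause of the failure. */
enum pcie_msg {
    PCIE_MSG_OTHER,
    PCIE_MSG_LINK_DOWN,   /* root cause: controller reports no link */
    PCIE_MSG_TEARDOWN     /* expected consequence: port services removed */
};

static enum pcie_msg classify_line(const char *line)
{
    assert(line != NULL);
    if (strstr(line, "PCIe link is not up") != NULL)
        return PCIE_MSG_LINK_DOWN;
    if (strstr(line, "unloading service driver") != NULL ||
        strstr(line, "is released") != NULL ||
        strstr(line, "Removing device") != NULL)
        return PCIE_MSG_TEARDOWN;
    return PCIE_MSG_OTHER;
}
```

In the failing log, every line after "aer service driver aer loaded" falls into the teardown bucket, and the single link-down line is the one that actually explains it.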

@c_seymour,
How are you able to say that the link is up momentarily? From the log, it looks like the PCIe link never came up. BTW, what kind of a PCIe endpoint device is this? Is it FPGA-based? Also, did you check link-up on any other platform (like x86)?
Also, do we have CLKREQ signal routing from your PCIe endpoint to root port here?

c_seymour · June 24, 2019, 10:15am

How are you able to say that the link is up momentarily?

The LEDs on the PCIe adapter card are green for ~10 seconds until the bus is shutdown.

What kind of a PCIe endpoint device is this?

Yes, it is an FPGA, and it works fine on x86 Ubuntu Linux.

Do we have CLKREQ signal routing from your PCIe endpoint to root port here?

Yes.

Sorry, there isn’t much information to go on, but I’m stumped.

vidyas · June 24, 2019, 11:13am

OK. The LEDs only indicate that power is available to the endpoint for a brief time; they don’t indicate that the PCIe link is briefly up. In fact, since the PCIe link didn’t come up within the specified time, power to the slot is cut, which turns the LEDs off.
Since this is an FPGA-based endpoint, I suspect the time allowed for the PCIe link to come up may be too short, so it is worth increasing the wait time.
Please try the patch below and see if it helps. It increases the wait before the link-up check from 100 ms to 5 s. If it doesn’t work with a 5 s delay, experiment with higher values.

diff --git a/drivers/pci/host/pcie-tegra-dw.c b/drivers/pci/host/pcie-tegra-dw.c
index 63ec46b3430b..4dcb089a2ed1 100644
--- a/drivers/pci/host/pcie-tegra-dw.c
+++ b/drivers/pci/host/pcie-tegra-dw.c
@@ -2351,7 +2351,7 @@ static void tegra_pcie_dw_host_init(struct pcie_port *pp)
        val |= APPL_PINMUX_PEX_RST;
        writel(val, pcie->appl_base + APPL_PINMUX);
 
-       msleep(100);
+       msleep(5000);
 
        val = readl(pp->dbi_base + CFG_LINK_STATUS_CONTROL);
        while (!(val & CFG_LINK_STATUS_DLL_ACTIVE)) {
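
To see why a longer delay can help, here is a rough user-space model of the sequence described above: deassert PERST#, sleep, then read the link status (the DLL-active bit) until a retry budget runs out. This is a sketch, not driver code; the poll interval and retry count are hypothetical, since the hunk above does not show the driver's actual loop bound, and the model ignores the time link training itself takes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Rough model of the host-side link-up wait (hypothetical, not driver code):
 *   ep_ready_ms      - time after PERST# deassertion at which the endpoint
 *                      (e.g. an FPGA) would report the link as up
 *   initial_delay_ms - the msleep() before the first status read
 *                      (100 originally, 5000 in the patch above)
 *   poll_ms, retries - hypothetical polling interval and retry budget
 * Returns true if one of the status reads would see the link up.
 */
static bool link_seen(unsigned ep_ready_ms, unsigned initial_delay_ms,
                      unsigned poll_ms, unsigned retries)
{
    assert(poll_ms > 0);
    unsigned t = initial_delay_ms;           /* time of the first status read */
    for (unsigned i = 0; i <= retries; i++) {
        if (t >= ep_ready_ms)
            return true;                     /* DLL active would be observed */
        t += poll_ms;                        /* wait before the next read */
    }
    return false;  /* timed out: controller shut down, slot power cut */
}
```

For example, with a 10 ms poll and 10 retries, an endpoint that needs 300 ms is missed with the original 100 ms delay (the last read happens at 200 ms) but is caught with the 5000 ms delay.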

c_seymour · June 25, 2019, 9:09am

Following the kernel customization documentation, I ran source_sync.sh, but I cannot find pcie-tegra-dw.c. Do I need to specify a particular tag when running source_sync.sh?

$ ./source_sync.sh
...
$ find -name pcie-tegra-dw.c
$ tree ./rootfs/usr/src/linux-headers-4.9.108-tegra/drivers/pci/
./rootfs/usr/src/linux-headers-4.9.108-tegra/drivers/pci/
├── host
│   ├── Kconfig
│   └── Makefile
├── hotplug
│   ├── Kconfig
│   └── Makefile
├── Kconfig
├── Makefile
└── pcie
    ├── aer
    │   ├── Kconfig
    │   ├── Kconfig.debug
    │   └── Makefile
    ├── Kconfig
    └── Makefile

4 directories, 11 files

I should also note that the FPGA is externally powered and initialized before the Jetson boots.

cioma · June 25, 2019, 2:26pm

Is there a way to keep the VDD_12V power rail (which powers the PCIe x16 slot) enabled even if the OS doesn’t detect a PCIe link?

cioma · June 25, 2019, 3:50pm

And what is the output type of the PEX_CLK5_P/N signals on the SoC (e.g. HCSL, LP-HCSL, LVDS)?

linuxdev · June 25, 2019, 8:28pm

You mentioned JetPack 4.1. Does R31.1 show up in the output of:

head -n 1 /etc/nv_tegra_release

I don’t know whether the source_sync.sh command you showed was abbreviated from what you actually typed, but if not, and if you are on R31.1, the command to download the kernel source would be:

./source_sync.sh -k tegra-l4t-r31.1

vidyas · June 26, 2019, 4:04am

The following patch keeps 12 V power applied to the slot even if the PCIe link does not come up:

diff --git a/drivers/pci/host/pcie-tegra-dw.c b/drivers/pci/host/pcie-tegra-dw.c
index 63ec46b3430b..6e2ea926d4e9 100644
--- a/drivers/pci/host/pcie-tegra-dw.c
+++ b/drivers/pci/host/pcie-tegra-dw.c
@@ -3082,7 +3082,7 @@ static int tegra_pcie_dw_runtime_suspend(struct device *dev)
        reset_control_assert(pcie->core_apb_rst);
        clk_disable_unprepare(pcie->core_clk);
        regulator_disable(pcie->pex_ctl_reg);
-       config_plat_gpio(pcie, 0);
+       //config_plat_gpio(pcie, 0);

        if (pcie->cid != CTRL_5)
                uphy_bpmp_pcie_controller_state_set(pcie->cid, false);

vidyas · June 26, 2019, 4:05am

It is LVDS.

cioma · June 26, 2019, 9:11am

Great. We’re using an Si53102-A3 clock buffer on the board c_seymour is bringing up, so a DC-coupled LVDS input clock should be fine for it.

I monitored the PEX_CLK5_P signal and noticed that this PCIe clock is briefly enabled on power-on/reset, then disabled while the OS boots, then enabled for about 2 ms at some stage of the boot process, and then disabled again (presumably because no PCIe link is detected). Does this 2 ms clock-enable period correspond to anything in the code?

vidyas · June 26, 2019, 9:17am

I’m not sure about the clock being present during power-on/reset, but during boot it should be present for around 100 ms, certainly not 2 ms. Are you sure it is 2 ms? Also, did you measure its frequency as 100 MHz?

cioma · June 26, 2019, 9:23am

I’ll try repeating the measurement to double-check the 2 ms period. Unfortunately we only have a single-ended active probe, not a differential one, but it should be good enough for indicative measurements.
Yes, the frequency was 100 MHz.

vidyas · June 26, 2019, 9:36am

If it really is 2 ms, then something is seriously wrong. As I mentioned, it should be around 100 ms.

cioma · June 26, 2019, 12:52pm

I’ve monitored VDD_12V (CH1), PEX_L5_RST_N_R (CH2) and PCIE_REFCLK_P (CH3) on power-on (see attachment).

It doesn’t seem to comply with the PCI Express Card Electromechanical Specification, section 2.2: “On power up, the deassertion of PERST# is delayed 100 ms (T_PVPERL) from the power rails achieving specified operating limits”

Scratch the 2 ms thing: it was spurious output from the PCIe clock buffer while it was powered down (VDD_12V is disabled while the clock input is still active).
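
The 100 ms T_PVPERL requirement quoted above can be checked directly against an exported scope capture of VDD_12V (CH1) and PEX_L5_RST_N_R (CH2). This is a minimal sketch. The sample struct layout, the "rail in spec" voltage and the logic-high threshold are all assumptions to adapt to your probe setup and export format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sample from a scope capture (hypothetical layout; adapt to your export). */
struct sample {
    int64_t t_us;      /* time relative to trigger, microseconds */
    double  vdd_12v;   /* CH1: VDD_12V, volts */
    double  perst_v;   /* CH2: PEX_L5_RST_N_R (PERST#), volts */
};

/* T_PVPERL: PERST# stays asserted >= 100 ms after power is within limits. */
#define T_PVPERL_MIN_US 100000

/* Returns the measured delay from "rail in spec" to PERST# deassertion in
 * microseconds, or -1 if either edge is missing from the capture. If PERST#
 * is already high when the rail comes into spec, the result is 0 (a violation). */
static int64_t pvperl_us(const struct sample *s, size_t n,
                         double rail_ok_v, double logic_high_v)
{
    assert(s != NULL && n > 0);
    size_t i = 0;
    while (i < n && s[i].vdd_12v < rail_ok_v)     /* first sample with rail in spec */
        i++;
    if (i == n)
        return -1;
    int64_t t_rail = s[i].t_us;
    while (i < n && s[i].perst_v < logic_high_v)  /* first PERST# high from there on */
        i++;
    if (i == n)
        return -1;
    return s[i].t_us - t_rail;
}
```

A result below T_PVPERL_MIN_US flags exactly the kind of non-compliance described above; a result of -1 means the capture window missed one of the edges.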
