I am running JetPack 4.1 on the Xavier and having trouble with PCIe.
The device I am using shows link activation for only a second or two on the initial boot of the Xavier, then it goes down. The boot log doesn’t seem to have anything unusual as far as I can tell. I have tried the Xavier with a more common PCIe device (Intel X520), which works as expected. The device in question works fine on x86 Ubuntu Linux.
What are the next steps to take to debug why this device does not stay up?
I can’t answer, but you will want to include a verbose lspci. If you run “sudo lspci -vvv 2>&1 | tee log_lspci.txt” you can attach that to your thread (hover your mouse over the quote icon in the upper right, and the paper clip icon will show up for attaching files). If you can do this both before and after the failure it would be best, but after would probably be fine if this is all you can log.
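If you do capture both dumps, a short filter can make the comparison easier to read. This is only a sketch: `link_summary` is a hypothetical helper name, and the two file names in the usage note are assumptions, not files from this thread. It keeps the device header lines plus the `LnkCap:` and `LnkSta:` lines that `lspci -vvv` prints for PCIe devices.

```shell
# Hypothetical helper: reduce a saved "lspci -vvv" dump to the device
# header lines (which start in column 1) and the PCIe link lines.
link_summary() {
    grep -E '^[^[:space:]]|LnkCap:|LnkSta:' "$1"
}
```

Assuming the dumps were saved as `log_lspci_before.txt` and `log_lspci_after.txt`, running `link_summary` on each and comparing the two outputs with `diff` should show whether a device disappeared or a link changed speed or width.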
The PCIe error mechanism does not show any errors. Was this lspci before or after failure? If after, then the cause isn’t PCIe, but something further down the chain of drivers.
On the other hand, the end of dmesg shows the AER mechanism is shutting down the bus:
[ 9.421326] aer 0005:00:00.0:pcie002: unloading service driver aer
[ 9.421386] pci_bus 0005:01: busn_res: [bus 01-ff] is released
[ 9.423582] pci_bus 0005:00: busn_res: [bus 00-ff] is released
[ 9.423873] tegra-pcie-dw 141a0000.pcie: PCIe link is not up...!
Someone else may know why the lspci AER shows no error, and then dmesg claims AER as a reason for shutdown. Or maybe I’m just interpreting “unloading service driver aer” incorrectly.
Or maybe I’m just interpreting “unloading service driver aer” incorrectly.
That is actually the wrong interpretation. Since no PCIe device was found, the host controller driver shuts the controller down, and the AER service driver that was loaded for the root port is unloaded as part of that. So this print is expected.
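To see these teardown messages in context, the boot log can be filtered for PCIe-related lines. A minimal sketch, assuming a case-insensitive match on "pcie" and "pci_bus" is enough to catch the lines quoted above; `pcie_boot_msgs` is a hypothetical name.

```shell
# Hypothetical helper: keep only PCIe-related kernel messages from stdin.
# The pattern is an assumption based on the dmesg lines quoted in this thread.
pcie_boot_msgs() {
    grep -iE 'pcie|pci_bus'
}
```

Typical use would be `dmesg | pcie_boot_msgs`. If "PCIe link is not up" appears without any earlier message reporting the link as up, the link never trained.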
@ c_seymour,
How can you tell that the link is up momentarily? From the log, it looks like the PCIe link never came up. BTW, what kind of PCIe endpoint device is this? Is it based on an FPGA? Also, did you check link up on any other platform (like x86)?
Also, do we have CLKREQ signal routing from your PCIe endpoint to root port here?
Ok. The LEDs just indicate that power is available to the endpoint for a brief time; they do not indicate that the PCIe link is up briefly. In fact, since the PCIe link didn’t come up within the specified time, power to the slot is cut, which turns the LEDs off.
Since this is an FPGA-based endpoint, I suspect the time the driver waits for the PCIe link to come up may be too short, so I think it is worth increasing the wait time.
Please try the below patch and see if that helps. Here I’m increasing the wait time before the link-up check from 100 ms to 5 sec. If it doesn’t work with a 5 sec delay, experiment with higher values to see whether a longer delay works.
Following the kernel customization documentation, I ran source_sync.sh, but I cannot find pcie-tegra-dw.c. Do I need to specify a specific tag when running source_sync.sh?
So you mentioned JetPack 4.1. Does R31.1 show up from:
head -n 1 /etc/nv_tegra_release
I don’t know if the source_sync.sh command you showed was just abbreviated from what was actually typed, but if not and if using R31.1, then for the kernel code download with source_sync.sh the command would go like this:
Great, we’re using Si53102-A3 clock buffer on the board c_seymour is bringing up so a DC-coupled LVDS input clock shall be fine for it.
I monitored the PEX_CLK5_P signal and noticed that this PCIe clock is briefly enabled on power-on/reset, then disabled while the OS boots, then enabled for about 2 ms at some stage of the boot process and then disabled again (presumably because it doesn’t detect a PCIe link). Does this 2 ms PCIe clock enable period correspond to anything in the code?
I’m not sure about the clock being present during power-on/reset, but during boot it should be available for around 100 ms, certainly not 2 ms. Are you sure that it is 2 ms? Also, did you measure its frequency to be 100 MHz?
I’ll try repeating the measurement to double-check the 2 ms period. Unfortunately we only have a single-ended active probe, not a differential one, but it should be good enough for indicative measurements.
Yes, the frequency was 100 MHz.
I’ve monitored VDD_12V (CH1), PEX_L5_RST_N_R (CH2) and PCIE_REFCLK_P (CH3) on power-on (see attachment).
It doesn’t seem to comply with the PCI Express Card Electromechanical Specification, section 2.2: “On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from the power rails achieving specified operating limits”
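For anyone repeating this measurement, the check against the quoted 100 ms figure can be written down as a small calculation. This is just a sketch: `check_tpvperl` and its two arguments (scope cursor readings in milliseconds for "power rail within limits" and "PERST# deasserted") are hypothetical, and the numbers in the usage note are not from the attached capture.

```shell
# Minimum T_PVPERL in milliseconds, taken from the CEM spec quote above.
TPVPERL_MIN_MS=100

# check_tpvperl POWER_STABLE_MS PERST_DEASSERT_MS
# Both arguments are hypothetical scope readings in milliseconds.
# Prints PASS or FAIL with the measured delay; returns non-zero on FAIL.
check_tpvperl() {
    awk -v p="$1" -v r="$2" -v min="$TPVPERL_MIN_MS" 'BEGIN {
        d = r - p
        if (d >= min) { printf "PASS: %.1f ms\n", d; exit 0 }
        printf "FAIL: %.1f ms (< %d ms)\n", d, min
        exit 1
    }'
}
```

For example, `check_tpvperl 10 45.5` would report a failure, because the delay between the two readings is under 100 ms.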
Scratch the 2 ms thing: it was a spurious output from PCIe clock buffer when it was powered down (when VDD_12V gets disabled with clock input being active).