Hi,
We are very new to this technology and we are trying to learn. We have an NVIDIA Jetson AGX Xavier flashed in Endpoint Mode as explained in the "Jetson AGX Xavier PCIe Endpoint Mode" page of the Jetson Linux Developer Guide 34.1 documentation, and we are trying to connect it to an x64 PC acting as the Root Complex.
We are using NVIDIA Jetson Linux 35.6.0 on the Jetson and Ubuntu 18.04 on the PC.
The Jetson board works when its PCIe port is not connected to the PC, but it powers off a few seconds after booting when it is connected to the PC via PCIe. We are using the ADT-Link PCI Express 3.0 x4 Jumper Cable R22NS (PCIe x4 Jumpers Extension Cable). Sometimes there is not even time to log in.
As a side note, we have to turn on the PC but not let its OS boot until we turn on the Jetson; otherwise, the Jetson does not even turn on. The Jetson is always powered externally by the power supply included in the kit.
Any clue is welcome,
Thank you.
Sorry, I don't quite understand the situation.
Are you saying that your Jetson is not able to boot up if there is PCIe connection with host PC?
Yes, that is the case. There are three different situations:
- The PC and the Jetson are not connected via PCIe. In this case, both work as expected.
- The Jetson is connected to the host PC via PCIe and the PC is not turned on. In this case, the Jetson is not able to boot up: the power LED stays on for only a few seconds. We also see some movement in the PC's fans, so we think the Jetson is trying to turn on the PC.
- The Jetson is connected to the host PC via PCIe and the PC is turned on. In this case, the Jetson is able to boot up, but it turns off a few seconds after the Linux user login. We don't see anything in journalctl or in any other log we have checked.
Thank you for your answer.
OK. So none of these cases is a reboot? The devices are all powered down?
Correct. Between attempts, we power off both the PC and the Jetson and disconnect the PCIe cable.
Also, please refer to the Jetson AGX Xavier Series PCIe Endpoint Design Guidelines Application Note and the Jetson AGX Xavier PCIe Endpoint Software documents, which you can find by searching the Jetson Download Center, for further guidance.
We have checked both documents. We cannot get past step 1.4 (PCI Endpoint Software) because of the issue we are describing.
Regarding the Design Guidelines, the cable we linked above should be making all the Tx-Rx connections. The only aspect we are not sure about is this section:
“The mux should be set to select PEX_CLK5_N/P if the Jetson AGX Xavier will be the Root Port or NVHS_SLVS_REFCLK_P/N if it will be the Endpoint”
Could a misconfiguration of that mux cause our issue? Could you provide further guidance on how to make sure the mux selects NVHS_SLVS_REFCLK_P/N? Are we missing something about the connection or the mux configuration?
Refer to the Xavier Developer Kit carrier board schematics, available from the Jetson Download Center. The PCIe clock mux is controlled through GPIO6_PEX_REFCLK_SEL, and the mux truth table is in the schematics.
We selected NVHS_SLVS_REFCLK_P with the following process, and we still have the same issue.
- First, we generated the .dtsi files from the NVIDIA Jetson Xavier Pinmux spreadsheet, changing the Req Initial State of the row with the description "PEX_REFCLK_SEL" from Drive 0 to Drive 1.
- Then, we generated a config file by passing the .dtsi files to the Python script in the kernel pinmux folder.
- After that, we moved the generated config file to the BCT folder in the bootloader directory.
- Finally, we flashed the Jetson using the flash script.
The behavior is the same: the Jetson powers off when it is connected to the PC.
Are we missing something?
Actually, you don't need to do that. Our default software already does everything: you should only need to update ODMDATA, and the software will handle the pinmux and DTB by itself.
That is, unless you are not using our default DTB at all.
We are following the instructions in the "Jetson AGX Xavier PCIe Endpoint Mode" page of the Jetson Linux Developer Guide 34.1 documentation, on a Jetson AGX Xavier with NVIDIA Jetson Linux 35.6.0.
All the software we are using has been downloaded from the NVIDIA website and has not been modified.
Just to clarify, the document should be this one.
If there is no error in dmesg, then the software is probably fine.
We have rolled back all changes and only modified jetson-xavier.conf to specify ODMDATA as the document states. We get the same behavior.
I have attached the output of dmesg with the Jetson not connected to the PC (when it is connected via PCIe, the Jetson powers off before I can open a terminal).
dmesg.txt (79.9 KB)
I see the following lines in the dmesg output, which could be the problem:
[ 4.425550] tegra194-pcie 141a0000.pcie_ep: Adding to iommu group 8
[ 4.427781] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[ 4.427796] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517
Is that expected when following these instructions with all the default software? Should we try an older version of Jetson Linux?
Hi,
This error is actually weird. Could you check whether your device tree has "reset-gpios" under pcie_ep@141a0000? I checked the default one and it is indeed there.
Yes, it is there. The content is the following:
reset-gpios = <0x0b 0xd9 0x01>;
I can provide the device tree if needed, we have not made any change on it.
Then could you go to your driver and check why this line cannot get the reset GPIO?
We have not implemented any driver. What driver are you referring to?
The "141a0000.pcie_ep" above is the driver we provide, but we don't see any such print on our side when we test the devkit. Also, you mentioned that "reset-gpios" is present in your device tree.
If the device tree has it, then such an error print should never happen. That is why I asked you to go to that driver and check why it fails even though the GPIO entry is present.
The driver is kernel/kernel-5.10/drivers/pci/controller/dwc/pcie-tegra194.c
I see the following code in the file you pointed to, in the function tegra_pcie_dw_parse_dt, which is the only place where that error message appears:
```c
	/* Look up the "reset" GPIO (reset-gpios in the DT) as an input */
	pcie->pex_rst_gpiod = devm_gpiod_get(pcie->dev, "reset", GPIOD_IN);
	if (IS_ERR(pcie->pex_rst_gpiod)) {
		int err = PTR_ERR(pcie->pex_rst_gpiod);
		const char *level = KERN_ERR;

		/* A deferred probe is expected, so it is only logged at debug level */
		if (err == -EPROBE_DEFER)
			level = KERN_DEBUG;

		dev_printk(level, pcie->dev,
			   dev_fmt("Failed to get PERST GPIO: %d\n"),
			   err);
		return err;
	}
```
The function is called from the probe function of the platform_driver struct tegra_pcie_dw_driver. The error message printed on failure confirms that this is the code we are looking for:
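```c
	ret = tegra_pcie_dw_parse_dt(pcie);
	if (ret < 0) {
		const char *level = KERN_ERR;

		/* Same pattern: -EPROBE_DEFER (-517) is logged at debug level */
		if (ret == -EPROBE_DEFER)
			level = KERN_DEBUG;

		dev_printk(level, dev,
			   dev_fmt("Failed to parse device tree: %d\n"),
			   ret);
		return ret; /* propagate so the driver core can defer the probe */
	}
```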
Checking devm_gpiod_get_index, the error can only come from gpiod_get_index. It is not easy for us to see where the issue comes from in gpiod_get_index without those dev_dbg logs. We are using everything by default; we have not modified the kernel. Should we check any specific code? Should we enable debug logging? If so, how?
Hi,
I checked this again, but it still doesn't seem reasonable to me.
The error number here is -517, which is EPROBE_DEFER. So this error occurs because the PCIe driver is probed too early, before the GPIO driver.
If you read your dmesg, you will notice that the GPIO driver starts later than the PCIe EP driver.
However, when the error is EPROBE_DEFER, the log level is set to KERN_DEBUG, so dev_printk should not print it unless you have changed the log level of your dmesg.
Also, if this is EPROBE_DEFER, then the kernel should probe the PCIe driver again later. However, that does not seem to be happening on your side.