Getting PCIe RX Error when attempting to test PCIe on Jetson AGX Xavier on Jetpack 5.1

Hi. I am attempting to test PCIe using two Jetson AGX Xaviers on Jetpack 5.1. However, I have run into issues, and need help.

I am currently following this guide:
https://docs.nvidia.com/jetson/archives/r35.2.1/DeveloperGuide/text/SD/Communications/PcieEndpointMode.html

However, when I get to the step where I need to boot the root port device, I run into an issue. When I boot the root port, the following error message is printed repeatedly on the terminal:

[   71.231818] pcieport 0005:00:00.0:    [ 0] RxErr
[   71.240052] pcieport 0005:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[   71.247907] pcieport 0005:00:00.0:   device [10de:1ad0] error status/mask=00000001/0000e000
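
For reading the status/mask pair in the last log line: both fields are bitmaps over the AER Correctable Error Status register, where bit 0 is the Receiver Error that the kernel prints as RxErr. The sketch below decodes such a value into the short names the kernel uses. The helper name decode_cor_status is hypothetical, and the bit table is my reading of the PCIe AER register layout, so treat it as an aid rather than an authoritative decoder.

```shell
#!/bin/sh
# Decode an AER Correctable Error Status (or Mask) value, given as hex
# without the 0x prefix, into the short names printed by the kernel.
decode_cor_status() {
    status=$(( 0x$1 ))
    names=
    # bit:name pairs for the correctable error bits
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover 12:Timeout \
                 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            names="$names $name"
        fi
    done
    # Drop the leading space before printing
    echo "${names# }"
}

decode_cor_status 00000001   # status field from the log: RxErr
decode_cor_status 0000e000   # mask field from the log: NonFatalErr CorrIntErr HeaderOF
```

Read this way, the log says a single unmasked Receiver Error is being reported, which is a physical-layer event on the link.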

I have to remove the PCIe cable for the error messages to stop and for the board to boot up. However, when I connect the PCIe cable again, the same error reappears and repeats until I remove the cable.

I have tried disabling ASPM by running the following command on root port:
echo "performance" > /sys/module/pcie_aspm/parameters/policy

However, as soon as I plug in the PCIe cable after running this command, the same error still comes up.
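
One way to check that the policy write was accepted is to read the same file back: the kernel lists all policies and marks the active one with square brackets, for example "default [performance] powersave powersupersave". A minimal sketch, where active_aspm_policy is a hypothetical helper name:

```shell
#!/bin/sh
# Print the active ASPM policy, i.e. the bracketed word in the policy string.
active_aspm_policy() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

POLICY_FILE=/sys/module/pcie_aspm/parameters/policy
# Only query the file when it exists (it is absent if ASPM support is not built in).
if [ -r "$POLICY_FILE" ]; then
    active_aspm_policy "$(cat "$POLICY_FILE")"
fi
```

Note that the write itself needs root, and with sudo the redirection is performed by the calling shell, so it has to be run from a root shell or piped through sudo tee.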

I have also tried reducing the speed to Gen1 by modifying the device tree, but I still get the same error.

Even though I have modified the device tree, I am unsure whether the device tree change actually did take effect or not. How do I verify that the speed is indeed Gen1?

I made the following changes to the BSP sources:

  1. Changed the nvidia,max-speed property of pcie@141a0000 and pcie_ep@141a0000 from 4 to 1 in the Linux_for_Tegra/source/public/hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi file.
  2. Changed CONFIG_PCIEASPM_POWER_SUPERSAVE=y to CONFIG_PCIEASPM_POWER_SUPERSAVE=n in the Linux_for_Tegra/source/public/kernel/kernel-5.10/arch/arm64/configs/tegra_defconfig file.

Then I recompiled the kernel, replaced the required files in the Jetpack 5.1 BSP files, applied binaries, and then flashed the root port and end point devices using the flash.sh script.

I tried running the following on the Root Port:
cat /proc/device-tree/pcie@141a0000/nvidia\,max-speed

But this results in no output.
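
A likely reason for the empty output: properties under /proc/device-tree are stored as raw big-endian 32-bit cells, not text, so a value such as <1> is written to the terminal as four non-printable bytes. The sketch below decodes a single-cell property into a decimal number. It assumes the property holds exactly one cell, and dt_u32 is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a single-cell (32-bit, big-endian) device-tree property to decimal.
dt_u32() {
    # od prints the first four bytes as hex; tr strips the spaces and newline.
    hex=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    printf '%d\n' "0x$hex"
}

# Path taken from the cat command above.
PROP='/proc/device-tree/pcie@141a0000/nvidia,max-speed'
if [ -r "$PROP" ]; then
    dt_u32 "$PROP"
fi
```

A result of 1 here only shows that the flashed device tree carries the new value; it does not show what speed the link actually trained at.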

How do I verify that the speed is indeed Gen1? Did I miss anything when setting speed to Gen1? And how do I resolve the original error? Any help would be much appreciated.

The UART log for root port is attached. Kindly let me know if any more info is needed.
Root_Port_UART_Log.txt (208.4 KB)

EDIT: I also get the following error repeatedly before the PCIe RX Error mentioned above.

[    9.080027] pcieport 0005:00:00.0: AER: can't find device of ID0000
[    9.080030] pcieport 0005:00:00.0: AER: Corrected error received: 0005:00:00.0

Regards,
Sana Ur Rehman

Anyone? Any ideas?

I eventually managed to resolve the issue by lowering the link speed to Gen1. The root cause was poor cable quality (signal integrity issues in the hardware link). To reduce the speed, you need to change the "nvidia,max-speed = <4>" property and also add the "max-link-speed" property. (I was previously editing only the "nvidia,max-speed" property, which alone is not sufficient to reduce the link speed.) For example, for Gen1 the following worked for me:

nvidia,max-speed = <1>;
max-link-speed = <1>;
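
To confirm the speed the link actually trained at, one option is to read the standard Linux PCI sysfs attributes of the root port. This is a sketch under assumptions: the device address 0005:00:00.0 is taken from the log earlier in this thread, speed_to_gen is a hypothetical helper, and the exact string format ("2.5 GT/s" versus "2.5 GT/s PCIe") varies between kernel versions, so both forms are matched.

```shell
#!/bin/sh
# Map a sysfs link-speed string to a PCIe generation number.
speed_to_gen() {
    case "$1" in
        2.5\ GT/s*)             echo 1 ;;
        5\ GT/s*|5.0\ GT/s*)    echo 2 ;;
        8\ GT/s*|8.0\ GT/s*)    echo 3 ;;
        16\ GT/s*|16.0\ GT/s*)  echo 4 ;;
        *)                      echo unknown ;;
    esac
}

# Root port address as seen in the pcieport log lines (assumption).
DEV=/sys/bus/pci/devices/0005:00:00.0
if [ -r "$DEV/current_link_speed" ]; then
    cur=$(cat "$DEV/current_link_speed")
    echo "current link speed: $cur (Gen$(speed_to_gen "$cur"))"
    echo "current link width: x$(cat "$DEV/current_link_width")"
fi
```

With the link limited to Gen1, the current speed should read 2.5 GT/s. The LnkSta line from lspci -vv for the same device reports the same information.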

Hope this helps anyone else who is having the same problem.
