AGX Orin, JP5.1.2 and PCIE Endpoint Troubleshooting

Hi Folks,

We are configuring an AGX Orin with a custom carrier to use PCIE C5 and C7 as endpoints. We are running Jetpack 5.1.2. We have been following the references:

and a host of forum posts. As far as we can tell, we are doing nothing unusual: the PCIE wiring is as close to the devkit wiring as we could make it.

No matter what we have tried, we repeatedly get an error when we configure and start an endpoint:

[    5.866486] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.866493] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517

This error also shows up with C7:

[    5.914234] tegra194-pcie 141e0000.pcie_ep: Failed to get PERST GPIO: -517

In both cases, we have set the pinmux to GPIO input. For example:

                       pex_l7_rst_n_pag1 {
                                nvidia,pins = "pex_l7_rst_n_pag1";
                                nvidia,function = "rsvd1";
                                nvidia,pull = <TEGRA_PIN_PULL_UP>;
                                nvidia,tristate = <TEGRA_PIN_ENABLE>;
                                nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                                nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                                nvidia,lpdr = <TEGRA_PIN_DISABLE>;
                        };

When we probe the L7 RST pin using libgpiod, it is reported as “consumed” by a “reset”. When we disable the pcie_ep device tree entry, the pin is no longer consumed:

	line 157:	"PAG.01"        	input active-low consumer="reset"
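
A quick way to repeat this check from a script is sketched below. It assumes the gpioinfo output format shown above (with a consumer="..." field), and the helper name gpio_consumer is made up for illustration:

```shell
#!/bin/sh
# Print the consumer of the named GPIO line from gpioinfo-style text on stdin.
# Prints nothing if the line is absent or has no consumer= field.
# Assumption: the output format matches the "line 157" example above.
gpio_consumer() {
    grep -F "\"$1\"" | sed -n 's/.*consumer="\([^"]*\)".*/\1/p'
}

# Usage on the target: gpioinfo | gpio_consumer PAG.01
```

Running it once with the pcie_ep node enabled and once with it disabled should show whether the endpoint driver itself is the consumer of the line.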

Port AG.01 is not consumed anywhere else in the device tree that we can find. We have tried installing the system using USB flash, copying the device tree to /boot/dtb, and OTA flash. Nothing seems to work. The driver fails at this point and the endpoint is never instantiated.

Your suggestions are welcome!

Thank you,
sam

Sorry for the late response. Have you managed to resolve the issue, or do you still need support? Thanks.

Hi kayccc-

We have not resolved the issue and would appreciate any suggestions you can provide.

Thanks,
Sam

Are you able to test this on Orin AGX Devkit?

Yes, we are testing it on DevKit now, and will let you know the results.

@WayneWWW

We were able to try endpoint mode with a devkit and ran into the same error. We flashed a devkit with JP5.1.2 and the following key configurations.

In tegra234-p3701-0000-p3737-0000.dts

        pcie_ep@141a0000 {
                status = "okay";
                num-lanes = <8>;
                reset-gpios = <TEGRA234_MAIN_GPIO(AF, 1) GPIO_ACTIVE_LOW>;
                phys = <&p2u_nvhs_0>, <&p2u_nvhs_1>, <&p2u_nvhs_2>, <&p2u_nvhs_3>, 
                          <&p2u_nvhs_4>, <&p2u_nvhs_5>, <&p2u_nvhs_6>, <&p2u_nvhs_7>;
                phy-names = "p2u-0", "p2u-1", "p2u-2", "p2u-3",
                           "p2u-4", "p2u-5", "p2u-6", "p2u-7";  
        };      

In the file tegra234-mb1-bct-pinmux-p3701-0000-a04.dtsi we set the reset pin:

                        pex_l5_rst_n_paf1 {
                                nvidia,pins = "pex_l5_rst_n_paf1";
                                nvidia,function = "rsvd1";
                                nvidia,pull = <TEGRA_PIN_PULL_NONE>;
                                nvidia,tristate = <TEGRA_PIN_DISABLE>;
                                nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                                nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                                nvidia,lpdr = <TEGRA_PIN_DISABLE>;
                        };

Checking the padctl configuration after booting we find:

root@orin-840:~# busybox devmem 0x02444008
0x00000071

and checking dmesg we get

root@orin-840:~# dmesg | grep pcie_ep
[    5.958380] tegra194-pcie 141a0000.pcie_ep: Adding to iommu group 9
[    5.972559] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.972570] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517
[    7.268059] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator
[    7.274803] tegra194-pcie 141a0000.pcie_ep: Failed to get slot regulators: -517
[    9.004570] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator

So, in short, we see the same errors using a devkit as we saw with our custom carrier.

Any guidance is welcome,
sam

Hi,

Actually, I am wondering about another point here. Did you remember to configure the ODMDATA?

Yes, we think so. We added a line to jetson-agx-orin-devkit.conf:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-1,hsio-uphy-config-0,gbe0-enable-10g";

This should set Controller 5 to endpoint, correct? I am always worried about an off-by-one index error.

I should also say that we are working off the notes:

Neither set of notes appears to be complete by itself.

Thanks,
sam

@WayneWWW

Yes, we flash with the correct ODMDATA setting.
We were able to spend some time with the devkits, and we tried something unusual: we ignored the errors. Even though the console has a handful of errors, the pcie_ep@141a0000 actually initialized correctly.

The Errors:

root@orin-840:~# dmesg | grep pcie_ep
[    5.941382] tegra194-pcie 141a0000.pcie_ep: Adding to iommu group 9
[    5.955351] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.955361] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517
[    7.227340] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator
[    7.234087] tegra194-pcie 141a0000.pcie_ep: Failed to get slot regulators: -517
[    8.954495] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator

However, when we turn on the endpoint and reboot the rootport, the devkit to devkit link works!
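
For context on why ignoring these messages turned out to be harmless: -517 is the Linux kernel's EPROBE_DEFER code, which means the probe was postponed until a dependency becomes available (here presumably the GPIO controller and the regulators) and is retried later. The sketch below labels pcie_ep lines accordingly. The function names errno_name and classify_pcie_ep_errors are made up for illustration, and only the codes that appear in this thread are mapped:

```shell
#!/bin/sh
# Map the kernel error codes seen in this thread to their errno names.
# Only the codes that appear in the logs above are listed; extend as needed.
errno_name() {
    case "$1" in
        -517) echo "EPROBE_DEFER" ;;  # probe deferred, the kernel retries later
        -5)   echo "EIO" ;;           # I/O error, not retried
        *)    echo "UNKNOWN" ;;
    esac
}

# Read dmesg-style lines on stdin and label every line that ends in an
# error code as "deferred" (retried later) or "error" (not retried).
# Lines without a trailing error code are skipped.
classify_pcie_ep_errors() {
    while IFS= read -r line; do
        code=$(printf '%s\n' "$line" | sed -n 's/.*: \(-[0-9][0-9]*\)$/\1/p')
        [ -n "$code" ] || continue
        if [ "$(errno_name "$code")" = "EPROBE_DEFER" ]; then
            printf 'deferred: %s\n' "$line"
        else
            printf 'error: %s\n' "$line"
        fi
    done
}

# Usage on the target: dmesg | grep pcie_ep | classify_pcie_ep_errors
```

Lines tagged "deferred" are only a problem if the probe never succeeds afterwards; lines tagged "error" deserve a closer look.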

BUT. Our custom carrier does not work. Following the exact same procedure, we get errors on the endpoint that look like:

[  326.878577] CPU:0, Error: cbb-fabric@0x13a00000, irq=34
[  326.878668] tegra194-pcie 141e0000.pcie_ep: PCIe controller is not set to EP mode (hdr_type:0x7f)!
[  326.883979] **************************************
[  326.883982] CPU:0, Error:cbb-fabric, Errmon:2
[  326.883991]    Error Code            : TIMEOUT_ERR
[  326.893220] tegra194-pcie 141e0000.pcie_ep: Failed to complete initialization: -5
[  326.898144]    Overflow              : Multiple TIMEOUT_ERR

(Previously, we had not even tried to start the endpoint because of the console errors.)

We have flashed this multiple times in multiple ways (direct flash, over-the-air, etc.) and we always get this message. For what it's worth, we get the message every time the root port tries to initiate a link. But the link never closes.

Following the thread "Orin C7 PCIe EP set Ethernet Interface over PCIe msi interrupt error", we tried adding the nvidia,host1x field, but it did not appear to change the results.

Any advice is welcome.

We flipped the sense of the ports, turning C5 into the rootport and C7 into the endpoint. We receive the same error on the endpoint.

This changes the sense of the question: does anybody have an idea what could be causing an error like:

PCIe controller is not set to EP mode (hdr_type:0x7f)!

?

Thanks,
sam

Just to clarify: you should focus on one endpoint and make it work first.

For example, if you can make it work on devkit C5, then focus on bringing up C5 on your board first, rather than switching between C5 and C7.

The ODMDATA for C7 is not the same as for C5 either. I guess you set it wrong again.

Thank you @WayneWWW.

We changed our configuration to C7 root port and C5 endpoint to match the devkit configuration. We found the same error.

We changed the ODM data appropriately between flashes.

We would like to confirm the ODM data on the running system. Is there a way to read back the ODMDATA for confirmation?

We are also working on backtracing the PCIE_EP startup error message. Do you have any suggestions for what sets that PCIE_EP flag that is reported as incorrect? We are trying to distinguish between potential software configuration issues and hardware issues.

Thank you,

Sam

You can try sharing the full dmesg output instead of just these excerpts.

Also, enabling the full UEFI logs could help clarify things. You need to rebuild UEFI from source code to enable full logs.

Hi @waynewww -

Here are the complete dmesg logs for both the root port and the endpoint. Both have jetson_uefi_DEBUG installed.

The two computers are turned on at the same time. Then the endpoint (orin-052) runs the start_endpoint.sh script, which contains:

cd /sys/kernel/config/pci_ep
mkdir functions/pci_epf_nv_test/func1
echo 0x10de > functions/pci_epf_nv_test/func1/vendorid
echo 0x0001 > functions/pci_epf_nv_test/func1/deviceid
ln -s functions/pci_epf_nv_test/func1 controllers/141a0000.pcie_ep/
echo 1 > controllers/141a0000.pcie_ep/start
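
A slightly more defensive variant of the same steps is sketched below. It wraps the commands above in a function that checks for the controller directory before touching anything and stops at the first failing step. The function name start_endpoint and its arguments are hypothetical, and the vendor and device IDs are simply the ones used above:

```shell
#!/bin/sh
# start_endpoint CONFIGFS_ROOT CONTROLLER [FUNC_NAME]
# Same steps as the script above, with basic error checking.
start_endpoint() {
    root=$1
    ctrl=$2
    func=${3:-func1}
    epf="$root/functions/pci_epf_nv_test/$func"

    # Fail early if the endpoint controller never registered with configfs.
    if [ ! -d "$root/controllers/$ctrl" ]; then
        echo "controller $ctrl not found under $root/controllers" >&2
        return 1
    fi

    # Create the function directory only if it does not exist yet,
    # so a second run does not fail on mkdir.
    [ -d "$epf" ] || mkdir "$epf" || return 1
    echo 0x10de > "$epf/vendorid" || return 1
    echo 0x0001 > "$epf/deviceid" || return 1

    # Bind the function to the controller unless it is already linked.
    [ -e "$root/controllers/$ctrl/$func" ] || ln -s "$epf" "$root/controllers/$ctrl/" || return 1

    # Start the endpoint; the return status of this write is the result.
    echo 1 > "$root/controllers/$ctrl/start"
}

# Usage (as root): start_endpoint /sys/kernel/config/pci_ep 141a0000.pcie_ep
```

A missing controller directory is reported immediately instead of surfacing later as a failed ln or echo.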

Then we reboot the rootport (orin-260). The rootport log file has two complete boot cycles - the initial boot when they both power on together and the reboot once the endpoint has been configured.

These logs are captured from the serial UART console. We log into the machine through the same serial so that the executed commands are visible in the log files.

What you’ll find:

  • Both units boot the first time, almost identically
  • The console for orin-052 shows the execution of the start_endpoint.sh script (the commands listed above)
  • orin-260 reboots
  • Somewhere during the orin-260 firmware boot, orin-052 starts to throw errors to the console at line 2383 (orin-052 time 93.488 seconds)
  • This error is repeated every 0.1 seconds for about 11 seconds until 114.491 seconds
  • This corresponds to the orin-260 OS kernel booting, at which point the message changes

Any insight is welcome!

Thank you,
sam

orin-260_console.txt (288 KB)

orin-052_console.txt (163 KB)

Just a reminder that it would be better to use the rel-35.4.1 UEFI version to match the one you are going to use in the end.

Please also dump the NV devkit endpoint UEFI log with the UPHY config set.

@WayneWWW

We figured out a workaround for the PCIE problem.

We have a 4-lane physical layer. However, the pcie_ep@141a0000 driver fails when we specify only four lanes. If we specify all 8 lanes on the endpoint and let the PCIE PHY layer figure it out, then it works.

  • Endpoint configured for x4, rootport configured for x4: FAILS
  • Endpoint configured for x8, rootport configured for x8: WORKS
  • Endpoint configured for x8, rootport configured for x4: WORKS

We still don’t know why the pcie_ep driver does not work when configured for x4, but we will probably not pursue that for a while.

sam

Hi,

Your software device tree must match the ODMDATA configuration in the design guide.

Even if you only have 4 lanes, if ODMDATA says it is an x8 configuration, then you have to use the x8 configuration in the DT.
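
A minimal sketch of how that rule could be checked on a running system follows. It assumes the endpoint node is exposed under /proc/device-tree with a num-lanes property stored as a 32-bit big-endian cell, and that the expected width is read from the design guide by hand. The helper names dt_num_lanes and check_lanes are made up for illustration:

```shell
#!/bin/sh
# Print the num-lanes value of a device tree node directory.
# The property is read as a 32-bit big-endian cell (four bytes).
dt_num_lanes() {
    set -- $(od -An -v -tu1 "$1/num-lanes" 2>/dev/null)
    [ "$#" -eq 4 ] || return 1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# check_lanes NODE_DIR EXPECTED
# EXPECTED is the lane width that the ODMDATA configuration implies.
check_lanes() {
    actual=$(dt_num_lanes "$1") || { echo "cannot read $1/num-lanes" >&2; return 2; }
    if [ "$actual" -ne "$2" ]; then
        echo "mismatch: device tree has x$actual, ODMDATA expects x$2"
        return 1
    fi
    echo "ok: device tree and ODMDATA both x$actual"
}

# Usage (node path is an assumption):
#   check_lanes /proc/device-tree/pcie_ep@141a0000 8
```

The expected width is passed in explicitly because the mapping from ODMDATA strings to lane counts has to come from the design guide, not from this script.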

Thanks @WayneWWW. This is a good solution and works for us. It was not obvious from the documentation we were using. Could this be included in the next documentation update?
