AGX Orin, JP5.1.2 and PCIE Endpoint Troubleshooting

waldman · October 17, 2023, 4:28am

Hi Folks,

We are configuring an AGX Orin with a custom carrier to use PCIE C5 and C7 as endpoints. We are running Jetpack 5.1.2. We have been following the references:

and a host of forum posts. As far as we can tell, we are doing nothing interesting: the PCIE wiring is as close to the devkit wiring as we could make it.

No matter what we have tried, we repeatedly get an error when we configure and start an endpoint:

[    5.866486] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.866493] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517

This error also shows up with C7:

[    5.914234] tegra194-pcie 141e0000.pcie_ep: Failed to get PERST GPIO: -517

In both cases, we have set the pinmux to GPIO input. For example:

                       pex_l7_rst_n_pag1 {
                                nvidia,pins = "pex_l7_rst_n_pag1";
                                nvidia,function = "rsvd1";
                                nvidia,pull = <TEGRA_PIN_PULL_UP>;
                                nvidia,tristate = <TEGRA_PIN_ENABLE>;
                                nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                                nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                                nvidia,lpdr = <TEGRA_PIN_DISABLE>;
                        };

When we probe the L7 RST pin using libgpiod, it is reported as “consumed” by a “reset”. When we disable the pcie_ep device tree entry, the pin is no longer consumed:

	line 157:	"PAG.01"        	input active-low consumer="reset"

Port AG.01 is not consumed anywhere else in the device tree that we can find. We have tried installing the system using USB flash, copying the device tree to /boot/dtb, and OTA flash. Nothing seems to work. The driver fails at this point and the endpoint is never instantiated.
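
The consumer check described above can be scripted. This is a minimal sketch, assuming gpioinfo prints lines in the format quoted above (with a consumer="..." field). The helper name gpio_consumer is hypothetical and not part of libgpiod.

```shell
#!/bin/sh
# Print the consumer recorded for a named GPIO line.
# Reads gpioinfo output on stdin; prints nothing if the line is absent
# or has no consumer="..." field.
gpio_consumer() {
    name="$1"
    # Match the quoted line name literally, then pull out the consumer value.
    grep -F "\"$name\"" | sed -n 's/.*consumer="\([^"]*\)".*/\1/p'
}
```

On the target this would be run as `gpioinfo | gpio_consumer PAG.01`. A consumer of "reset" that disappears when pcie_ep is disabled suggests that the endpoint driver itself is holding the line through its reset-gpios property, rather than some other conflicting user.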

Your suggestions are welcome!

Thank you,
sam

kayccc · October 25, 2023, 2:56am

Sorry for the late response. Have you managed to resolve the issue, or do you still need support? Thanks

waldman · October 25, 2023, 4:47pm

Hi kayccc-

We have not resolved the issue and would appreciate any suggestions you can provide.

Thanks,
Sam

WayneWWW · November 6, 2023, 7:43am

Are you able to test this on Orin AGX Devkit?

waldman · November 8, 2023, 4:55pm

Yes, we are testing it on DevKit now, and will let you know the results.

waldman · November 10, 2023, 5:30am

@WayneWWW

We were able to try endpoint mode with a devkit and ran into the same error. We flashed a devkit with JP5.1.2 and the following key configurations.

In tegra234-p3701-0000-p3737-0000.dts

        pcie_ep@141a0000 {
                status = "okay";
                num-lanes = <8>;
                reset-gpios = <TEGRA234_MAIN_GPIO(AF, 1) GPIO_ACTIVE_LOW>;
                phys = <&p2u_nvhs_0>, <&p2u_nvhs_1>, <&p2u_nvhs_2>, <&p2u_nvhs_3>, 
                          <&p2u_nvhs_4>, <&p2u_nvhs_5>, <&p2u_nvhs_6>, <&p2u_nvhs_7>;
                phy-names = "p2u-0", "p2u-1", "p2u-2", "p2u-3",
                           "p2u-4", "p2u-5", "p2u-6", "p2u-7";  
        };
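
A quick sanity check on a node like the one above is to confirm that num-lanes agrees with the number of P2U phandles listed under phys. This is a sketch under that assumption only: check_lanes is a hypothetical helper, it works on DTS text rather than a compiled DTB, and it says nothing about whether the node matches ODMDATA.

```shell
#!/bin/sh
# Read one device tree node (DTS text) on stdin and compare num-lanes
# with the number of <&p2u...> phandles found in the node.
check_lanes() {
    awk '
        BEGIN { lanes = 0; phys = 0 }
        /num-lanes/ {
            v = $0
            gsub(/[^0-9]/, "", v)      # keep only the digits of the value
            lanes = v + 0
        }
        {
            # gsub returns the number of matches; "&" re-inserts each match
            phys += gsub(/<&p2u[^>]*>/, "&")
        }
        END {
            s = (lanes == phys) ? "OK" : "MISMATCH"
            print s, "lanes=" lanes, "phys=" phys
        }'
}
```

Fed the pcie_ep@141a0000 node above, this prints `OK lanes=8 phys=8`. Editing num-lanes without trimming phys (or the reverse) would be reported as a mismatch.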

In the file tegra234-mb1-bct-pinmux-p3701-0000-a04.dtsi we set the reset pin:

                        pex_l5_rst_n_paf1 {
                                nvidia,pins = "pex_l5_rst_n_paf1";
                                nvidia,function = "rsvd1";
                                nvidia,pull = <TEGRA_PIN_PULL_NONE>;
                                nvidia,tristate = <TEGRA_PIN_DISABLE>;
                                nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                                nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                                nvidia,lpdr = <TEGRA_PIN_DISABLE>;
                        };

Checking the padctl configuration after booting we find:

root@orin-840:~# busybox devmem 0x02444008
0x00000071

and checking dmesg we get

root@orin-840:~# dmesg | grep pcie_ep
[    5.958380] tegra194-pcie 141a0000.pcie_ep: Adding to iommu group 9
[    5.972559] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.972570] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517
[    7.268059] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator
[    7.274803] tegra194-pcie 141a0000.pcie_ep: Failed to get slot regulators: -517
[    9.004570] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator

So, in short, we see the same errors using a devkit as we saw with our custom carrier.

Any guidance is welcome,
sam

WayneWWW · November 10, 2023, 5:41am

Hi,

Actually, I wonder about another point here. Did you remember to configure the ODMDATA?

waldman · November 10, 2023, 5:56am

Yes, we think so. We added a line to jetson-agx-orin-devkit.conf:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-1,hsio-uphy-config-0,gbe0-enable-10g";

This should set Controller 5 to endpoint, correct? I am always worried about an off-by-one index error.
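
When comparing ODMDATA between flashes, it can help to split the string into one setting per line and pull out a single field. This is a minimal sketch; odmdata_fields and odmdata_get are hypothetical helper names, and the code only splits the string without interpreting what each value means.

```shell
#!/bin/sh
# Split an ODMDATA assignment or bare string into one setting per line.
# Accepts either the full conf line (ODMDATA="...";) or just the value.
odmdata_fields() {
    printf '%s\n' "$1" | sed 's/^ODMDATA=//' | tr -d '";' | tr ',' '\n'
}

# Print the setting that starts with the given prefix.
# Returns non-zero if no such setting is present.
odmdata_get() {
    odmdata_fields "$1" | grep -- "^$2"
}
```

For example, `odmdata_get "$(grep '^ODMDATA=' jetson-agx-orin-devkit.conf)" nvhs-uphy-config` would print `nvhs-uphy-config-1` for the line shown above.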

I should also say that we are working off the notes:

here: Jetson AGX Orin Platform Adaptation and Bring-Up — Jetson Linux Developer Guide documentation
and here: PCIe Endpoint Mode — Jetson Linux Developer Guide documentation

Neither set of notes appears to be complete by itself.

Thanks,
sam

waldman · November 12, 2023, 1:53am

@WayneWWW

Yes, we flashed with the correct ODMDATA setting.
We were able to spend some time with the devkits, and we tried something unusual: we ignored the errors. Even though the console has a handful of errors, the pcie_ep@141a0000 actually initialized correctly.

The Errors:

root@orin-840:~# dmesg | grep pcie_ep
[    5.941382] tegra194-pcie 141a0000.pcie_ep: Adding to iommu group 9
[    5.955351] tegra194-pcie 141a0000.pcie_ep: Failed to get PERST GPIO: -517
[    5.955361] tegra194-pcie 141a0000.pcie_ep: Failed to parse device tree: -517
[    7.227340] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator
[    7.234087] tegra194-pcie 141a0000.pcie_ep: Failed to get slot regulators: -517
[    8.954495] tegra194-pcie 141a0000.pcie_ep: Using GICv2m MSI allocator

However, when we turn on the endpoint and reboot the rootport, the devkit to devkit link works!
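
A note on the error codes: in the Linux kernel, -517 is EPROBE_DEFER, meaning the probe was postponed because a dependency (here the PERST GPIO, then the slot regulators) was not ready yet, and -5 is EIO. That would be consistent with the controller initializing correctly once the probe is retried. Below is a small sketch for annotating such lines; the helper names are hypothetical and only the two codes seen in this thread are mapped.

```shell
#!/bin/sh
# Map a kernel error code to its symbolic name (only codes from this thread).
errno_name() {
    case "$1" in
        -5)   echo "EIO" ;;
        -517) echo "EPROBE_DEFER" ;;
        *)    echo "UNKNOWN" ;;
    esac
}

# Read dmesg-style lines on stdin and append the symbolic name to any
# line that ends in ": -<number>". Other lines pass through unchanged.
annotate_errors() {
    while IFS= read -r line; do
        code=$(printf '%s\n' "$line" | sed -n 's/.*: \(-[0-9][0-9]*\)$/\1/p')
        if [ -n "$code" ]; then
            printf '%s (%s)\n' "$line" "$(errno_name "$code")"
        else
            printf '%s\n' "$line"
        fi
    done
}
```

Usage on the target would be `dmesg | grep pcie_ep | annotate_errors`.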

BUT. Our custom carrier does not work. Following the exact same procedure, we get errors on the endpoint that look like:

[  326.878577] CPU:0, Error: cbb-fabric@0x13a00000, irq=34
[  326.878668] tegra194-pcie 141e0000.pcie_ep: PCIe controller is not set to EP mode (hdr_type:0x7f)!
[  326.883979] **************************************
[  326.883982] CPU:0, Error:cbb-fabric, Errmon:2
[  326.883991]    Error Code            : TIMEOUT_ERR
[  326.893220] tegra194-pcie 141e0000.pcie_ep: Failed to complete initialization: -5
[  326.898144]    Overflow              : Multiple TIMEOUT_ERR

(Previously, we had not even tried to start the endpoint because of the console errors.)

We have flashed this multiple times in multiple ways (direct flash, over-the-air, etc.) and we always get this message. For what it's worth, we get the message every time the root port tries to initiate a link. But the link never closes.

Following: Orin C7 PCIe EP set Ethernet Interface over PCIe msi interrupt error, we tried to add the nvidia,host1x field, but it did not appear to change the results.

Any advice is welcome.

waldman · November 12, 2023, 5:47am

We flipped the sense of the ports, turned C5 into the rootport and C7 into the endpoint. We receive the same error on the endpoint.

This changes the sense of the question: does anybody have an idea what could be causing an error like:

PCIe controller is not set to EP mode (hdr_type:0x7f)!

?

Thanks,
sam

WayneWWW · November 12, 2023, 8:19am

Just to clarify: you should focus on one endpoint and make it work first.

For example, if you can make it work on devkit C5, then focus on bringing up C5 on your board first.
Do not keep switching between C5 and C7.

The ODMDATA for C7 is not the same as for C5 either. I guess you set it wrong again.

waldman · November 12, 2023, 6:13pm

Thank you @WayneWWW.

We changed our configuration to C7 root port and C5 endpoint to match the devkit configuration. We found the same error.

We changed the ODMDATA appropriately between flashes.

We would like to confirm the ODM data on the running system. Is there a way to read back the ODMDATA for confirmation?

We are also working on backtracing the PCIE_EP startup error message. Do you have any suggestions for what sets that PCIE_EP flag that is reported as incorrect? We are trying to distinguish between potential software configuration issues and hardware issues.

Thank you,

Sam

WayneWWW · November 13, 2023, 5:32am

Please share the full dmesg instead of just these excerpts.

Also, enabling the full UEFI logs could help clarify. You need to rebuild UEFI from source code to enable full logs.

waldman · November 14, 2023, 3:14pm

Hi @waynewww -

Here are the complete dmesg logs for both the root port and the endpoint. Both have jetson_uefi_DEBUG installed.

The two computers are turned on at the same time. Then the endpoint (orin-052) executes the requisite start_endpoint.sh command which is:

cd /sys/kernel/config/pci_ep
mkdir functions/pci_epf_nv_test/func1
echo 0x10de > functions/pci_epf_nv_test/func1/vendorid
echo 0x0001 > functions/pci_epf_nv_test/func1/deviceid
ln -s functions/pci_epf_nv_test/func1 controllers/141a0000.pcie_ep/
echo 1 > controllers/141a0000.pcie_ep/start

Then we reboot the rootport (orin-260). The rootport log file has two complete boot cycles - the initial boot when they both power on together and the reboot once the endpoint has been configured.

These logs are captured from the serial UART console. We log into the machine through the same serial so that the executed commands are visible in the log files.
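
The start_endpoint.sh commands quoted above assume the controller directory already exists in configfs. A hedged variant is sketched below: it checks for the controller first, so a controller that never probed shows up as a clear message rather than a failed ln. The function name and its two parameters are hypothetical; the default paths and the vendor and device IDs are the ones from the script above.

```shell
#!/bin/sh
# Configure and start a PCIe endpoint function through configfs.
#   $1: configfs pci_ep root (default /sys/kernel/config/pci_ep)
#   $2: controller name      (default 141a0000.pcie_ep)
start_endpoint() {
    root="${1:-/sys/kernel/config/pci_ep}"
    ctrl="${2:-141a0000.pcie_ep}"

    # If the endpoint controller did not probe, its directory is missing.
    if [ ! -d "$root/controllers/$ctrl" ]; then
        echo "controller $ctrl not found under $root/controllers" >&2
        return 1
    fi

    (
        cd "$root" || exit 1
        mkdir -p functions/pci_epf_nv_test/func1 || exit 1
        echo 0x10de > functions/pci_epf_nv_test/func1/vendorid
        echo 0x0001 > functions/pci_epf_nv_test/func1/deviceid
        # Bind the function to the controller, then start the endpoint.
        ln -s functions/pci_epf_nv_test/func1 "controllers/$ctrl/" || exit 1
        echo 1 > "controllers/$ctrl/start"
    )
}
```

Called with no arguments it performs the same steps as the original script for 141a0000.pcie_ep.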

What you’ll find:

Both units boot the first time, almost identically.
The console for orin-052 shows the execution of the start_endpoint.sh script (the script is the commands listed above).
orin-260 reboots.
Somewhere during the orin-260 firmware boot, orin-052 starts to throw errors to the console at line 2383 (orin-052 time 93.488 seconds).
This error is repeated every 0.1 seconds for about 11 seconds, until 114.491 seconds.
This corresponds to the orin-260 OS kernel booting, and the message changes.
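
The timing of a repeated message like this can be pulled from a console log with a short filter. This is a sketch, assuming each line starts with a bracketed kernel timestamp as in the excerpts above; msg_span is a hypothetical helper name.

```shell
#!/bin/sh
# Summarise a repeated message in a timestamped kernel log.
# Reads the log on stdin and prints "count first_timestamp last_timestamp"
# for lines containing the literal string given as $1 (or "0" if none).
msg_span() {
    awk -v pat="$1" '
        index($0, pat) {
            ts = $0
            sub(/^\[ */, "", ts)     # drop the opening bracket and padding
            sub(/\].*/, "", ts)      # drop everything after the timestamp
            if (n == 0) first = ts
            last = ts
            n++
        }
        END {
            if (n) {
                print n, first, last
            } else {
                print 0
            }
        }'
}
```

For example, `msg_span 'not set to EP mode' < orin-052_console.txt` would give the number of occurrences and the first and last times they were logged, assuming that is the message being repeated.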

Any insight is welcome!

Thank you,
sam

orin-260_console.txt (288 KB)

orin-052_console.txt (163 KB)

WayneWWW · November 14, 2023, 3:56pm

Just a reminder that it would be better to use the rel-35.4.1 uefi version to match the one you are going to use in the end.

Please also dump the NV devkit endpoint UEFI log with the UPHY config set.

waldman · November 16, 2023, 1:51am

@WayneWWW

We figured out a workaround to the PCIE problem.

We have a 4-lane physical layer. However, the pcie_ep@141a0000 driver gets upset when we only specify four lanes. If we specify all 8 lanes on the endpoint and let the PCIE PHY layer figure it out, then it works.

Endpoint configured for x4, rootport configured for x4: FAILS
Endpoint configured for x8, rootport configured for x8: WORKS
Endpoint configured for x8, rootport configured for x4: WORKS

We still don’t know why the pcie_ep driver does not work configured for x4 lanes, but we will probably not pursue that for a while.

sam

WayneWWW · November 16, 2023, 2:18am

Hi,

Your software device tree must match the ODMDATA configuration in the design guide.

Even if you only have 4 lanes, if ODMDATA says it is an x8 configuration, then you have to use the x8 configuration in DT.
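
One way to confirm what the running kernel actually sees is to read num-lanes back from the live device tree. Device tree cells are stored as 32-bit big-endian values, so they need decoding. This is a minimal sketch; dt_u32 is a hypothetical helper, and the node path in the usage note is an assumption that may differ between releases.

```shell
#!/bin/sh
# Decode one 32-bit big-endian device tree cell read from stdin and
# print it as a decimal number.
dt_u32() {
    # od prints the four bytes as hex in file order, which is big-endian.
    hex=$(od -An -tx1 | tr -d ' \n')
    printf '%d\n' "0x$hex"
}
```

For example, `dt_u32 < /proc/device-tree/pcie_ep@141a0000/num-lanes` should print 8 when the x8 configuration is in effect, assuming the node sits at that path.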

waldman · November 19, 2023, 5:47pm

Thanks @WayneWWW. This is a good solution and works for us. It was not obvious from the documentation we were using. Could this be included in the next documentation update?

system · December 3, 2023, 5:48pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
