Two Orin AGX PCIe connection failed

Jetson Linux 36.4.4: one Orin's C5 controller is used as Root Port (RP), the other Orin's C5 as Endpoint (EP).

soc1: ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-0,hsio-uphy-config-0,gbe0-enable-10g";

soc2: ODMDATA="gbe-uphy-config-22,nvhs-uphy-config-1,hsio-uphy-config-0,gbe0-enable-10g,hsstp-lane-map-3";

I applied the patch for disabling PCIe power-down on soc1, and lspci shows:

0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1)

soc2 lspci:

0001:00:00.0 PCI bridge: NVIDIA Corporation Device 229e (rev a1)
0001:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO

soc1 dts:

	pcie@14160000 {
		status = "disabled";
	};

	pcie@14100000 {
		status = "okay";
	};

	pcie@141a0000 {
		status = "okay";			
	};

	pcie-ep@141a0000 {
		status = "disabled";
	};

soc2 dts:

	pcie@14160000 {
		status = "disabled";
	};

	pcie@14100000 {
		status = "okay";
	};

	pcie@141a0000 {
		status = "disabled";			
	};

	pcie-ep@141a0000 {
		status = "okay";
	};

two orin dmesg logs.zip (31.5 KB)

Please check the logs, thanks.

*** Please note that this reply is generated by LLM automatically ***

Based on the provided context, it appears that you are trying to configure two Jetson AGX Orin modules, each with a PCIe controller in a different mode. One module has a PCIe controller in Root Port (RP) mode, and the other has a PCIe controller in Endpoint (EP) mode.

The ODMDATA values you provided are used to configure the PCIe controllers on the Jetson AGX Orin modules. The values are:

  • soc1: ODMDATA="gbe-uphy-config-22,nvhs-uphy-config-0,hsio-uphy-config-0,gbe0-enable-10g,hsstp-lane-map-3"
  • soc2: ODMDATA="gbe-uphy-config-22,nvhs-uphy-config-1,hsio-uphy-config-0,gbe0-enable-10g,hsstp-lane-map-3"

The main difference between the two ODMDATA values is the nvhs-uphy-config value. On soc1, it is set to 0, which corresponds to the default configuration, while on soc2, it is set to 1, which corresponds to a custom configuration.

The PCIe controller on soc1 is configured as a Root Port, and the lspci output shows two PCI bridges (0001:00:00.0 and 0005:00:00.0) with NVIDIA Corporation Device 229e and 229a, respectively.

The PCIe controller on soc2 is configured as an Endpoint, and the lspci output shows a Non-Volatile memory controller (0001:01:00.0), a Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO.

The dmesg logs you provided are not included in the text, but based on the lspci output, it appears that the PCIe controllers are configured correctly.

To answer your question, it seems that you have successfully configured one Jetson AGX Orin module with a PCIe controller in Root Port mode and another module with a PCIe controller in Endpoint mode. However, without the dmesg logs, it is difficult to provide a more detailed analysis of the system’s behavior.

In comparison to other Jetson modules, the Jetson AGX Orin series has a more complex PCIe configuration due to the presence of multiple PCIe controllers and the ability to configure them in different modes. The Jetson Xavier NX series, for example, has a simpler PCIe configuration with only one PCIe controller.

In terms of PCIe power management, the Jetson AGX Orin series has more advanced features, such as the ability to configure the PCIe controllers to transition to the lowest power state in Deep Sleep (LP0) and Suspend (LP1) modes. This is not explicitly mentioned in the provided context, but it is an important consideration when designing systems with PCIe interfaces.

Overall, the provided context suggests that you have successfully configured the PCIe controllers on the Jetson AGX Orin modules, but a more detailed analysis of the system’s behavior would require additional information, such as the dmesg logs.

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

I zipped the two dmesg logs into one file, because uploading two .txt files failed.

How is the wiring done between the 2 Orins? Did you follow the design guide to make the cable?

We only changed C5 from x8 to x4, and routed PEX_L5_RST_N from soc1 to soc2:

It already works on Jetson Linux 35.3.1, but fails on 36.4.4.

The connection is laid out on one board; no cable is used.

I mean, how did you connect the 2 Orin modules here?

What EP driver are you running there? Is it tegra-vnet?

Yes, we followed the guide.

It's our own custom-designed carrier board, with 2 Orin modules on one board.

Please make sure the RP boots up later than the EP.

I rebooted soc1 (the RP) after the EP was configured, and also tried a PCI rescan.
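For reference, the rescan follows the standard Linux sysfs PCI interface (a minimal sketch; the 0001:00:00.0 address is the RP bridge from the lspci output above):

```shell
# On the RP (soc1): optionally remove the stale bridge device,
# then force the kernel to re-enumerate all PCI buses.
echo 1 | sudo tee /sys/bus/pci/devices/0001:00:00.0/remove
echo 1 | sudo tee /sys/bus/pci/rescan
lspci   # check whether the EP function shows up now
```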

We did the same test on rel-36.4.3 and it was working. Please review the above post.

I already checked that post, but it doesn't fix the issue. I want to know how to debug it from the logs or from the software side, e.g. how to confirm the ODMDATA and power supply settings, because the hardware has been verified OK.

Actually, I don't think the software side has much to check.
If your UPHY setting were wrong, even boot-up would have problems, and dmesg would also give you a UPHY error with error number -19. But those didn't happen in your dmesg.

Refer to the documentation for UPHY checking:

sudo cat /sys/kernel/debug/bpmp/debug/uphy/config
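On top of dumping the lane configuration, a couple of hedged log checks can confirm whether UPHY or the controllers failed to probe (the grep patterns are assumptions meant to catch UPHY and controller messages; 14100000/141a0000 are the controller addresses from the dts above, and -19 is ENODEV):

```shell
# Dump the UPHY lane configuration that BPMP actually programmed;
# this should match the ODMDATA you flashed.
sudo cat /sys/kernel/debug/bpmp/debug/uphy/config

# Scan the kernel log for UPHY/PCIe controller probe messages.
sudo dmesg | grep -iE 'uphy|141a0000|14100000'

# Look specifically for -19 (ENODEV) probe errors; "--" stops grep
# from treating "-19" as a command-line option.
sudo dmesg | grep -i -- '-19'
```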

After debugging, there is good progress, but there are still two issues:
1) the Ethernet interface name is different on the two sides
2) the EP-side Ethernet device enumerates too slowly
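For issue 1), one possible workaround is to rename the interface so both sides use a predictable name (a sketch; `eth1` and the target name `tvnet0` are placeholder names, check `ip link` for the real interface first):

```shell
# Rename the PCIe virtual-network interface; the device must be
# down while it is renamed. "eth1"/"tvnet0" are assumed names.
sudo ip link set dev eth1 down
sudo ip link set dev eth1 name tvnet0
sudo ip link set dev tvnet0 up
```

A systemd-networkd .link file or a udev rule matching the driver can make the rename persistent across reboots.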

At the EP side, run:

At the RP side, run "modprobe tegra_vnet", then "lspci":
[screenshot: lspci output]
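For anyone reproducing this, the EP-side bring-up on Jetson generally follows the standard Linux `pci_ep` configfs flow. This is only a sketch: the module name, function directory name, and controller name (`pci-epf-tegra-vnet`, `tvnet`, `141a0000.pcie_ep`) are assumptions based on the usual Jetson documentation flow; verify them against your release before use.

```shell
# On the EP (soc2), before the RP boots:
sudo modprobe pci-epf-tegra-vnet      # module name is an assumption; check your BSP
cd /sys/kernel/config/pci_ep/
sudo mkdir functions/tvnet/func1      # function dir is an assumption; see `ls functions/`
sudo ln -s functions/tvnet/func1 controllers/141a0000.pcie_ep/
echo 1 | sudo tee controllers/141a0000.pcie_ep/start   # enable the EP controller
```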

The Ethernet interface names are different.

RP:

[screenshot: RP interface name]

EP:

[screenshot: EP interface name]

The RP's Ethernet interface shows up quickly, but the EP's is very slow, taking several minutes. How can we fix the delay on the EP side?
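To narrow down where the EP-side delay comes from, one hedged debugging step is to watch net-subsystem uevents and kernel messages while the link comes up, so you can see which stage takes the minutes:

```shell
# Watch net-subsystem uevents to see when the EP-side interface
# is actually registered by the kernel.
sudo udevadm monitor --kernel --subsystem-match=net &

# Follow the kernel log in parallel for vnet/ethernet messages.
sudo dmesg --follow | grep -iE 'vnet|eth'
```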