Jetson Orin, JP5.1.2 and PCIE EP(c5) Error

Hi, all:
We are debugging a custom carrier board with two Orins, which are connected to each other through PCIe C5. We are running JetPack 5.1.2. We enabled SRNS (enable-srns) and disabled Spread Spectrum Clocking (SSC) on Orin, following this link:
How to disable Spread Spectrum Clocking(SSC) on Orin
The two computers are turned on at the same time. Then the endpoint Orin runs the requisite start_endpoint.sh commands, which are:

cd /sys/kernel/config/pci_ep/
mkdir functions/pci_epf_tvnet/func1
echo 16 > functions/pci_epf_tvnet/func1/msi_interrupts
ln -s functions/pci_epf_tvnet/func1 controllers/141a0000.pcie_ep/
echo 1 > controllers/141a0000.pcie_ep/start

Then we reboot the root port Orin, and we get errors on the endpoint that look like:

root@trunk-desktop:/home/trunk# [   76.629721] pci_epf_tvnet pci_epf_tvnet.0: tvnet_ep_open: PCIe link is not up
[   76.645435] pci_epf_tvnet pci_epf_tvnet.0: tvnet_ep_open: PCIe link is not up
[   76.656325] pci_epf_tvnet pci_epf_tvnet.0: tvnet_ep_open: PCIe link is not up
[   76.667567] pci_epf_tvnet pci_epf_tvnet.0: tvnet_ep_open: PCIe link is not up
[   76.680197] pci_epf_tvnet pci_epf_tvnet.0: tvnet_ep_open: PCIe link is not up
[   94.604342] CPU:0, Error: cbb-fabric@0x13a00000, irq=34
[   94.609727] **************************************
[   94.614653] CPU:0, Error:cbb-fabric, Errmon:2
[   94.619133] 	  Error Code		: TIMEOUT_ERR
[   94.623172] 
[   94.624701] 	  Error Code		: TIMEOUT_ERR
[   94.628733] 	  MASTER_ID		: CCPLEX
[   94.632230] 	  Address		: 0x3e90078
[   94.635820] 	  Cache			: 0x1 -- Bufferable 
[   94.640114] 	  Protection		: 0x2 -- Unprivileged, Non-Secure, Data Access
[   94.647095] 	  Access_Type		: Read
[   94.650591] 	  Access_ID		: 0x13
[   94.650592] 	  Fabric		: cbb-fabric
[   94.657501] 	  Slave_Id		: 0x2e
[   94.660719] 	  Burst_length		: 0x0
[   94.664216] 	  Burst_type		: 0x1
[   94.667537] 	  Beat_size		: 0x2
[   94.670754] 	  VQC			: 0x0
[   94.673533] 	  GRPSEC		: 0x7e
[   94.676588] 	  FALCONSEC		: 0x0
[   94.679807] 	**************************************
[   94.684848] WARNING: CPU: 0 PID: 227 at drivers/soc/tegra/cbb/tegra234-cbb.c:577 tegra234_cbb_isr+0x130/0x170
[   94.695263] ---[ end trace d5c8b5c14fb41327 ]---
[   94.700052] CPU:0, Error: cbb-fabric@0x13a00000, irq=34

These logs are captured from the serial UART console. We log into the machine through the same serial so that the executed commands are visible in the log files.
orin_rp_console.txt (194.0 KB)
orin_ep_console.txt (562.6 KB)

Any suggestions about this issue? Thanks.

Something that felt quite abnormal happened when I used the following command to reflash only the BPMP DTB partition of the endpoint Orin.

sudo ./flash.sh -k A_bpmp-fw-dtb jetson-agx-orin-devkit  mmcblk0p1

After the root port is restarted, the PCIe Ethernet device is detected and network communication between the two Orins is normal. The endpoint did not report any errors.
But the /sys/kernel/debug/BPMP/debug/uphy/config register value on the endpoint changed from 0x40d84000 to 0x00584000.
Our endpoint ODMDATA is configured as:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-1,hsio-uphy-config-16,gbe0-enable-10g";

Our root port ODMDATA is configured as:

ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-0,hsio-uphy-config-16,gbe0-enable-10g";
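As a side note, the two ODMDATA strings above can be compared field by field to see exactly where the endpoint and root port configurations differ. The following is only a small POSIX shell sketch; the helper name odm_only is made up for illustration and is not part of any NVIDIA tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the comma-separated fields of $1 that do not
# appear in $2.
odm_only() {
    for field in $(printf '%s' "$1" | tr ',' ' '); do
        case ",$2," in
            *",$field,"*) ;;                   # field present on both sides
            *) printf '%s\n' "$field" ;;       # field only in the first string
        esac
    done
}

EP_ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-1,hsio-uphy-config-16,gbe0-enable-10g"
RP_ODMDATA="gbe-uphy-config-22,hsstp-lane-map-3,nvhs-uphy-config-0,hsio-uphy-config-16,gbe0-enable-10g"

echo "EP only: $(odm_only "$EP_ODMDATA" "$RP_ODMDATA")"
echo "RP only: $(odm_only "$RP_ODMDATA" "$EP_ODMDATA")"
```

For the two strings quoted in this post, the only differing field is nvhs-uphy-config (1 on the endpoint, 0 on the root port).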

What is going on here?
I think 0x40d84000 is the correct configuration on the endpoint, so why does reflashing only the A_bpmp-fw-dtb partition, which changes the value to 0x00584000, make it work?
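To narrow down what actually changed, the two register values can be XORed to list the bit positions that differ. This is a minimal POSIX shell sketch; the function name reg_diff_bits is hypothetical, and the meaning of the individual bits in the uphy config register is not stated anywhere in this thread, so the output only shows which bits differ, not what they control.

```shell
#!/bin/sh
# Hypothetical helper: show which of the low 32 bits differ between two values.
reg_diff_bits() {
    diff=$(( $1 ^ $2 ))
    printf 'xor = 0x%08x\n' "$diff"
    bit=0
    while [ "$bit" -lt 32 ]; do
        if [ $(( (diff >> bit) & 1 )) -eq 1 ]; then
            printf 'bit %d differs\n' "$bit"
        fi
        bit=$(( bit + 1 ))
    done
}

# Values quoted above: before and after reflashing A_bpmp-fw-dtb.
reg_diff_bits 0x40d84000 0x00584000
```

For these two values the XOR is 0x40800000, so only bits 23 and 30 differ.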

Please do a full flash, not just a single partition.

We did a full flash with sudo ./flash.sh jetson-agx-orin-devkit mmcblk0p1. After restarting the root port, the endpoint reports the error again, so could you help us look at the first question?

Hi,

Could you first check whether the devmem method works? If even that is not working, there is no need to check the tvnet function.

Also, please disable SRNS first and see if that works. We didn’t test any endpoint function with SSC disabled.

Do we use devmem to check which registers are configured correctly? Our first version of the hardware was designed without an external PCIe clock, so we have to enable SRNS.

Hi,

No, we are talking about the shared memory read/write in this doc. Devmem is used to write and read the shared RAM. This is the simplest method to validate the EP function.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/Communications/PcieEndpointMode.html?highlight=endpoint
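As a rough illustration of that test, the sketch below pulls the shared RAM address out of the endpoint kernel log and prints the devmem commands to try. It assumes the log line has the form "BAR0 RAM phys: 0x..." as printed by the pci_epf_nv_test function in the guide linked above; the helper names bar0_phys and devmem_cmds, and the test pattern 0xfa950000, are placeholders. Nothing is written to memory by the sketch itself; it only prints commands.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the last
# "BAR0 RAM phys" address found (assumed log format, see the lead-in).
bar0_phys() {
    sed -n 's/.*BAR0 RAM phys: \(0x[0-9a-fA-F]*\).*/\1/p' | tail -n 1
}

# Hypothetical helper: print (do not run) a 32-bit write followed by a
# read-back at address $1 with test value $2.
devmem_cmds() {
    printf 'busybox devmem %s 32 %s\n' "$1" "$2"
    printf 'busybox devmem %s\n' "$1"
}

# Intended use on the endpoint, run manually as root:
#   addr=$(dmesg | bar0_phys)
#   devmem_cmds "$addr" 0xfa950000
```

On the root port side the same read would be done against the BAR address that lspci -v reports for the endpoint device, and the value written on one side should be visible on the other.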

OK, we will do a full flash again first and then check reads and writes to the shared RAM.

  1. If you want to use SRNS, then you have to disable SSC (on both the EP and RP side)
    and add pcie-cX-endpoint-use-int-refclk in the bpmp dtb “uphy” section (only on EP).

  2. There is a property “nvidia,enable-srns” for the kernel dtb. You have to add this in both the EP and RP PCIe controller DT.

I only added “nvidia,enable-srns” to both the EP and RP PCIe controller DT and deleted “nvidia,enable-ext-refclk;”. I also tried adding “pcie-cx-endpoint-use-int-refclk” to the EP.

Just a reminder, this goes in the bpmp dtb, not the kernel dtb.

Please add the DT property below under the “uphy” node of the BPMP DTB to use the internal refclk on the EP side.
pcie-c5-endpoint-use-int-refclk; → C5

And I think you are using “pcie-c5-endpoint-use-int-refclk” and not literally “cx”, right?

Yes, it should be pcie-c5-endpoint-use-int-refclk;
But don’t the bpmp dtb and the kernel dtb come from the same file?

No, they are totally different.

The bpmp dtb is not open source, and the files are all under your Linux_for_Tegra/bootloader/t186ref.

Kernel dtb is open source and under Linux_for_Tegra/kernel/dtb.

You need to use dtc to convert the bpmp dtb to dts, add the modification, and convert it back.
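A minimal sketch of that round trip is below, using the standard dtc input/output format options. The helper names run, decompile_dtb and compile_dts are made up for this example, and the file name passed in is up to you. By default the commands are only printed; set DRY_RUN=0 to actually execute them, and keep a backup of the original .dtb because compile_dts overwrites it.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Decompile <name>.dtb into an editable <name>.dts next to it.
decompile_dtb() {
    run dtc -I dtb -O dts -o "${1%.dtb}.dts" "$1"
}

# Recompile the edited <name>.dts back into <name>.dtb (overwrites the .dtb).
compile_dts() {
    run dtc -I dts -O dtb -o "${1%.dts}.dtb" "$1"
}

# Intended use, from the directory that holds the bpmp dtb:
#   decompile_dtb <your-bpmp>.dtb
#   (edit the resulting .dts and add the property under the "uphy" node)
#   compile_dts <your-bpmp>.dts
```

After recompiling, the board has to be reflashed for the modified bpmp dtb to take effect.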

The bpmp dtb should be tegra234-bpmp-3701-0005-3737-0000.dtb, since our Orin is the 64 GB version. Thanks.

clock@plle {
	clk-id = <0x64>;
	disable-spread = <0x1>;
};

clock@pllnvhs {
	clk-id = <0xf3>;
	disable-spread = <0x1>;
	pcie-c5-endpoint-use-int-refclk;
};

clock@pllgbe {
	clk-id = <0x13f>;
	disable-spread = <0x1>;
};

Is that right?

No, not right. You probably missed my comment above.

uphy {
    status = "okay";
    hsio-uphy-config = <0x0>;
    hsstp-lane-map = <0x3>;
    nvhs-uphy-config = <0x0>;
    gbe-uphy-config = <0x16>;
    gbe0-enable-10g;
    pcie-c5-endpoint-use-int-refclk;
};
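Since the property was first placed under a clock node by mistake, a quick check on the decompiled dts can confirm it sits inside the uphy node. This is only a sketch: the function name uphy_has_prop is hypothetical, and it assumes the uphy node is flat (no child nodes), as in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if property $2 appears inside the
# "uphy { ... }" node of dts file $1. Assumes uphy has no child nodes.
uphy_has_prop() {
    awk -v prop="$2" '
        /^[ \t]*uphy[ \t]*[{]/ { in_uphy = 1; next }   # entering the uphy node
        in_uphy && /[}]/       { in_uphy = 0 }          # first closing brace ends it
        in_uphy && index($0, prop ";") { found = 1 }    # property seen inside uphy
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Intended use:
#   uphy_has_prop <your-bpmp>.dts pcie-c5-endpoint-use-int-refclk && echo "ok"
```

The check fails when the property only appears elsewhere, for example under clock@pllnvhs.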

Sorry, I found the right spot.

Yes, this looks correct.

I also noticed the hsio-uphy-config and nvhs-uphy-config settings in the bpmp dtb.
Do they need to be modified there, or are they set through ODMDATA?

I think that reflashing only the A_bpmp-fw-dtb partition may have caused the system to use these default values.

Setting it in ODMDATA should be sufficient.