Jetson AGX Orin PCIe C7 "Phy link never came up"

Hi NV_Team,
On our custom carrier board, we enabled PCIe C7 according to the documentation.
We also updated ODMDATA to: ODMDATA="gbe-uphy-config-0,hsstp-lane-map-3,hsio-uphy-config-0,nvhs-uphy-config-0"
An NVMe disk is connected to PCIe C7, but the boot log shows:

[    6.102743] tegra194-pcie 141e0000.pcie: Adding to iommu group 12
[    6.114901] tegra194-pcie 141e0000.pcie: Using GICv2m MSI allocator
[    9.875746] tegra194-pcie 141e0000.pcie: Using GICv2m MSI allocator
[    9.884025] tegra194-pcie 141e0000.pcie: host bridge /pcie@141e0000 ranges:
[    9.891199] tegra194-pcie 141e0000.pcie:       IO 0x003e100000..0x003e1fffff -> 0x003e100000
[    9.899886] tegra194-pcie 141e0000.pcie:      MEM 0x3228000000..0x322fffffff -> 0x0040000000
[    9.908561] tegra194-pcie 141e0000.pcie:      MEM 0x2e40000000..0x3227ffffff -> 0x2e40000000
[   11.025079] tegra194-pcie 141e0000.pcie: Phy link never came up
[   11.031246] tegra194-pcie 141e0000.pcie: PCI host bridge to bus 0007:00
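To narrow a full log down to the C7 controller (141e0000.pcie), a small Python sketch can filter a saved dmesg capture such as the attached dmesg.txt. It assumes the default `[timestamp] driver device: message` line format shown above; the helper names are made up for this example. The line to look for is "Phy link never came up", which indicates the link did not train, so there is no endpoint for the kernel to enumerate behind the root port.

```python
import re

# Matches lines such as:
# [   11.025079] tegra194-pcie 141e0000.pcie: Phy link never came up
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+(?P<dev>[0-9a-f]+\.pcie):\s*(?P<msg>.*)$"
)


def controller_messages(dmesg_text, dev="141e0000.pcie"):
    """Return (timestamp, message) pairs logged by one PCIe controller."""
    out = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("dev") == dev:
            out.append((float(m.group("ts")), m.group("msg")))
    return out


def link_failed(messages):
    """True if the controller reported that the PHY link never came up."""
    return any("link never came up" in msg.lower() for _, msg in messages)
```

For example, `controller_messages(open("dmesg.txt").read())` lists the C7 lines in order, and `link_failed(...)` on the excerpt above returns True.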

Why can't the NVMe disk be found?

root@tegra-ubuntu:~# lspci -vvv -s 0007:00:00.0
0007:00:00.0 PCI bridge: NVIDIA Corporation Device 229a (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 70
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 0000f000-00000fff [disabled]
        Memory behind bridge: fff00000-000fffff [disabled]
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP+, LTR+
                         10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd+
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 1ms to 10ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=2 offset=00000000
                PBA: BAR=2 offset=00010000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
                RootCmd: CERptEn+ NFERptEn+ FERptEn+
                RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
                ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
        Capabilities: [148 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [168 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [190 v1] Lane Margining at the Receiver <?>
        Capabilities: [1c0 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
                          PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=60us
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [1d0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
        Capabilities: [2d0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [308 v1] Data Link Feature <?>
        Capabilities: [314 v1] Precision Time Measurement
                PTMCap: Requester:+ Responder:+ Root:+
                PTMClockGranularity: 16ns
                PTMControl: Enabled:- RootSelected:-
                PTMEffectiveGranularity: Unknown
        Capabilities: [320 v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
        Capabilities: [388 v1] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
        Kernel driver in use: pcieport
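In this output the decisive field is DLActive- on the LnkSta line: the data link layer never became active, so nothing can be enumerated behind the root port. The "downgraded" 2.5GT/s x1 is then most likely just the untrained state rather than a negotiated result. A minimal sketch that pulls these fields out of saved `lspci -vvv` text (the field names are the ones lspci prints above; the function name is made up for this example):

```python
import re


def parse_link_status(lspci_text):
    """Extract link capability/status fields from `lspci -vvv` output for one port."""
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+)GT/s, Width x(\d+)", lspci_text)
    # "LnkSta:" with the colon, so the LnkSta2 line is not matched
    sta = re.search(r"LnkSta:\s+Speed ([\d.]+)GT/s[^,]*, Width x(\d+)", lspci_text)
    dl = re.search(r"DLActive([+-])", lspci_text)
    if not (cap and sta and dl):
        raise ValueError("LnkCap/LnkSta/DLActive not found in lspci output")
    return {
        "cap_speed_gts": float(cap.group(1)),
        "cap_width": int(cap.group(2)),
        "speed_gts": float(sta.group(1)),
        "width": int(sta.group(2)),
        "dl_active": dl.group(1) == "+",
    }
```

If `dl_active` is False, the reported speed and width should not be read as the outcome of link negotiation.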

I have uploaded dmesg.txt and kernel_tegra234-p3701-0004-p3737-0000.dts.
dmesg.txt (59.4 KB)
kernel_tegra234-p3701-0004-p3737-0000.dts.txt (528.5 KB)

Please refer to the debug tips:

https://docs.nvidia.com/jetson/archives/r35.5.0/DeveloperGuide/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=141a#debug-pcie-link-up-failure

Also, you didn’t share the pinmux setting.

https://docs.nvidia.com/jetson/archives/r35.5.0/DeveloperGuide/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=141a#enable-pcie-in-a-customer-cvb-design

We referred to the link above, specifically the section "Example change: PCIe x1 (C0) and PCIe x8 (C7) in Root Port mode", to enable PCIe C7.

To triage the issue:
Triaging from the platform side:
For PERST#: Does this mean the reset signal? We checked it with an oscilloscope and observed it being pulled low and then released high during Linux kernel boot.
For REFCLK: We checked it with an oscilloscope and observed a 100 MHz clock output.
For CLKREQ#: We don't enable ASPM, as the lspci output above shows (LnkCtl: ASPM Disabled).
For Tx and Rx routing: We verified that the lane routing is correct.
For PCIe slot regulators or GPIOs: Does PCIe C7 need any regulators?

Triaging from the software side:
1) DLActive and LnkSta from the lspci -vvv output are pasted above.
2) How do I dump the PADCTL_PEX_CTL_PEX_L_CLKREQ_N_0 and PADCTL_PEX_CTL_PEX_L_RST_N_0 pinmux values and check whether the settings are correct?
3) How do I dump the PCIE_RP_APPL_DEBUG_0 register value and check whether it is correct?
4) The link speed and width are downgraded:
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

Which pinmux setting file do I need to share? We use R35.3.1.

The document has the file name…

Also, if you are still asking this question, it means you didn't change it before… and that is probably the root cause…

I modified the DTS according to the following code from the document:

diff --git a/hardware/nvidia/platform/t23x/concord/kernel-dts/cvb/tegra234-p3737-pcie.dtsi b/hardware/nvidia/platform/t23x/concord/kernel-dts/cvb/tegra234-p3737-pcie.dtsi
index bc065d35f..8f9f9b617 100644
--- a/hardware/nvidia/platform/t23x/concord/kernel-dts/cvb/tegra234-p3737-pcie.dtsi
+++ b/hardware/nvidia/platform/t23x/concord/kernel-dts/cvb/tegra234-p3737-pcie.dtsi

+    pcie@14180000 {
+          status = "okay";
+          phys = <&p2u_hsio_0>;
+          phy-names = "p2u-0";
+    };
+
+    pcie@141e0000 {
+          status = "okay";
+          num-lanes = <8>;
+          phys = <&p2u_gbe_0>, <&p2u_gbe_1>, <&p2u_gbe_2>,
+                    <&p2u_gbe_3>, <&p2u_gbe_4>, <&p2u_gbe_5>,
+                    <&p2u_gbe_6>, <&p2u_gbe_7>;
+          phy-names = "p2u-0", "p2u-1", "p2u-2", "p2u-3",
+                      "p2u-4", "p2u-5", "p2u-6", "p2u-7";
+    };
Below is the change that configures the CLKREQ# and reset pins in the MB1 BCT:
--- a//tegra234-mb1-bct-pinmux-p3701-0000-a04.dtsi
+++ b//tegra234-mb1-bct-pinmux-p3701-0000-a04.dtsi
@@ -1486,9 +1486,9 @@

                     pex_l0_clkreq_n_pk0 {
                             nvidia,pins = "pex_l0_clkreq_n_pk0";
-                           nvidia,function = "rsvd1";
+                           nvidia,function = "pe0";
                             nvidia,pull = <TEGRA_PIN_PULL_NONE>;
-                           nvidia,tristate = <TEGRA_PIN_ENABLE>;
+                           nvidia,tristate = <TEGRA_PIN_DISABLE>;
                             nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                             nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                             nvidia,lpdr = <TEGRA_PIN_DISABLE>;
@@ -1496,10 +1496,10 @@

                     pex_l0_rst_n_pk1 {
                             nvidia,pins = "pex_l0_rst_n_pk1";
-                           nvidia,function = "rsvd1";
+                           nvidia,function = "pe0";
                             nvidia,pull = <TEGRA_PIN_PULL_NONE>;
-                           nvidia,tristate = <TEGRA_PIN_ENABLE>;
-                           nvidia,enable-input = <TEGRA_PIN_ENABLE>;
+                           nvidia,tristate = <TEGRA_PIN_DISABLE>;
+                           nvidia,enable-input = <TEGRA_PIN_DISABLE>;
                             nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                             nvidia,lpdr = <TEGRA_PIN_DISABLE>;
                     };

@@ -1566,9 +1566,9 @@

                   pex_l7_clkreq_n_pag0 {
                          nvidia,pins = "pex_l7_clkreq_n_pag0";
-                         nvidia,function = "rsvd1";
+                         nvidia,function = "pe7";
                          nvidia,pull = <TEGRA_PIN_PULL_NONE>;
-                         nvidia,tristate = <TEGRA_PIN_ENABLE>;
+                         nvidia,tristate = <TEGRA_PIN_DISABLE>;
                          nvidia,enable-input = <TEGRA_PIN_ENABLE>;
                          nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                          nvidia,lpdr = <TEGRA_PIN_DISABLE>;
@@ -1576,10 +1576,10 @@

               pex_l7_rst_n_pag1 {
                  nvidia,pins = "pex_l7_rst_n_pag1";
-                 nvidia,function = "rsvd1";
+                 nvidia,function = "pe7";
                  nvidia,pull = <TEGRA_PIN_PULL_NONE>;
-                 nvidia,tristate = <TEGRA_PIN_ENABLE>;
-                 nvidia,enable-input = <TEGRA_PIN_ENABLE>;
+                 nvidia,tristate = <TEGRA_PIN_DISABLE>;
+                 nvidia,enable-input = <TEGRA_PIN_DISABLE>;
                  nvidia,io-high-voltage = <TEGRA_PIN_ENABLE>;
                  nvidia,lpdr = <TEGRA_PIN_DISABLE>;
             };
diff --git a/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi b/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi
index 86f4bc2..32fdd2e 100644
--- a/bootloader/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi
+++ b/bootloader/tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi
@@ -60,8 +60,6 @@
                        TEGRA234_MAIN_GPIO(K, 7)
                        TEGRA234_MAIN_GPIO(L, 2)
                        TEGRA234_MAIN_GPIO(L, 3)
-                       TEGRA234_MAIN_GPIO(AG, 0)
-                       TEGRA234_MAIN_GPIO(AG, 1)
                     TEGRA234_MAIN_GPIO(AG, 2)
                     TEGRA234_MAIN_GPIO(AG, 3)
                     TEGRA234_MAIN_GPIO(AG, 6)
--
2.17.1
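Before flashing, a quick sanity check is to read back the C7 CLKREQ#/PERST# pin nodes from the edited pinmux .dtsi and print their function and tristate settings. This sketch assumes the flat node layout shown in the diff above (one node per pin, no nested braces); `pin_settings` is a made-up helper name. It only checks the source file, not whether that file was actually flashed.

```python
import re


def pin_settings(dtsi_text, pin_nodes):
    """Return {node_name: {property: value}} for the named pinmux nodes.

    Assumes each pin node is a flat block `name { nvidia,prop = value; ... };`
    as in the MB1 BCT pinmux diff above. Missing nodes are simply omitted.
    """
    result = {}
    for name in pin_nodes:
        m = re.search(re.escape(name) + r"\s*\{(.*?)\};", dtsi_text, re.S)
        if not m:
            continue
        props = dict(re.findall(r"nvidia,([\w-]+)\s*=\s*([^;]+);", m.group(1)))
        # Drop surrounding quotes from string values such as "pe7"
        result[name] = {k: v.strip().strip('"') for k, v in props.items()}
    return result
```

For C7, `pin_settings(text, ["pex_l7_clkreq_n_pag0", "pex_l7_rst_n_pag1"])` should report `function` as `pe7` and `tristate` as `<TEGRA_PIN_DISABLE>` for both nodes if the edit is present.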

But even though lspci finds the PCIe C7 bus, the NVMe disk is not detected. So I want to confirm again that this is the file we actually need to modify, and that the code we added really takes effect.
Or is there anything we should check on the hardware side?

This is our first project on the Jetson series.

I have uploaded the DTS files.
dts.zip (14.6 KB)

You can check your flash log to see whether the file you modified really takes effect.

It does not help to check your file 100 times if it never got flashed to the board even once.

Which keywords in the flash log confirm that the code really takes effect?

Actually, I once introduced some DTS syntax errors and reflashed the bootloader, and the flashing process failed.
Then I corrected the syntax errors and reflashed the bootloader, and the flashing process succeeded.
So I guess the code took effect.

tegra234-mb1-bct-pinmux-p3701-0000-a04 should show up in the flash log. The syntax-error trick you just described should also work for checking this file.
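The exact wording of the flash log differs between L4T releases, so rather than relying on a particular keyword, a simple check is whether the names of the modified files appear in the saved log at all. A sketch, assuming the log was captured to a text file; the file names are the ones used in this thread and `files_in_flash_log` is a made-up helper name:

```python
def files_in_flash_log(log_text, names):
    """Map each expected file name to the log lines that mention it."""
    hits = {name: [] for name in names}
    for line in log_text.splitlines():
        for name in names:
            if name in line:
                hits[name].append(line.strip())
    return hits


# File names taken from this thread (MB1 BCT pinmux/GPIO and the kernel DTB)
EXPECTED = [
    "tegra234-mb1-bct-pinmux-p3701-0000-a04",
    "tegra234-mb1-bct-gpio-p3701-0000-a04",
    "kernel_tegra234-p3701-0004-p3737-0000.dtb",
]
```

A name with no matching lines suggests the flash run never referenced that file (for example, because the board configuration points at a different revision of it), in which case edits to it could not have reached the board.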

The kernel DTB should show up as well. Check /proc/device-tree and see whether what you added is present there.

I checked the output and confirmed that the modification exists in /proc/device-tree.
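To make that check repeatable: under /proc/device-tree every node is a directory and every property is a file holding the raw value, where strings are NUL-terminated and integer cells are 32-bit big-endian. A sketch that finds `pcie@141e0000` by walking the tree (so it does not assume which parent bus the node sits under) and decodes properties added by the DTS change; the helper names are illustrative:

```python
import os
import struct


def find_node(root, node_name):
    """Return the first directory named node_name under root, or None."""
    for dirpath, dirnames, _ in os.walk(root):
        if node_name in dirnames:
            return os.path.join(dirpath, node_name)
    return None


def read_string(node, prop):
    """Decode a NUL-terminated string property (first string only)."""
    with open(os.path.join(node, prop), "rb") as f:
        return f.read().split(b"\0")[0].decode()


def read_u32_cells(node, prop):
    """Decode a property as a list of big-endian 32-bit cells."""
    with open(os.path.join(node, prop), "rb") as f:
        data = f.read()
    n = len(data) // 4
    return list(struct.unpack(">%dI" % n, data[: 4 * n]))
```

On the target, `node = find_node("/proc/device-tree", "pcie@141e0000")` followed by `read_string(node, "status")` should give `okay`, and `read_u32_cells(node, "num-lanes")` should give `[8]` if the change is live.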

kernel_tegra234-p3701-0004-p3737-0000.dts.txt was generated with dtc from /boot/dtb/kernel_tegra234-p3701-0004-p3737-0000.dtb.

The DTS file shows that C7 is enabled:

	pcie@141e0000 {
		compatible = "nvidia,tegra234-pcie\0snps,dw-pcie";
		power-domains = <0x02 0x10>;
		reg = <0x00 0x141e0000 0x00 0x20000 0x00 0x3e000000 0x00 0x40000 0x00 0x3e040000 0x00 0x40000 0x00 0x3e080000 0x00 0x40000 0x32 0x30000000 0x00 0x10000000>;
		reg-names = "appl\0config\0atu_dma\0dbi\0ecam";
		status = "okay";
		#address-cells = <0x03>;
		#size-cells = <0x02>;
		device_type = "pci";
		num-lanes = <0x08>;
		num-viewport = <0x08>;
		linux,pci-domain = <0x07>;
		clocks = <0x02 0xab 0x02 0xf4>;
		clock-names = "core\0core_m";
		resets = <0x02 0x0f 0x02 0x0e>;
		reset-names = "apb\0core";
		interrupts = <0x00 0x162 0x04 0x00 0x163 0x04>;
		interrupt-names = "intr\0msi";
		interconnects = <0x3e 0x2a 0x3e 0x30>;
		interconnect-names = "dma-mem\0dma-mem";
		iommus = <0x1c 0x08>;
		iommu-map = <0x00 0x1c 0x08 0x1000>;
		msi-parent = <0x2e 0x08>;
		msi-map = <0x00 0x2e 0x08 0x1000>;
		dma-coherent;
		iommu-map-mask = <0x00>;
		#interrupt-cells = <0x01>;
		interrupt-map-mask = <0x00 0x00 0x00 0x00>;
		interrupt-map = <0x00 0x00 0x00 0x00 0x01 0x00 0x162 0x04>;
		nvidia,dvfs-tbl = <0xc28cb00 0xc28cb00 0xc28cb00 0xc28cb00 0xc28cb00 0xc28cb00 0xc28cb00 0x27ac4000 0xc28cb00 0xc28cb00 0x27ac4000 0x5f5e1000 0xc28cb00 0x27ac4000 0x5f5e1000 0x7f22ff40>;
		nvidia,max-speed = <0x04>;
		nvidia,disable-aspm-states = <0x0f>;
		nvidia,controller-id = <0x02 0x07>;
		nvidia,tsa-config = <0x200b004>;
		nvidia,disable-l1-cpm;
		nvidia,aux-clk-freq = <0x13>;
		nvidia,preset-init = <0x05>;
		nvidia,aspm-cmrt = <0x3c>;
		nvidia,aspm-pwr-on-t = <0x14>;
		nvidia,aspm-l0s-entrance-latency = <0x03>;
		nvidia,bpmp = <0x02 0x07>;
		nvidia,aspm-cmrt-us = <0x3c>;
		nvidia,aspm-pwr-on-t-us = <0x14>;
		nvidia,aspm-l0s-entrance-latency-us = <0x03>;
		bus-range = <0x00 0xff>;
		ranges = <0x81000000 0x00 0x3e100000 0x00 0x3e100000 0x00 0x100000 0x82000000 0x00 0x40000000 0x32 0x28000000 0x00 0x8000000 0xc3000000 0x2e 0x40000000 0x2e 0x40000000 0x03 0xe8000000>;
		nvidia,cfg-link-cap-l1sub = <0x1c4>;
		nvidia,cap-pl16g-status = <0x174>;
		nvidia,cap-pl16g-cap-off = <0x188>;
		nvidia,event-cntr-ctrl = <0x1d8>;
		nvidia,event-cntr-data = <0x1dc>;
		nvidia,dl-feature-cap = <0x30c>;
		nvidia,ptm-cap-off = <0x318>;
		nvidia,disable-power-down;
		vddio-pex-ctl-supply = <0x30>;
		phys = <0x45 0x46 0x47 0x48 0x49 0x4a 0x4b 0x4c>;
		phy-names = "p2u-0\0p2u-1\0p2u-2\0p2u-3\0p2u-4\0p2u-5\0p2u-6\0p2u-7";
		phandle = <0x35f>;
	};
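As a cross-check that this dump is the DTB the kernel actually booted with, the `ranges` property can be decoded and compared with the host bridge windows printed in dmesg. In the PCI devicetree binding, each child address is 3 cells (a phys.hi flags cell, then a 64-bit PCI address); the `reg` property above uses 2 address cells and 2 size cells on the parent side, so, assuming `ranges` follows the same layout, each entry is 7 cells. The flag decoding follows the binding's phys.hi layout (bits 24-25 select the address space, bit 30 marks prefetchable). `decode_pci_ranges` is an illustrative name.

```python
SPACE = {0: "CONFIG", 1: "IO", 2: "MEM32", 3: "MEM64"}


def decode_pci_ranges(cells):
    """Decode a PCI `ranges` property with 3 child, 2 parent and 2 size cells."""
    if len(cells) % 7:
        raise ValueError("expected a multiple of 7 cells")
    windows = []
    for i in range(0, len(cells), 7):
        flags, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells[i:i + 7]
        cpu = (cpu_hi << 32) | cpu_lo
        size = (size_hi << 32) | size_lo
        windows.append({
            "space": SPACE[(flags >> 24) & 0x3],
            "prefetch": bool(flags & 0x40000000),
            "cpu_start": cpu,
            "cpu_end": cpu + size - 1,
            "pci_start": (pci_hi << 32) | pci_lo,
        })
    return windows


# `ranges` of pcie@141e0000, copied from the DTS dump above
C7_RANGES = [
    0x81000000, 0x00, 0x3e100000, 0x00, 0x3e100000, 0x00, 0x100000,
    0x82000000, 0x00, 0x40000000, 0x32, 0x28000000, 0x00, 0x8000000,
    0xc3000000, 0x2e, 0x40000000, 0x2e, 0x40000000, 0x03, 0xe8000000,
]

for w in decode_pci_ranges(C7_RANGES):
    print(f"{w['space']:5} {w['cpu_start']:#012x}..{w['cpu_end']:#012x} -> {w['pci_start']:#012x}")
# Prints:
# IO    0x003e100000..0x003e1fffff -> 0x003e100000
# MEM32 0x3228000000..0x322fffffff -> 0x0040000000
# MEM64 0x2e40000000..0x3227ffffff -> 0x2e40000000
```

These match the IO and MEM windows in the dmesg excerpt at the top of the thread, which is consistent with this DTB being the one in use. It says nothing about why the link itself fails to train.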

What should I do next to find the root cause?
Check the PADCTL_PEX_CTL_PEX_L*_CLKREQ_N_0 and PADCTL_PEX_CTL_PEX_L*_RST_N_0 pinmux values?
Or dump the PCIE_RP_APPL_DEBUG_0 register value?
But how do I do this?
I don't know the definitions of those pins, the address of PCIE_RP_APPL_DEBUG_0, or the expected values.
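A generic way to read a 32-bit register from user space is to mmap /dev/mem as root. The physical addresses of PADCTL_PEX_CTL_PEX_L*_CLKREQ_N_0, PADCTL_PEX_CTL_PEX_L*_RST_N_0 and PCIE_RP_APPL_DEBUG_0, and their expected values, are not given in this thread and have to come from the Orin TRM, so the address is a parameter here. For the APPL register, the `reg` entry named "appl" in the DTS above places C7's APPL block at 0x141e0000 (size 0x20000), so its address would be 0x141e0000 plus the register offset from the TRM. This is a sketch under assumptions: kernel options such as CONFIG_STRICT_DEVMEM or CONFIG_IO_STRICT_DEVMEM may block the access, Python does not guarantee a single 32-bit bus access, and reading a block whose clock is gated can hang or fault the system. `read_u32` is an illustrative name.

```python
import mmap
import os
import struct


def read_u32(phys_addr, dev="/dev/mem"):
    """Read one little-endian 32-bit word at a physical address via mmap.

    The mapping must start on an allocation-granularity boundary, so the
    page containing phys_addr is mapped and the word is read at its offset.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    base = phys_addr & ~(gran - 1)
    offset = phys_addr - base
    fd = os.open(dev, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, gran, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as mm:
            return struct.unpack_from("<I", mm, offset)[0]
    finally:
        os.close(fd)
```

Usage would look like `hex(read_u32(0x141e0000 + offset))`, where `offset` is the PCIE_RP_APPL_DEBUG_0 offset looked up in the TRM; the same call works for the PADCTL registers once their addresses are known.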

Could you share the full dmesg for current situation?

Could you measure the PERST# signal on the hardware and check whether it meets your expectations?

Sure, the dmesg (https://forums.developer.nvidia.com/uploads/short-url/jumI3b1z0WHD3S1SmKPa75ssB1c.txt) has already been uploaded.

Could you please read the previous reply again? Jetson AGX Orin PCIe C7 "Phy link never came up" - #3 by elertzhang

Does PERST# mean the reset signal? We checked the reset signal with an oscilloscope and observed it being pulled low and then released high during Linux kernel boot.
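PERST# is the PCIe fundamental reset (active low). Seeing a low-then-high transition is necessary but not sufficient: the PCI Express CEM specification also requires slot power to be stable for at least 100 ms (T_PVPERL) and REFCLK to be stable for at least 100 µs (T_PERST-CLK) before PERST# is deasserted. A sketch that checks timestamps read off a scope capture against those two minimums; the timestamps are whatever you measure, the names are illustrative, and the limits should be double-checked against the spec revision your design follows.

```python
from dataclasses import dataclass

T_PVPERL_S = 100e-3     # power stable -> PERST# inactive, minimum (CEM spec)
T_PERST_CLK_S = 100e-6  # REFCLK stable -> PERST# inactive, minimum (CEM spec)


@dataclass
class ResetCapture:
    """Edge times in seconds, all relative to the same scope trigger."""
    power_stable: float
    refclk_stable: float
    perst_deassert: float


def check_perst_timing(c):
    """Return a list of violated requirements (empty list means both minimums are met)."""
    problems = []
    if c.perst_deassert - c.power_stable < T_PVPERL_S:
        problems.append("PERST# released less than 100 ms after power was stable")
    if c.perst_deassert - c.refclk_stable < T_PERST_CLK_S:
        problems.append("PERST# released less than 100 us after REFCLK was stable")
    return problems
```

Measuring at the NVMe connector rather than at the module side makes the capture reflect what the device actually sees.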

You have reflashed, checked, and updated things several times over the course of this thread. Are you sure the dmesg posted yesterday is still worth checking?
