PCIe C0 is not working on Jetson AGX Orin

We have configured UPHY in configuration #2 to get the PCIe interfaces working.
C7 and C5 are working properly, but I am not able to detect the I210 card on PCIe C0.

Can you please let me know if anything is wrong or missing?

I’m attaching the configured pinmux and GPIO files and the debug log.

tegra234-p3737-pcie.dtsi.txt (2.1 KB)
tegra234-mb1-bct-pinmux-p3701-0000-a04 1.dtsi.txt (63.6 KB)
tegra234-mb1-bct-pinmux-p3701-0000.dtsi.txt (63.6 KB)
tegra234-mb1-bct-gpio-p3701-0000-a04 1.dtsi.txt (4.7 KB)
tegra234-mb1-bct-gpio-p3701-0000.dtsi.txt (4.8 KB)

Here is the boot log. After the lines below, we see that PCIe gets disabled:

Jetson UEFI firmware (version 4.1-33958178 built on 2023-08-01T19:34:02+00:00)
ESC to enter Setup.
F11 to enter Boot Manager Menu.
Enter to continue boot.
** WARNING: Test Key is used. **
.
L4TLauncher: Attempting Direct Boot
EFI stub: Booting Linux Kernel…

OrinAGX_BootLog1.txt (96.2 KB)

Please refer to

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=pcie#enable-pcie-in-a-customer-cvb-design

and the debug checklist:

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html?highlight=pcie#debug-pcie-link-up-failure

Hi Wayne

We followed the debug steps for PCIe Link-up failure

1. We added the nvidia,disable-power-down device tree property to the PCIe controller C0 node, and that worked.
We were able to see the root port in lspci as below:
lspci
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
2. Verify DLActive status in Root port LnkSta of lspci -vvv output
Here we couldn’t see the DLActive status.
lspci -vvv
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 64
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled]
Memory behind bridge: fff00000-000fffff [disabled]
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities:
Kernel driver in use: pcieport
3. Dump PADCTL_PEX_CTL_PEX_Lx_CLKREQ_N_0 and PADCTL_PEX_CTL_PEX_Lx_RST_N_0 pinmux values
Below are the values which we got
sudo busybox devmem 0x02437010
0x00000460
sudo busybox devmem 0x02437018
0x00000420
sudo busybox devmem 0x02437020
0x00000460
sudo busybox devmem 0x02437028
0x00000420
4. Dump PCIE_RP_APPL_DEBUG_0 register
sudo busybox devmem 0x141800d0
0x00001818 - Here LTSSM_STATE is 0x03, which means POLL_COMPLIANCE.
What does this state mean, and how can I get out of it?
5. Reduce the link speed to Gen-1 and link width to x1 using device tree properties
Can you tell us the parameter names and values, and where they should be set?
Is “nvidia,max-speed = <0x04>” the parameter we need to edit? If yes, what should the value be for Gen-1?

pcie@14180000 {
	compatible = "nvidia,tegra234-pcie\0snps,dw-pcie";
	power-domains = <0x02 0x07>;
	reg = <0x00 0x14180000 0x00 0x20000 0x00 0x38000000 0x00 0x40000 0x00 0x38040000 0x00 0x40000 0x00 0x38080000 0x00 0x40000 0x27 0x30000000 0x00 0x10000000>;
	reg-names = "appl\0config\0atu_dma\0dbi\0ecam";
	status = "okay";
	#address-cells = <0x03>;
	#size-cells = <0x02>;
	device_type = "pci";
	num-lanes = <0x04>;
	num-viewport = <0x08>;
	linux,pci-domain = <0x00>;
	clocks = <0x02 0xdc 0x02 0xe5>;
	clock-names = "core\0core_m";
	resets = <0x02 0x79 0x02 0x74>;
	reset-names = "apb\0core";
	interrupts = <0x00 0x48 0x04 0x00 0x49 0x04>;
	interrupt-names = "intr\0msi";
	interconnects = <0x44 0xd8 0x44 0xd9>;
	interconnect-names = "dma-mem\0dma-mem";
	iommus = <0x03 0x12>;
	iommu-map = <0x00 0x03 0x12 0x1000>;
	msi-parent = <0x35 0x12>;
	msi-map = <0x00 0x35 0x12 0x1000>;
	dma-coherent;
	iommu-map-mask = <0x00>;
	#interrupt-cells = <0x01>;
	interrupt-map-mask = <0x00 0x00 0x00 0x00>;
	interrupt-map = <0x00 0x00 0x00 0x00 0x01 0x00 0x48 0x04>;
	nvidia,dvfs-tbl = <0xc28cb00 0xc28cb00 0xc28cb00 0x27ac4000 0xc28cb00 0xc28cb00 0x27ac4000 0x5f5e1000 0xc28cb00 0x27ac4000 0x5f5e1000 0x7f22ff40 0x00 0x00 0x00 0x00>;
	nvidia,max-speed = <0x04>;
	nvidia,disable-aspm-states = <0x0f>;
	nvidia,controller-id = <0x02 0x00>;
	nvidia,disable-l1-cpm;
	nvidia,aux-clk-freq = <0x13>;
	nvidia,preset-init = <0x05>;
	nvidia,aspm-cmrt = <0x3c>;
	nvidia,aspm-pwr-on-t = <0x14>;
	nvidia,aspm-l0s-entrance-latency = <0x03>;
	nvidia,bpmp = <0x02 0x00>;
	nvidia,aspm-cmrt-us = <0x3c>;
	nvidia,aspm-pwr-on-t-us = <0x14>;
	nvidia,aspm-l0s-entrance-latency-us = <0x03>;
	bus-range = <0x00 0xff>;
	ranges = <0x81000000 0x00 0x38100000 0x00 0x38100000 0x00 0x100000 0x82000000 0x00 0x40000000 0x27 0x28000000 0x00 0x8000000 0xc3000000 0x24 0x40000000 0x24 0x40000000 0x02 0xe8000000>;
	nvidia,cfg-link-cap-l1sub = <0x1b0>;
	nvidia,cap-pl16g-status = <0x174>;
	nvidia,cap-pl16g-cap-off = <0x188>;
	nvidia,event-cntr-ctrl = <0x1c4>;
	nvidia,event-cntr-data = <0x1c8>;
	nvidia,dl-feature-cap = <0x2f8>;
	nvidia,ptm-cap-off = <0x304>;
	nvidia,disable-power-down;
	vddio-pex-ctl-supply = <0x24>;
	phys = <0x45>;
	phy-names = "p2u-0";
	phandle = <0x372>;
};

Yes, nvidia,max-speed is the one. If you set it to 1, the link will be limited to Gen1.
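For reference, here is a minimal sketch of how an nvidia,max-speed value relates to a PCIe generation and per-lane rate. The helper name describe_max_speed is hypothetical. The value 1 for Gen1 comes from the reply above, and the value 4 matches the 16GT/s LnkCap reported for this node; the entries for 2 and 3 are an assumption that the property follows the generation number throughout.

```python
# Standard PCIe per-lane transfer rates in GT/s, keyed by generation.
# Assumption: nvidia,max-speed = <N> caps the link at Gen N for N in 1..4.
PCIE_GEN_RATE_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}


def describe_max_speed(max_speed: int) -> str:
    """Describe the speed cap implied by an nvidia,max-speed value (hypothetical helper)."""
    if max_speed not in PCIE_GEN_RATE_GTS:
        raise ValueError(f"unexpected nvidia,max-speed value: {max_speed}")
    return f"Gen{max_speed} ({PCIE_GEN_RATE_GTS[max_speed]:g} GT/s)"


if __name__ == "__main__":
    for value in (1, 4):
        print(f"nvidia,max-speed = <{value}> ->", describe_max_speed(value))
```

With the link capped at Gen1, LnkCap in lspci -vvv would be expected to report 2.5GT/s instead of 16GT/s, which is a quick way to confirm that the device tree change was picked up.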

What does this mean? What can be the reason?

Please remove them from your tegra234-mb1-bct-gpio-p3701-* too.

TEGRA234_MAIN_GPIO(K, 0)
TEGRA234_MAIN_GPIO(K, 1)

We removed the lines below from the files “tegra234-mb1-bct-gpio-p3701-0000-a04.dtsi” and “tegra234-mb1-bct-gpio-p3701-0000.dtsi”:
TEGRA234_MAIN_GPIO(K, 0)
TEGRA234_MAIN_GPIO(K, 1)
Even after that, we are not able to detect the I210 chip.

lspci
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)

lspci -vvv
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 64
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 0000f000-00000fff [disabled]
Memory behind bridge: fff00000-000fffff [disabled]
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x1, ASPM not supported
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP+, LTR+
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd+
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00010000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn+ NFERptEn+ FERptEn+
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [148 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [168 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [18c v1] Lane Margining at the Receiver <?>
Capabilities: [1ac v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=60us
L1SubCtl2: T_PwrOn=10us
Capabilities: [1bc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2bc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2f4 v1] Data Link Feature <?>
Capabilities: [300 v1] Precision Time Measurement
PTMCap: Requester:+ Responder:+ Root:+
PTMClockGranularity: 16ns
PTMControl: Enabled:- RootSelected:-
PTMEffectiveGranularity: Unknown
Capabilities: [30c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Capabilities: [374 v1] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
Kernel driver in use: pcieport
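Since the full lspci -vvv dump is long, a small script can pull out just the link status fields. This is a minimal sketch that assumes the output format shown above (a "LnkSta:" line with Speed and Width, and a DLActive+ or DLActive- flag); the function name link_status is hypothetical.

```python
import re


def link_status(lspci_text: str) -> dict:
    """Extract negotiated speed, width and the DLActive flag from `lspci -vvv` text."""
    # "LnkSta:" with the colon directly after it does not match "LnkSta2:" or "LnkCap:".
    speed = re.search(r"LnkSta:\s+Speed\s+([\d.]+GT/s)", lspci_text)
    width = re.search(r"LnkSta:.*?Width\s+(x\d+)", lspci_text)
    dl_active = re.search(r"DLActive([+-])", lspci_text)
    return {
        "speed": speed.group(1) if speed else None,
        "width": width.group(1) if width else None,
        # True for "DLActive+", False for "DLActive-", None if the flag is absent.
        "dl_active": (dl_active.group(1) == "+") if dl_active else None,
    }


# Lines copied from the root port dump above.
SAMPLE = """\
LnkCap: Port #0, Speed 16GT/s, Width x1, ASPM not supported
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

if __name__ == "__main__":
    print(link_status(SAMPLE))
```

For the dump above this reports dl_active as False, meaning the data link layer never came up even though the root port itself is enumerated.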

Now I am getting LTSSM_STATE as 0x02 (POLL_ACTIVE):
sudo busybox devmem 0x141800d0
0x00001810
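The two register dumps in this thread can be decoded with a few lines of code. This is a sketch under the assumption that LTSSM_STATE occupies bits [8:3] of the APPL debug register at offset 0xd0, as the upstream Linux pcie-tegra194 driver defines it; only the two state names already mentioned in this thread are listed, and the helper names are hypothetical.

```python
# Assumption: LTSSM state is bits [8:3] of the APPL debug register (offset 0xd0),
# following APPL_DEBUG_LTSSM_STATE_MASK in the upstream pcie-tegra194 driver.
LTSSM_SHIFT = 3
LTSSM_MASK = 0x3F

# Only the states named in this thread; anything else is shown numerically.
LTSSM_NAMES = {0x02: "POLL_ACTIVE", 0x03: "POLL_COMPLIANCE"}


def ltssm_state(appl_debug: int) -> int:
    """Return the raw LTSSM state field from an APPL debug register value."""
    return (appl_debug >> LTSSM_SHIFT) & LTSSM_MASK


def ltssm_name(appl_debug: int) -> str:
    """Return a readable name for the LTSSM state, or the hex value if unknown."""
    state = ltssm_state(appl_debug)
    return LTSSM_NAMES.get(state, f"state 0x{state:02x}")


if __name__ == "__main__":
    for value in (0x00001818, 0x00001810):
        print(f"0x{value:08x} -> {ltssm_name(value)}")
```

Both values are polling states, so in each case the root port is still trying to train the link and has not reached the configuration state.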

Looks like the DLActive status comes out?

DLActive seems to be ‘0’. Any other suggestions on what to check?

Do you have another kind of PCIe NIC card that you can test here?

Yes, we have tested with another NIC card, but we see the same issue on C0.
The same NIC cards work fine on C5 and C7.

Please review the hardware design.

Ok Sure.

Also try to unbind and rebind the PCIe controller through its sysfs node and see if it is able to detect the NIC.

Below are the commands I have used
cd /sys/bus/platform/drivers/tegra194-pcie
echo 14180000.pcie > bind
echo: write error: No such device
echo 14180000.pcie > unbind
echo: write error: No such device

Also, I tried the command below, but it gives an error.
echo 0000:00:00.0 > /sys/bus/pci/devices/0000:00:00.0/driver/bind
echo: write error: No such device

Please suggest if the command needs to change.
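Before writing to bind or unbind, it can help to check which device names are actually bound to the driver. In sysfs, bound devices show up as symlinks inside the driver directory, and an unbind that fails with "No such device" usually means the name written is not among them. The sketch below lists those names; the helper bound_devices is hypothetical, and the driver path is the one used in the commands above.

```python
import os
from typing import List


def bound_devices(driver_dir: str) -> List[str]:
    """Return the names of devices currently bound to the driver at driver_dir."""
    names = []
    for entry in sorted(os.listdir(driver_dir)):
        path = os.path.join(driver_dir, entry)
        # Bound devices are symlinks; "module" is also a symlink but not a device.
        # Regular files such as bind, unbind and uevent are skipped.
        if os.path.islink(path) and entry != "module":
            names.append(entry)
    return names


if __name__ == "__main__":
    # Driver directory taken from the commands in this thread.
    driver_dir = "/sys/bus/platform/drivers/tegra194-pcie"
    if os.path.isdir(driver_dir):
        print(bound_devices(driver_dir))
    else:
        print(f"{driver_dir} not found on this system")
```

If 14180000.pcie is missing from the list, the controller is not bound to that driver at that moment, and the exact names printed are the ones that unbind would accept.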

Hi Wayne,

Any suggestions on this?