Jetson TX2 PCIe not detecting endpoint

Hi, I am trying to connect an Intel Xeon D processor to the Jetson TX2 over PCIe, but the TX2 does not detect the PCIe device. I have tested the Xeon D PCIe port with another system, where it works properly in NTB-to-RP mode. I have now connected the processor to the Jetson TX2 dev kit, and I am unable to detect the endpoint.

Here is the dmesg log. The messages keep coming and do not stop.

[ 94.093629] pcieport 0000:00:01.0: can’t find device of ID0020
[ 94.093631] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0020
[ 94.144648] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[ 94.155081] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00000001/00002000
[ 94.155092] pcieport 0000:00:01.0: [ 0] Receiver Error
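
For readers following along, the "error status/mask" pair in the log above can be decoded by hand. The sketch below is only an illustration: the bit positions are those of the AER Correctable Error Status and Mask registers as I understand them from the PCI Express Base Specification, and the function names are made up for this example.

```python
# Bit positions of the AER Correctable Error Status/Mask registers
# (assumed from the PCI Express Base Specification).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of all known bits set in a status or mask word."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if (value >> bit) & 1]


def decode_status_mask(field):
    """Split a 'status/mask' pair such as '00000001/00002000' and decode both halves."""
    status_hex, mask_hex = field.split("/")
    return (decode_correctable(int(status_hex, 16)),
            decode_correctable(int(mask_hex, 16)))


status, mask = decode_status_mask("00000001/00002000")
print("reported:", status)
print("masked:  ", mask)
```

For the values in this log, the status word decodes to a Receiver Error (a physical-layer event) and the mask word to Advisory Non-Fatal Error, which agrees with the "[ 0] Receiver Error" line printed by the kernel.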

I added “pcie_aspm=off pci=noaer” to the “/boot/extlinux/extlinux.conf” file, but I still see the errors in dmesg.
I also added ntb and ntb_hw_intel to “/etc/modules-load.d/mymodules.conf” so that they are loaded before the PCIe scan occurs, but the PCIe device is still not detected.

Here is the lspci and lspci -tvnn output.

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1)

-[0000:00]-+-01.0-[01]--

Setup Details:
Jetson TX2 Dev Kit with Jetpack 4.4 R32.4.3

What is the clocking configuration in this setup? Does the PCIe switch’s NT port take its clock from the TX2?
The TX2 doesn’t support endpoints (or, for that matter, any downstream devices) that run on their own REFCLK instead of taking the REFCLK from the TX2. I wonder whether that is what is happening here.

I just confirmed that the Jetson TX2 clock is not routed to the endpoint. The Jetson TX2 and the Intel Xeon D1500 each run on their own clock; only the data signals are routed between the two.

Progress Update:
I disabled SSC and verified this by reading the registers. After re-flashing with the updated dtb, I am now able to see my endpoint.
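
One plausible reading of why disabling SSC made a difference with two independent reference clocks is the frequency-offset budget. The sketch below is only back-of-the-envelope arithmetic. It assumes the commonly quoted PCIe figures of +/-300 ppm reference clock tolerance and a 0.5 % SSC down-spread, and it says nothing about what the TX2 controller itself tolerates.

```python
REFCLK_TOLERANCE_PPM = 300   # assumed: each reference clock may be off by +/-300 ppm
SSC_DOWNSPREAD_PPM = 5000    # assumed: SSC spreads the clock down by up to 0.5 %


def worst_case_offset_ppm(ssc_enabled):
    """Worst-case frequency difference between two independent reference clocks."""
    offset = 2 * REFCLK_TOLERANCE_PPM      # one clock fast, the other slow
    if ssc_enabled:
        offset += SSC_DOWNSPREAD_PPM       # one side momentarily at full down-spread
    return offset


print("independent clocks, SSC off:", worst_case_offset_ppm(False), "ppm")
print("independent clocks, SSC on: ", worst_case_offset_ppm(True), "ppm")
```

Under these assumptions the receiver has to absorb roughly nine times more clock offset with SSC enabled, which would be consistent with the link only enumerating once SSC was turned off.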

lspci

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1)
01:00.0 Bridge: Intel Corporation Device 6f0f

The issue now is that when I try to read the entire PCI config space of device 01:00.0 with the “setpci” command, Ubuntu hangs and I have to force a restart of the board.

What about disabling SSC on the Intel device? SSC should also be disabled on the Intel device, if that has not been done already.
With your current setup (assuming SSC is still enabled on the Intel device), the link may have come up momentarily and then gone down again. That could be why setpci hangs. To confirm this, you can run ‘lspci -xxxx’ before running ‘setpci’; if the link is down, we wouldn’t see the registers being dumped from the Intel device.

Yes. I rechecked, and SSC is disabled on both the Intel and the Jetson side. I ran the “lspci” command that you suggested multiple times, and it returns data every time.

sudo lspci -xxxx -s 01:00.0

01:00.0 Bridge: Intel Corporation Device 6f0f
00: 86 80 0f 6f 06 04 10 20 00 00 80 06 08 00 00 00
10: 0c 00 50 48 00 00 00 00 0c 00 00 48 00 00 00 00
20: 0c 00 40 48 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 60 00 00 00 00 00 00 00 7d 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 05 80 82 01 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 11 90 03 80 00 40 00 00 00 50 00 00 00 00 00 00
90: 10 e0 02 00 21 8c 00 00 00 01 00 00 83 3c 01 00
a0: 42 00 01 10 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 1e 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 01 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
/-------------------------------------------------------
fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
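
As a quick sanity check on a dump like the one above, the first row can be decoded into the standard header fields. This is a minimal sketch with hypothetical helper names. It assumes the dump rows are contiguous starting at offset 00, and that a vendor ID of 0xffff means the device did not respond (which is what a read over a downed link typically returns).

```python
def parse_hex_dump(text):
    """Turn 'lspci -x' style rows ('00: 86 80 ...') into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        offset, sep, rest = line.partition(": ")
        if not sep:
            continue                      # not a dump row
        try:
            int(offset, 16)               # row label must be a hex offset
            row = bytes.fromhex(rest)
        except ValueError:
            continue                      # e.g. the '01:00.0 Bridge: ...' header line
        data.extend(row)
    return bytes(data)


def describe_header(cfg):
    """Decode a few standard fields from the start of PCI config space."""
    vendor = int.from_bytes(cfg[0x00:0x02], "little")
    device = int.from_bytes(cfg[0x02:0x04], "little")
    class_code = int.from_bytes(cfg[0x09:0x0C], "little")
    return {
        "vendor": vendor,
        "device": device,
        "class_code": class_code,
        "responding": vendor != 0xFFFF,
    }


dump = """01:00.0 Bridge: Intel Corporation Device 6f0f
00: 86 80 0f 6f 06 04 10 20 00 00 80 06 08 00 00 00
"""
header = describe_header(parse_hex_dump(dump))
print({key: hex(val) if isinstance(val, int) and not isinstance(val, bool) else val
       for key, val in header.items()})
```

For the first row of this dump the fields come out as vendor 0x8086, device 0x6f0f and class code 0x068000, which matches what lspci prints for 01:00.0.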

Further testing:
When I load “ntb_netdev”, it creates an “eth1” Ethernet interface on the Jetson side and on the Intel side as well.
I then manually assigned an IP address to each interface and set the link up. The “ip a” command shows the link as up for the new Ethernet interfaces, but when I ping the other end from the Jetson, I get a couple of successful replies (64 bytes, time=0.188 ms). After a few successes Ubuntu hangs again and restarts automatically.
When I perform a rescan or ping, I see AER errors in dmesg.

I don’t see any issue with the core PCIe infrastructure as such, since we are able to dump the config space here without any problems.
The issue surfaces only after the Ethernet interface is started, so most likely the SW stack there is causing it.
Regarding the AER errors observed post reboot, how good is the PCIe interface between the Jetson and the PCIe switch? Do these errors disappear if the switch is restarted/rebooted?

Can you please explain more about the SW stack you mentioned? Also, what is the SMMU? If you think the issue is SMMU related, as mentioned in many other posts, how can I disable it?
Yes, the PCIe interface between the Jetson and the Intel Xeon is an industry-standard interface. I can assure you that the hardware is high-quality work, as this is a sensitive task.

I put ‘pci=noaer’ in the kernel boot arguments, but I still see those AER errors. I am curious why this is happening; it looks like the ‘pci=noaer’ parameter is not taking effect. Is it possible that ‘pcie_aspm=off’ is also not taking effect and is causing the problem on the PCIe interface?
When I load the “ntb_netdev” driver, it creates the eth interface, as can be seen in the log below. But Ubuntu on the Jetson keeps notifying “Ethernet Connected”/“Ethernet Disconnected”.

For debugging, here is the dmesg log.

[ 21.582548] tegra-pcie 10003000.pcie-controller: PCIE: Enable power rails
[ 21.582931] tegra-pcie 10003000.pcie-controller: probing port 0, using 4 lanes
[ 21.585081] tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
[ 32.116145] tegradc 15210000.nvdisplay: blank - powerdown
[ 32.182281] extcon-disp-state external-connection:disp-state: cable 47 state 0
[ 32.182285] Extcon AUX1(HDMI) disable
[ 32.200194] tegra_nvdisp_handle_pd_disable: Powergated Head2 pd
[ 32.200281] tegra_nvdisp_handle_pd_disable: Powergated Head1 pd
[ 32.202836] tegra_nvdisp_handle_pd_disable: Powergated Head0 pd
[ 32.247169] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 32.326399] tegradc 15210000.nvdisplay: blank - powerdown
[ 32.326418] tegradc 15210000.nvdisplay: unblank
[ 32.329124] tegra_nvdisp_handle_pd_enable: Unpowergated Head0 pd
[ 32.329306] tegra_nvdisp_handle_pd_enable: Unpowergated Head1 pd
[ 32.329442] tegra_nvdisp_handle_pd_enable: Unpowergated Head2 pd
[ 32.338990] Parent Clock set for DC plld2
[ 32.342274] tegradc 15210000.nvdisplay: hdmi: tmds rate:65000K prod-setting:prod_c_hdmi_54m_111m
[ 32.343501] tegradc 15210000.nvdisplay: hdmi: get RGB quant from EDID.
[ 32.343507] tegradc 15210000.nvdisplay: hdmi: get YCC quant from EDID.
[ 32.381784] extcon-disp-state external-connection:disp-state: cable 47 state 1
[ 32.381786] Extcon AUX1(HDMI) enable
[ 32.381846] tegradc 15210000.nvdisplay: unblank
[ 32.657739] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 33.059768] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 33.061730] tegra-pcie 10003000.pcie-controller: B3: link 2 status updated. SZP
[ 33.166733] tegra-pcie 10003000.pcie-controller: PCI host bridge to bus 0000:00
[ 33.166740] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 33.166744] pci_bus 0000:00: root bus resource [mem 0x40100000-0x47ffffff]
[ 33.166747] pci_bus 0000:00: root bus resource [mem 0x48000000-0x7fffffff pref]
[ 33.166750] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 33.166775] pci 0000:00:01.0: [10de:10e5] type 01 class 0x060400
[ 33.166871] pci 0000:00:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 33.167057] iommu: Adding device 0000:00:01.0 to group 55
[ 33.167065] arm-smmu: forcing sodev map for 0000:00:01.0
[ 33.167161] pci 0000:00:03.0: [10de:10e6] type 01 class 0x060400
[ 33.167250] pci 0000:00:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 33.167400] iommu: Adding device 0000:00:03.0 to group 56
[ 33.167404] arm-smmu: forcing sodev map for 0000:00:03.0
[ 33.167468] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 33.167479] pci 0000:00:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 33.167617] pci 0000:01:00.0: [8086:6f0f] type 00 class 0x068000
[ 33.167669] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit pref]
[ 33.167702] pci 0000:01:00.0: reg 0x18: [mem 0x4000000000-0x40003fffff 64bit pref]
[ 33.167735] pci 0000:01:00.0: reg 0x20: [mem 0x8000000000-0x80000fffff 64bit pref]
[ 33.168034] iommu: Adding device 0000:01:00.0 to group 57
[ 33.168037] arm-smmu: forcing sodev map for 0000:01:00.0
[ 33.177749] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 33.177904] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
[ 33.177955] pci 0000:00:01.0: BAR 15: assigned [mem 0x48000000-0x485fffff 64bit pref]
[ 33.177960] pci 0000:01:00.0: BAR 2: assigned [mem 0x48000000-0x483fffff 64bit pref]
[ 33.177988] pci 0000:01:00.0: BAR 4: assigned [mem 0x48400000-0x484fffff 64bit pref]
[ 33.178014] pci 0000:01:00.0: BAR 0: assigned [mem 0x48500000-0x48507fff 64bit pref]
[ 33.178041] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 33.178050] pci 0000:00:01.0: bridge window [mem 0x48000000-0x485fffff 64bit pref]
[ 33.178058] pci 0000:00:03.0: PCI bridge to [bus 02]
[ 33.178491] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[ 33.178495] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[ 33.178501] pcie_pme 0000:00:01.0:pcie001: service driver pcie_pme loaded
[ 33.178595] aer 0000:00:01.0:pcie002: service driver aer loaded
[ 33.178859] pcieport 0000:00:03.0: Signaling PME through PCIe PME interrupt
[ 33.178865] pcie_pme 0000:00:03.0:pcie001: service driver pcie_pme loaded
[ 33.178944] aer 0000:00:03.0:pcie002: service driver aer loaded
[ 33.179401] ntb_hw_intel 0000:01:00.0: NTB Secondary config disabled: B3_edited
[ 33.187348] ntb_hw_intel 0000:01:00.0: NTB device registered.
[ 34.930268] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 34.930278] Bluetooth: BNEP socket layer initialized
[ 35.276877] tegradc 15210000.nvdisplay: unblank
[ 73.176202] Software Queue-Pair Transport over NTB, version 4
[ 73.182395] ntb_hw_intel 0000:01:00.0: NTB Transport QP 0 created
[ 73.184339] ntb_hw_intel 0000:01:00.0: eth1 created
[ 73.216611] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 73.219018] ntb_hw_intel 0000:01:00.0: qp 0: Link Up

By ‘SW stack’, I was referring to the NTB stuff and the Ethernet stack on top of it.
The SMMU is the System MMU, which in the x86 world is typically called the IOMMU. It sits between devices and system memory and does the job that the MMU does between the CPU and system memory.
It doesn’t look like the issue here is caused by the SMMU, as I don’t see any SMMU-related prints here.

It is weird that ‘pci=noaer’ is not working. Could you please confirm from the dmesg that ‘pci=noaer’ is indeed present as part of the kernel command line parameters?
BTW, ‘pci=noaer’ would only suppress the errors; it won’t make the issue go away.
Regarding disabling ASPM, it can also be disabled through config options (set CONFIG_PCIEASPM_PERFORMANCE to disable all ASPM states).
Could you please share the output of ‘sudo lspci -vv’ so we can see whether ASPM states are enabled in the first place?
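
One way to confirm whether the parameters actually reached the kernel is to look at the command line the kernel booted with (exposed in /proc/cmdline on Linux) rather than grepping for driver messages. The helper names and the sample string below are made up for illustration; only the parameter names come from this thread.

```python
def pci_options(cmdline):
    """Collect every comma-separated option passed via 'pci=' on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(token[len("pci="):].split(","))
    return options


def has_param(cmdline, name):
    """True if 'name' or 'name=value' appears as a whole token on the command line."""
    return any(tok == name or tok.startswith(name + "=") for tok in cmdline.split())


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical command line, standing in for read_cmdline() on the board.
sample = "console=ttyS0,115200n8 root=/dev/mmcblk0p1 rw pcie_aspm=off pci=noaer"
print("noaer requested:", "noaer" in pci_options(sample))
print("pcie_aspm set:  ", has_param(sample, "pcie_aspm"))
```

On the board, calling pci_options(read_cmdline()) would show what was really passed. An exact comparison like this also catches a misspelled option (for instance noear instead of noaer), which would not match.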

Here is the output of “dmesg | grep aer”. It appears that the kernel parameters are not being applied.

[ 33.186799] aer 0000:00:01.0:pcie002: service driver aer loaded
[ 33.187160] aer 0000:00:03.0:pcie002: service driver aer loaded

Yes, I have updated the kernel with the above configuration. I will update you shortly.

Here is the output of “sudo lspci -vv”.

00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 381
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Prefetchable memory behind bridge: 0000000048000000-00000000485fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: pcieport

00:03.0 PCI bridge: NVIDIA Corporation Device 10e6 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 381
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 0e, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: pcieport

01:00.0 Bridge: Intel Corporation Device 6f0f
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 381
	Region 0: Memory at 48500000 (64-bit, prefetchable) [size=32K]
	Region 2: Memory at 48000000 (64-bit, prefetchable) [size=4M]
	Region 4: Memory at 48400000 (64-bit, prefetchable) [size=1M]
	Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [80] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=0 offset=00004000
		PBA: BAR=0 offset=00005000
	Capabilities: [90] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [e0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: ntb_hw_intel
	Kernel modules: ntb_hw_intel

I see that ASPM L1 is enabled. Could you please disable it through the kernel configs and try once?
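
To avoid reading through the whole ‘lspci -vv’ listing by eye, the ASPM setting of each device can be pulled from its LnkCtl line. This is a rough sketch with hypothetical names. It assumes the layout shown in this thread, where a device line starts in column 0 and the LnkCtl line reads "LnkCtl: ASPM ...;", so it may need adjusting for other lspci versions.

```python
import re

LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM\s+(.+?);")


def aspm_states(lspci_vv):
    """Map each device address to the ASPM setting shown on its LnkCtl line."""
    states = {}
    device = None
    for line in lspci_vv.splitlines():
        if line and not line[0].isspace():
            device = line.split()[0]            # e.g. '00:01.0'
        match = LNKCTL_RE.search(line)
        if match and device is not None:
            states[device] = match.group(1)     # 'L1 Enabled', 'Disabled', ...
    return states


# Lines abridged from the listing in this thread.
sample = (
    "00:01.0 PCI bridge: NVIDIA Corporation Device 10e5 (rev a1)\n"
    "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+\n"
    "00:03.0 PCI bridge: NVIDIA Corporation Device 10e6 (rev a1)\n"
    "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk-\n"
    "01:00.0 Bridge: Intel Corporation Device 6f0f\n"
    "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+\n"
)
for address, state in aspm_states(sample).items():
    print(address, "->", state)
```

Running the same function on the output after a kernel rebuild gives a quick before/after comparison for both ends of the link.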

Hi, sorry for the late reply. I rebuilt the kernel with the “CONFIG_PCIEASPM_PERFORMANCE” option. Now “lspci -vvv” shows that ASPM is disabled.

01:00.0 Bridge: Intel Corporation Device 6f0f
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 381
        Region 0: Memory at 48500000 (64-bit, prefetchable) [size=32K]
        Region 2: Memory at 48000000 (64-bit, prefetchable) [size=4M]
        Region 4: Memory at 48400000 (64-bit, prefetchable) [size=1M]
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [80] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=0 offset=00004000
                PBA: BAR=0 offset=00005000
        Capabilities: [90] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [e0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: ntb_hw_intel
        Kernel modules: ntb_hw_intel

I repeated the test with “ntb_netdev”, and I am able to ping the other side. After a couple of pings, however, a popup reports that Ethernet is disconnected. I ran “ifconfig eth1 up” again and was able to ping again. After doing this twice, the Ethernet link became stable and no longer dropped. This shows that disabling ASPM improved things a lot: previously Ubuntu crashed every time, but now it is stable.

After the ping test I ran an iperf3 test to measure the throughput. iperf3 initially showed approximately 70 Mbits/s, which is disappointing. I then ran “sudo jetson_clocks”, after which the throughput improved significantly (see below), but it looks like this PCIe link is Gen1 and x4? Please clarify this for me. I would also like to know whether any updated NTB drivers (with PCIe Gen3 or Gen2 support) are available from NVIDIA for Jetson.

$ iperf3 -c 192.168.11.11
Connecting to host 192.168.11.11, port 5201
[  4] local 192.168.11.10 port 40750 connected to 192.168.11.11 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   177 MBytes  1.48 Gbits/sec    0   1.75 MBytes       
[  4]   1.00-2.00   sec   181 MBytes  1.52 Gbits/sec    0   1.75 MBytes       
[  4]   2.00-3.00   sec   181 MBytes  1.51 Gbits/sec    0   1.75 MBytes       
[  4]   3.00-4.00   sec   173 MBytes  1.45 Gbits/sec    0   1.75 MBytes       
[  4]   4.00-5.00   sec   180 MBytes  1.50 Gbits/sec    0   1.75 MBytes       
[  4]   5.00-6.00   sec   177 MBytes  1.49 Gbits/sec    0   1.75 MBytes       
[  4]   6.00-7.00   sec   180 MBytes  1.51 Gbits/sec    0   1.75 MBytes       
[  4]   7.00-8.00   sec   187 MBytes  1.57 Gbits/sec    0   1.75 MBytes       
[  4]   8.00-9.00   sec   177 MBytes  1.48 Gbits/sec    0   1.75 MBytes       
[  4]   9.00-10.00  sec   182 MBytes  1.53 Gbits/sec    0   1.75 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.75 GBytes  1.51 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  1.75 GBytes  1.51 Gbits/sec                  receiver

iperf Done.

Now I am concerned about the latency spikes in the ping output and the Ethernet-disconnected issue (which I discussed earlier), as you can see below.

$ ping 192.168.11.11
PING 192.168.11.11 (192.168.11.11) 56(84) bytes of data.
64 bytes from 192.168.11.11: icmp_seq=1 ttl=64 time=0.189 ms
64 bytes from 192.168.11.11: icmp_seq=2 ttl=64 time=0.191 ms
64 bytes from 192.168.11.11: icmp_seq=3 ttl=64 time=6.82 ms
64 bytes from 192.168.11.11: icmp_seq=4 ttl=64 time=0.200 ms
64 bytes from 192.168.11.11: icmp_seq=5 ttl=64 time=0.202 ms
64 bytes from 192.168.11.11: icmp_seq=6 ttl=64 time=0.199 ms
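
To put a number on the latency spike in the ping output above, the round-trip times can be extracted and compared against the median. This is only a sketch: the function names are invented here, and the threshold of five times the median is an arbitrary choice for illustration, not a recommended limit.

```python
import re
import statistics


def ping_times_ms(output):
    """Extract the round-trip times (in ms) from ping output."""
    return [float(value) for value in re.findall(r"time=([\d.]+) ms", output)]


def outliers(times, factor=5.0):
    """Return the samples that exceed 'factor' times the median round-trip time."""
    median = statistics.median(times)
    return [t for t in times if t > factor * median]


sample = """64 bytes from 192.168.11.11: icmp_seq=1 ttl=64 time=0.189 ms
64 bytes from 192.168.11.11: icmp_seq=2 ttl=64 time=0.191 ms
64 bytes from 192.168.11.11: icmp_seq=3 ttl=64 time=6.82 ms
64 bytes from 192.168.11.11: icmp_seq=4 ttl=64 time=0.200 ms
64 bytes from 192.168.11.11: icmp_seq=5 ttl=64 time=0.202 ms
64 bytes from 192.168.11.11: icmp_seq=6 ttl=64 time=0.199 ms
"""
times = ping_times_ms(sample)
print("median:", statistics.median(times), "ms")
print("outliers:", outliers(times))
```

With the six samples shown, only the 6.82 ms reply is flagged. A longer capture would show whether such spikes are periodic or line up with the AER messages in dmesg.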

The Jetson TX2 supports Gen-2 and x4.
Could you please share the output of ‘sudo lspci -vvvv’ and ‘sudo lspci -t’? I want to understand the capabilities on the other side as well.
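
On the Gen1 versus Gen2 question, it may help to compare the measured iperf3 figure with the raw link bandwidth. The sketch below uses the standard per-lane signalling rates and line encodings (8b/10b for Gen1 and Gen2, 128b/130b for Gen3). It ignores TLP/DLLP protocol overhead and everything in the NTB and network stack, so the results are upper bounds, not expected throughput.

```python
# (GT/s per lane, encoding efficiency) for each PCIe generation
GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def raw_bandwidth_gbps(gen, lanes):
    """Raw one-direction link bandwidth in Gbit/s, before protocol overhead."""
    rate, efficiency = GENERATIONS[gen]
    return rate * efficiency * lanes


measured_gbps = 1.51   # iperf3 average reported earlier in this thread
for gen in (1, 2):
    limit = raw_bandwidth_gbps(gen, 4)
    share = 100 * measured_gbps / limit
    print(f"Gen{gen} x4: {limit:.1f} Gbit/s raw, measured is {share:.0f}% of that")
```

This gives about 8 Gbit/s for Gen1 x4 and 16 Gbit/s for Gen2 x4, so 1.51 Gbit/s is well below either limit and does not by itself indicate which speed the link trained at. The LnkSta line of the root port in the ‘lspci -vv’ output (5GT/s, x4 in the listing posted earlier) is the more direct indicator.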

Hi, sorry for the late response. Below is the output of “sudo lspci -vvv”. This is the Xeon-side PCIe.


00:03.0 Bridge: Intel Corporation Device 6f0e (rev 03)
	Subsystem: Intel Corporation Device 0000
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 29
	NUMA node: 0
	Region 0: Memory at f8a00000 (64-bit, prefetchable) [size=64K]
	Region 2: Memory at f8000000 (64-bit, prefetchable) [size=4M]
	Region 4: Memory at f8900000 (64-bit, prefetchable) [size=1M]
	Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
		Address: 00000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [80] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [90] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
			RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ NonFatalErr+ FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x0, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed unknown (ok), Width x0 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [e0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
	Capabilities: [110 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?>
	Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
	Capabilities: [250 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 3
	Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
	Capabilities: [300 v1] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?>
	Kernel driver in use: ntb_hw_intel
	Kernel modules: ntb_hw_intel

Hi,
I want the ‘sudo lspci -vvvv’ output for the full hierarchy, and the hierarchy itself from ‘sudo lspci -t’.

Hi,
I collected data for both commands as you requested. Kindly find the files in the link below.
https://drive.google.com/drive/folders/16CfoCmc89O2yFfpKB2bNXHUJKdb_OJ1Q?usp=sharing

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Can you share the same from the Jetson side as well? Also, could you indicate which EPs are used on both sides to form the NT bridge?
