The bandwidth of virtual Ethernet over PCIe between two Xaviers is low

Currently it does not. DMA for the virtual Ethernet interface may be fixed in a future release.

Why do I need to enable CONFIG_PCIE_TEGRA_DW_DMA_TEST? Does it mean the driver in 32.1 isn’t using DMA?
Another question: if I change the code in pcie-tegra.c according to your instructions, will “Ethernet over PCIe” still work?
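
For context, enabling that option is a one-line kernel config change, e.g. in the defconfig before rebuilding:

        CONFIG_PCIE_TEGRA_DW_DMA_TEST=y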

Thanks.

Wayne,

I tried your instructions above. On the RP Xavier, I ran “lspci -vvv -s 0005:01:00.0” but can’t find any “Region 0” entry with size 512 MB:

0005:01:00.0 RAM memory: NVIDIA Corporation Device 1ad5
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Dis+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- I-
        Latency: 0
        Interrupt: pin A routed to IRQ 595
        Region 2: Memory at 1c00000000 (64-bit, prefetchable) 
        Region 4: Memory at <unassigned> (64-bit, non-prefetchable)
        Capabilities: <access denied>
        Kernel driver in use: tegra_ep_mem

If I change BAR0_SIZE in pci-epf-nv-test.c back to SZ_64K, I can find Region 0. Is there something wrong with SZ_512M?
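
For completeness, the edit in question is just the BAR size macro (a sketch; the exact line in pci-epf-nv-test.c may differ slightly):

        -#define BAR0_SIZE	SZ_64K
        +#define BAR0_SIZE	SZ_512M

SZ_64K and SZ_512M are the standard kernel size macros from include/linux/sizes.h.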

Please share the result of lspci -vv.

lspci -vv result in RP side:

0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Dis-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- I-
        Latency: 0
        Interrupt: pin A routed to IRQ 34
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: 30200000-302fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel driver in use: pcieport

0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13) (prog-if 01 [AHCI 1.0])
        Subsystem: Marvell Technology Group Ltd. Device 9171
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Dis+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- I-
        Latency: 0
        Interrupt: pin A routed to IRQ 563
        Region 0: I/O ports at 100010 
        Region 1: I/O ports at 100020 
        Region 2: I/O ports at 100018 
        Region 3: I/O ports at 100024 
        Region 4: I/O ports at 100000 
        Region 5: Memory at 30210000 (32-bit, non-prefetchable) 
        Expansion ROM at 30200000 [disabled] 
        Capabilities: <access denied>
        Kernel driver in use: ahci

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Dis-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- I-
        Latency: 0
        Interrupt: pin A routed to IRQ 38
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        Prefetchable memory behind bridge: 0000001c00000000-0000001c000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel driver in use: pcieport

0005:01:00.0 RAM memory: NVIDIA Corporation Device 1ad5
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Dis+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- I-
        Latency: 0
        Interrupt: pin A routed to IRQ 595
        Region 2: Memory at 1c00000000 (64-bit, prefetchable) 
        Region 4: Memory at <unassigned> (64-bit, non-prefetchable)
        Capabilities: <access denied>
        Kernel driver in use: tegra_ep_mem

The dmesg output on the EP side:

BAR0 RAM IOVA: 0xfc000000

Thanks.

Hi zhuce_cgf,

That (Region 0) is just an example. Is there any problem with using Region 2 with size = 128?
Please note that these patches are only for experimental testing; they are not an official solution.

I haven’t tried size 128. But in my opinion, we should see a region whose size is 512 MB, because that is what the code change requests.
As I told you before, if I change BAR0_SIZE back to SZ_64K, we can find Region 0 with size 64 KB. So there must be some problem when BAR0_SIZE is changed from 64 KB to 512 MB.

Thanks.

Wayne,

Sorry to bother you again. Do you have any update on my problem?

Thanks.

For the codebase you are using, please apply the following patch as well, along with all the aforementioned patches:

diff --git a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
index 96b46cf8b1bd..d052446d110a 100644
--- a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
+++ b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
@@ -542,8 +542,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x38100000 0x0 0x38100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x38200000 0x0 0x38200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x18 0x00000000 0x18 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+			  0x82000000 0x0 0x40000000 0x1B 0x40000000 0x0 0xC0000000     /* non-prefetchable memory (3GB) */
+			  0xc2000000 0x18 0x00000000 0x18 0x00000000 0x3 0x40000000>;  /* prefetchable memory (13GB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x1c4>;
 		nvidia,cap-pl16g-status = <0x174>;
@@ -612,8 +612,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x30100000 0x0 0x30100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x30200000 0x0 0x30200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+			  0x82000000 0x0 0x40000000 0x12 0x30000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+			  0xc2000000 0x12 0x00000000 0x12 0x00000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x194>;
 		nvidia,cap-pl16g-status = <0x164>;
@@ -681,8 +681,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x32100000 0x0 0x32100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x32200000 0x0 0x32200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+			  0x82000000 0x0 0x40000000 0x12 0x70000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+			  0xc2000000 0x12 0x40000000 0x12 0x40000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x194>;
 		nvidia,cap-pl16g-status = <0x164>;
@@ -750,8 +750,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x34100000 0x0 0x34100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x34200000 0x0 0x34200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x40000000>;  /* prefetchable memory (1GB) */
+			  0x82000000 0x0 0x40000000 0x12 0xB0000000 0x0 0x10000000     /* non-prefetchable memory (256MB) */
+			  0xc2000000 0x12 0x80000000 0x12 0x80000000 0x0 0x30000000>;  /* prefetchable memory (768MB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x194>;
 		nvidia,cap-pl16g-status = <0x164>;
@@ -819,8 +819,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x36100000 0x0 0x36100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x36200000 0x0 0x36200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x14 0x00000000 0x14 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+			  0x82000000 0x0 0x40000000 0x17 0x40000000 0x0 0xC0000000      /* non-prefetchable memory (3GB) */
+			  0xc2000000 0x14 0x00000000 0x14 0x00000000 0x3 0x40000000>;  /* prefetchable memory (13GB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x1b0>;
 		nvidia,cap-pl16g-status = <0x174>;
@@ -893,8 +893,8 @@
 
 		bus-range = <0x0 0xff>;
 		ranges = <0x81000000 0x0 0x3a100000 0x0 0x3a100000 0x0 0x00100000      /* downstream I/O (1MB) */
-			  0x82000000 0x0 0x3a200000 0x0 0x3a200000 0x0 0x01E00000      /* non-prefetchable memory (30MB) */
-			  0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x4 0x00000000>;  /* prefetchable memory (16GB) */
+			  0x82000000 0x0 0x40000000 0x1f 0x40000000 0x0 0xC0000000     /* non-prefetchable memory (3GB) */
+			  0xc2000000 0x1c 0x00000000 0x1c 0x00000000 0x3 0x40000000>;  /* prefetchable memory (13GB) */
 
 		nvidia,cfg-link-cap-l1sub = <0x1c4>;
 		nvidia,cap-pl16g-status = <0x174>;
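
For anyone following the patch: per the standard device-tree PCI “ranges” layout, each entry is <flags, PCI (child) address (2 cells), CPU (parent) address (2 cells), size (2 cells)>. Taking the first new non-prefetchable entry as a worked example:

        0x82000000   0x0 0x40000000   0x1B 0x40000000   0x0 0xC0000000
           flags       PCI address       CPU address      size (3 GB)

i.e. 3 GB of 32-bit non-prefetchable memory space at PCI address 0x40000000, mapped to CPU address 0x1B_40000000. The patch enlarges these apertures (originally 30 MB) so that a 512 MB BAR can be assigned.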

Sorry for the late response.

I changed the device tree as you suggested, and now I can find Region 0 with size 512 MB. The dmesg output is:

[  548.061426] tegra-pcie-dw 141a0000.pcie: DMA write. Size: 536870912 bytes, Time diff: 294291527 ns
[  591.932858] tegra-pcie-dw 141a0000.pcie: DMA write. Size: 536870912 bytes, Time diff: 294288508 ns

Q1: That works out to about 13.6 Gb/s. But we actually have 8 lanes on 141a0000, so the bandwidth should be 5 Gb/s × 8 = 40 Gb/s. What’s the problem here?

I also tested the read path, but “echo read” only prints “read” in the terminal. Using “cat read” instead, the result is:

[ 1273.527652] tegra-pcie-dw 141a0000.pcie: DMA read. Size: 536870912 bytes, Time diff: 611755436 ns
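
For reference: “echo read” is just the shell echo command printing its argument, whereas “cat read” actually reads the debugfs node, and the read is what triggers the test. A sketch of the flow (the debugfs directory name is setup-specific):

        sudo cat /sys/kernel/debug/<pcie-test-dir>/write   # kicks off the DMA write test
        sudo cat /sys/kernel/debug/<pcie-test-dir>/read    # kicks off the DMA read test
        dmesg | tail    # shows the “DMA write/read. Size: ... Time diff: ...” line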

Q2: The read bandwidth is about 6.8 Gb/s, which is lower than the write bandwidth. What’s the problem here?

Q3: The last problem is that “/sys/kernel/debug/tegra_pcie_ep/” doesn’t exist when I test on the EP side.

Thanks.

WayneWWW and vidyas,

I have updated my questions above; please help answer them.
Thanks.

Can you please check what the link speed is here? If you see Gen-1 speed, you may have to add “nvidia,max-speed = <4>;” to the controller operating in endpoint mode, which in this case is the “pcie_ep@141a0000” node (see the sketch below).
Regarding the read speed being lower: that is expected with a Tegra<->Tegra back-to-back connection. It’s a design limitation, and in the Tegra<->Tegra case only DMA write should be used, from both sides, to transfer data in both directions (instead of DMA read and write from one side).
Regarding the ‘tegra_pcie_ep’ folder not being present: please check whether the client device driver is bound to the endpoint device on the host (“sudo lspci -vv” shows this). If you see that it is not bound, you may have to find out why (ideally it should be).
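
A minimal sketch of that device-tree change (node name as above; all other properties omitted):

        pcie_ep@141a0000 {
                ...
                nvidia,max-speed = <4>;	/* allow the link to train up to Gen-4 */
        };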

Vidyas,

How can I check the link speed? From the device tree?

Which client device driver did you mean in “check whether the client device driver is bound to the endpoint device on the host”? Is it the tegra_ep_mem shown in the “lspci -vv” output below?

And the result of “lspci -vv” on RC-AGX is:

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 38
	Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
	Memory behind bridge: 40000000-6fffffff
	Prefetchable memory behind bridge: 0000001c00000000-0000001c000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

0005:01:00.0 RAM memory: NVIDIA Corporation Device 1ad5
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 595
	Region 0: Memory at 1f40000000 (32-bit, non-prefetchable) 
	Region 2: Memory at 1c00000000 (64-bit, prefetchable) 
	Region 4: Memory at 1f60000000 (64-bit, non-prefetchable) 
	Capabilities: <access denied>
	Kernel driver in use: tegra_ep_mem

Thanks.

Please execute ‘lspci -vvvv’ with ‘sudo’; there will be a line starting with “LnkSta” which gives the link status (i.e., link width and speed).
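
For example, using the endpoint’s BDF from your earlier output:

        sudo lspci -vvvv -s 0005:01:00.0 | grep LnkSta

A Gen-4 x8 link should report something like “LnkSta: Speed 16GT/s, Width x8”.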

>> Is it the tegra_ep_mem shown in the “lspci -vv” output below?
Yes. I see that it is already bound with the device. It is weird that there is no corresponding entry for it in the debugfs. Can you please check if tegra_ep_mem has thrown an error in the log while loading?
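
One quick way to look for that (driver name from this thread; the exact log text will vary):

        dmesg | grep -i tegra_ep_mem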


I have the same problem: tegra_ep_mem cannot be found on the EP-AGX, but PCIe-5 and tegra_ep_mem can be found on the RC-AGX. How can I bind the client device driver to the endpoint device?

>> but PCIe-5 and tegra_ep_mem can be found on the RC-AGX
Well, this is not an issue; it is the expected behavior. Since the client device driver runs on the host system, the “tegra_ep_mem” debugfs entry is expected on that same system, i.e. the host system.

Vidyas,

Thanks for your response!

My question is: why can’t the tegra_ep_mem debugfs entry be found on the EP-AGX? Is it that the client device driver does not run on the EP system, or is it enough that only the RC-AGX runs the client device driver? If so, how do the two AGX units communicate through tegra_ep_mem?

>> Why can’t the tegra_ep_mem debugfs entry be found on the EP-AGX?
Because the “tegra_ep_mem” debugfs entry is exposed by the client/device driver for the EP-AGX, and that driver runs on the host system. So ‘tegra_ep_mem’ is available on the host system.

>> Is it that the client device driver does not run on the EP system, or is it enough that only the RC-AGX runs the client device driver? If so, how do the two AGX units communicate through tegra_ep_mem?
Yes, it is a client device driver and it runs on the host, as I mentioned before.
To communicate between RP-AGX and EP-AGX, we have the following options:
-> Data from RP-AGX to EP-AGX
For this, either RP-AGX’s DMA write or EP-AGX’s DMA read can be used.
-> Data from EP-AGX to RP-AGX
For this, either EP-AGX’s DMA write or RP-AGX’s DMA read can be used.
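
(And as noted earlier in this thread, DMA read is slow in the Tegra<->Tegra case, so in practice each side should push data with its own DMA write.)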