The bandwidth of virtual Ethernet over PCIe between two Xaviers is low

I don’t see any Gen4 PCIe device in lspci output shared in comment #72.

For clarity: do you have two Xaviers connected together, or have you connected one Xavier in end-point mode to some other PCIe device?

(1) Sorry, my fault. Now, on the endpoint, I executed xxd as you mentioned before:
nvidia@hjGEMI:~$ xxd /proc/device-tree/pcie_ep@141a0000/nvidia,max-speed
00000000: 0000 0004

(2) lspci -s 01:00.0 -vvv on the root-port side shows the Xavier is Gen4, width x8, so it's OK. Perhaps my dtb didn't flash correctly before.

sudo lspci -s 01:00.0 -vvv
[sudo] password for atom:
01:00.0 Network controller: NVIDIA Corporation Device 2296
Subsystem: NVIDIA Corporation Device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 20
Region 0: Memory at 80000000 (32-bit, non-prefetchable)
Region 2: Memory at 80600000 (64-bit, prefetchable)
Region 4: Memory at 80400000 (64-bit, non-prefetchable)
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit-
Address: 00000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported, Exit Latency L0s <1us, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00010000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] #19
Capabilities: [168 v1] #26
Capabilities: [190 v1] #27
Capabilities: [1b8 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [1c0 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
Capabilities: [1d0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2d0 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [308 v1] #25
Capabilities: [314 v1] #1f
Capabilities: [320 v1] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
Kernel driver in use: tvnet
lspci: Unable to load libkmod resources: error -12

(3) On the root-port PCIe device, executing lspci shows PCIe 3.0. Yes, our HW is PCIe 3.0.

sudo lspci -s 00:0e.0 -vvv
[sudo] password for atom:
00:0e.0 PCI bridge: Intel Corporation Device 19a8 (rev 11) (prog-if 00 [Normal decode])
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 120
Region 0: Memory at 81300000 (64-bit, non-prefetchable)
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: 80000000-805fffff
Prefetchable memory behind bridge: 0000000080600000-00000000806fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag+ RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #14, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <1us, L1 <4us
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootCap: CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [88] Subsystem: Intel Corporation Device 0000
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Address: feeff00c Data: 4161
Masking: 00000000 Pending: 00000000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn+ ChkCap- ChkEn+
Capabilities: [138 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [150 v1] #12
Capabilities: [180 v1] Vendor Specific Information: ID=0003 Rev=0 Len=00a <?>
Capabilities: [190 v1] #1d
Capabilities: [1d0 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
Capabilities: [200 v1] #19
Kernel driver in use: pcieport
lspci: Unable to load libkmod resources: error -12
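
For a quick side-by-side check (just a sketch, using the same bus addresses as in the dumps above and run on the root-port host), the negotiated speed and width of both ends can be filtered out with grep instead of reading the full output:

sudo lspci -s 00:0e.0 -vvv | grep -E 'LnkCap|LnkSta'   # Intel root port
sudo lspci -s 01:00.0 -vvv | grep -E 'LnkCap|LnkSta'   # Xavier endpoint (tvnet)

LnkCap is what each device is capable of and LnkSta is what was actually negotiated, so a Gen4-capable endpoint behind a Gen3 root port is expected to train to 8GT/s.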

(4) Bandwidth test:

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[ 4] local 15.0.0.3 port 5001 connected with 15.0.0.2 port 58434
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 3.15 GBytes 2.70 Gbits/sec
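
For reference, the TCP run above corresponds roughly to the following commands (a sketch assuming default iperf 2 options, with 15.0.0.3 as the server on the virtual Ethernet link):

iperf -s                 # server side (15.0.0.3)
iperf -c 15.0.0.3 -t 10  # client side (15.0.0.2), 10-second run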

=============================================================================End here

Thanks a lot.

Hi, I have another problem.

(1) Now the endpoint is Gen4, width=8. My root port is Gen3, width=8, so LnkSta is Gen3, width=8.
(2) I used iperf to test the bandwidth with TCP and UDP; UDP is slower than TCP.

TCP is about 2.70 Gb/sec
UDP is about 200 Mb/sec

Is this correct? Is anything wrong with my configuration?


For the UDP test we need to specify the datagram length from the client side. Please include "-l 64000" while running iperf.
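
For example, a UDP run with that datagram length would look something like this (a sketch; the addresses follow the earlier test and the -b target rate is only illustrative, since iperf's default UDP rate is very low):

iperf -s -u                                    # UDP server
iperf -c 15.0.0.3 -u -l 64000 -b 10000M -t 10  # UDP client, 64000-byte datagrams, high offered rate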

Thanks,
Om


Thank you very much. I have another problem. It's so weird.

1. I have updated my kernel and dtb on another Xavier SoC, which I'll call xavier-B. On the root port, LnkSta and LnkCap are exactly the same as on the old one (xavier-A), but the bandwidth of xavier-B is only 200M.

2. Then I inserted xavier-A's SD card into xavier-B and used iperf to test the bandwidth; wow, it's 3 Gbit again. Why? What is different about the filesystem on the SD card? xavier-B's SD card was newly made according to Xavier's guide (format, unzip, …).

3. I used ps -ef to print both A's and B's processes and compared them; everything is the same except /bin/bash /opt/nvidia/ptpd/start_ptp.sh. I disabled this script on xavier-B, but it had no effect; the bandwidth is still 200M.
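
A minimal sketch of that process-list comparison (hypothetical file names; copy one file across before diffing):

ps -ef | sort > /tmp/procs_A.txt    # on xavier-A
ps -ef | sort > /tmp/procs_B.txt    # on xavier-B
diff /tmp/procs_A.txt /tmp/procs_B.txt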

omp, can you specify which tag we should align to if we want to apply those patch files?

roseMink, have you fixed your problem now?

It looks like the documentation for the 34.4.2 release has removed the 150 Mbps statement, which leads me to believe that the patches have been integrated into the latest version. I look forward to testing this out!

Hi all,

as I just noticed, the new JetPack version (4.4) has been released.
Are there any updates to the virtual Ethernet stuff included?
So are we getting more than 5 Gbit/s?

Kind regards,
Nils

Yes, but not much more. I tested with iperf and maxed out at 6.41 Gbps with 2 threads, 6.4 Gbps with 4 threads, and 3.26 Gbps with 1 thread. This is between two Xavier development kits.
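
The thread counts above presumably map to iperf's parallel-stream option, along the lines of (with <server-ip> as a placeholder):

iperf -c <server-ip> -P 4   # four parallel TCP streams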

This appears to be significantly bounded by the CPU interaction.

For comparison, the same test with a pair of Intel 10GbE NICs maxed out at 8.5 Gbps with one thread, and it went downhill as I added more threads. Again, this was between two Xavier development kits.