Xavier 10G PCIe switch bandwidth is low, [ksoftirqd/0] load is too high

We use the JetPack 4.6 filesystem. The Xavier PCIE_C4 controller connects to a Marvell 2-lane Gen3 PCIe 10G switch. The link seems to work fine, but when iperf3 was used to test the bandwidth, it only reached 1 Gbps, and the [ksoftirqd/0] load is close to 100%.

The pcie device tree is set to:
pcie@14160000 {
	status = "okay";

	nvidia,pex-wake = <&tegra_main_gpio TEGRA194_MAIN_GPIO(L, 2)>;
	vddio-pex-ctl-supply = <&p2888_spmic_sd3>;
	nvidia,disable-aspm-states = <0xf>;
	nvidia,max-speed = <3>;
	num-lanes = <2>;

	phys = <&p2u_8>,

	phy-names = "pcie-p2u-0", "pcie-p2u-1";
};

Could you please give me some advice on how to solve this problem? Thank you!

Have you tried setting the system to maximum performance to see if that improves it?
See NVIDIA Jetson Linux Driver Package Software Features : Clock Frequency and Power Management | NVIDIA Docs

Yes, I ran sudo nvpmodel -m 0; sudo jetson_clocks to set the system to maximum performance.
It slightly increases the measured bandwidth, but the ksoftirqd/0 occupancy is still high.

I am curious: what do you see from the following before you test, and then again after the test has been running for a short time?

egrep '(CPU|qos|ether|^IPI)' /proc/interrupts
# Note the following shows "ksoftirqd/number", where "number" is 0-based core (1 ksoft per core):
ps -eo pid,tid,pri,class,pcpu,cmd | egrep '(ksoft|COMMAND|UID|PID|CMD)' | egrep -v grep

One of the weaknesses of Jetsons is that many hardware IRQs must be on CPU0, which could cause IRQ starvation. The “/proc/interrupts” file is strictly about hardware IRQ. I wouldn’t think ksoftirqd would have this issue since it can run on any CPU, but those IRQs of course must be handed off from the hardware IRQ. If the hardware IRQ is running too fast, then ksoftirqd would actually be starved (not what you described, but it would be interesting to see if the hardware IRQ producing the software IRQ is saturated, or if instead a non-saturated hardware IRQ produces a saturated software IRQ).
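To see whether the network software IRQ work itself is piling up on CPU0, the per-CPU softirq counters can be watched alongside the hardware IRQ counts (a small sketch; `/proc/softirqs` is standard on Linux):

```shell
# Per-CPU softirq counters: if NET_RX/NET_TX grow only in the CPU0
# column during the iperf3 run, the software IRQ work is stuck on the
# same core as the hardware IRQ that feeds it.
grep -E '(CPU|NET_RX|NET_TX)' /proc/softirqs
```

Running this before and during the test, and comparing the deltas per column, shows whether the softirq load is concentrated on one core.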

Before the iperf3 test:

During the iperf3 test:

During iperf3 testing, I saw the ksoftirqd load slowly increase on CPU0. When I stopped the test, it came back down slowly.

Please update max-speed to 4 instead of 3; otherwise you will fall back to Gen1.

  nvidia,max-speed = <4>;
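After changing the property and reflashing, the negotiated speed can be confirmed from the running system (a hedged sketch; the device address 0004:01:00.0 is taken from the lspci output in this thread, so adjust it for your topology):

```shell
# On the Jetson (requires pciutils):
#   sudo lspci -s 0004:01:00.0 -vv | grep -E 'LnkCap:|LnkSta:'
# LnkCap shows what the link supports; LnkSta shows what actually trained.
# Parsing a healthy sample line for the speed field:
sta='LnkSta: Speed 8GT/s, Width x2'
echo "$sta" | grep -o '8GT/s'
```

"Speed 8GT/s, Width x2" in LnkSta means a Gen3 x2 link was negotiated.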

I tried, but nothing improved. In fact, I had originally set max-speed to 4.

Some observations regarding the IRQ info and work load…

I saw only TX and general IRQ for hardware, I did not see any RX interrupt in hardware. It would be useful to see logs after the test had been going for some time, but note that only CPU0 is used for hardware IRQ. Software IRQ (which is ksoftirqd) could in theory migrate, but it is all on CPU0. Typically the scheduler, if naive, will try to keep the software IRQ on the same core as a means of avoiding cache misses, but if the software IRQ adds too much of a load to the CPU0, and starts starving hardware IRQ servicing, then it is probably better off migrating to a new core and living with the cache miss. I’m not certain with this hardware the best way to test migrating ksoftirq/0 to another core (e.g., ksoftirqd/7), but it would be an interesting test.

I do wonder though why the ksoftirq/0 is so high. Network servicing is normally a significant part of workload, and it might just be the fact that iperf is purposely trying to load the system down as a test, but I’d think it would perform better even under those circumstances. I’d really like to see what happens if “ksoftirqd/0” becomes “ksoftirqd/7”.
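One way to effectively move the network softirq work off CPU0 is Receive Packet Steering (RPS), which hands receive-side protocol processing to other cores' softirq contexts. A hedged sketch, assuming the interface is named eth0 (substitute the Marvell NIC's actual name):

```shell
# RPS takes a hex bitmap of CPUs allowed to do receive processing.
# Mask for CPUs 1-7, leaving CPU0 free for the hardware IRQ:
mask=$(printf '%x' $(( (1 << 8) - 2 )))
echo "$mask"   # fe
# Apply it to the NIC's receive queue (needs root, run on the target):
#   echo $mask | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```

If ksoftirqd load then appears on CPUs 1-7 and throughput rises, the CPU0 bottleneck theory is confirmed.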

Could you share your PCIe link status with max-speed = 4?

sudo lspci -vvv:

0004:01:00.0 Ethernet controller: Marvell Technology Group Ltd. Device 0f13 (rev 01)
Subsystem: Marvell Technology Group Ltd. Device abcd
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 39
Region 0: Memory at 1740000000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at 1740100000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <32us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00001000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [158 v1] #19
Capabilities: [168 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
L1SubCtl2: T_PwrOn=40us
Capabilities: [178 v1] #22
Capabilities: [184 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [284 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Kernel driver in use: oak

The link status is as expected. Would you mind sharing your iperf3 command line?
What MTU size are you testing with? Could you test with 64K and share the results here?

When the size of MTU is set to 1500:

When the size of MTU is set to 9000:

For bandwidth tests, UDP is recommended; the bottleneck would be in the Ethernet stack.
You can try MTU size = 64K with UDP to see the improvement, but overall, PCIe Gen3 with 2 lanes tops out at about 16 Gbps theoretically. You can also monitor the IRQ traffic with MTU 64K to see the difference.
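The ~16 Gbps figure can be sanity-checked: PCIe Gen3 runs at 8 GT/s per lane with 128b/130b encoding, so two lanes give just under 16 Gbps of raw link bandwidth before protocol overhead. A sketch of the arithmetic plus a UDP iperf3 invocation (interface name and server IP are placeholders):

```shell
# Gen3 x2 raw bandwidth in Gbps (scaled by 100 for two decimal places):
echo $(( 8 * 2 * 128 * 100 / 130 ))   # -> 1575, i.e. ~15.75 Gbps
# A UDP run at a large datagram size (start "iperf3 -s" on the peer first):
#   sudo ip link set dev eth0 mtu 9000
#   iperf3 -c 192.168.1.1 -u -b 10G -l 63K -t 30
```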

For further discussion, please update the thread with the test results.

I tried MTU size = 64K; the test result is the same as MTU size = 9K.
When the size of MTU is set to 65536:

With a UDP test at MTU size = 64K, the [ksoftirqd/0] load is not high:

With a TCP test at MTU size = 64K, the [ksoftirqd/0] load is very high:

So I think the UDP test bandwidth goes up because it does not increase the [ksoftirqd/0] load. Is there a way to solve the Ethernet stack bottleneck for TCP?

That is determined by the Ethernet protocol and the stack; it is not comparable to PCIe, which just moves raw data.

Since you can already get 8 Gbps+ performance, you are close to the peak. If you still want to improve the performance, I would suggest consulting the vendor about jumbo frame support.
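Besides jumbo frames, it may be worth checking whether receive offloads and interrupt coalescing are enabled, since GRO in particular cuts the per-packet work that TCP pushes through ksoftirqd. A hedged sketch; "eth0" is a placeholder, and support depends on the Marvell "oak" driver:

```shell
# Inspect and enable offloads on the target NIC:
#   ethtool -k eth0 | grep -E 'generic-receive-offload|tcp-segmentation-offload'
#   sudo ethtool -K eth0 gro on
#   sudo ethtool -C eth0 rx-usecs 100   # coalesce RX IRQs, if the driver allows
# ethtool -k prints one "feature: on/off" line per offload; filtering for
# disabled features on a sample of that output:
sample='generic-receive-offload: on
tcp-segmentation-offload: off'
echo "$sample" | grep ': off'
```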

The perf test is meant to demonstrate the capability; this level of performance may not always be reproducible in your real use case.