PCIe 10G ethernet transfer performance far from spec

hermes_wu · August 14, 2023, 8:04am

Hi,

We ran 10G Ethernet tests on the NVIDIA EVB and on our own design platform with an NVIDIA SOM attached.
According to the test results, the transfer speed is far below the 10G spec, at only around 5 to 6 Gbit/s.
Our 10G IC solution partner suggested that we check with NVIDIA whether any setting or configuration related to “IRQ balance” can be fine-tuned to achieve a better result (at least 8 or 9 Gbit/s preferred).

Thanks.

SDK Version : JetPack 5.1
PCIe 10G Ethernet controller on NV EVB : Marvell AQC113 & Marvell AQC107 & Orin MGBE
PCIe 10G Ethernet controller on our design board : Marvell AQC113

Test on NV EVB (only 1 process, 1 thread):

Test on our own design board:

Test Command:

1 process, 1 thread:
  server - 
    iperf3 -s -i 3 -p 5200

  client - 
    iperf3 -c <Server_IP> -i 3 -p 5200    # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -R    # Rx Test

1 process, 4 threads:
  server - 
    iperf3 -s -i 3 -p 5200

  client - 
    iperf3 -c <Server_IP> -i 3 -p 5200 -P 4   # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -P 4 -R    # Rx Test

2 processes, 2 threads:
  server - 
    iperf3 -s -i 3 -p 5200    # Process 1
    iperf3 -s -i 3 -p 5201    # Process 2

  client - 
    # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200    # Process 1
    iperf3 -c <Server_IP> -i 3 -p 5201    # Process 2

    # Rx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -R    # Process 1
    iperf3 -c <Server_IP> -i 3 -p 5201 -R    # Process 2
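
For the multi-process case, the clients need to start at roughly the same time for the per-process numbers to be comparable. A minimal sketch of a launcher follows, assuming the servers are already listening on consecutive ports starting at the base port; `run_parallel_clients` is a hypothetical helper name, not part of iperf3.

```shell
# run_parallel_clients SERVER BASE_PORT COUNT [extra iperf3 args...]
# Starts COUNT iperf3 clients in the background, one per consecutive port,
# then waits for all of them to finish.
run_parallel_clients() {
    server="$1"; base_port="$2"; count="$3"
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        iperf3 -c "$server" -i 3 -p $(( base_port + i )) "$@" &
        i=$(( i + 1 ))
    done
    wait
}
```

For example, `run_parallel_clients <Server_IP> 5200 2` would correspond to the Tx test above, and adding `-R` as a trailing argument to the Rx test.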

linuxdev · August 14, 2023, 3:43pm

I cannot answer. Some information and thoughts though…

Always make sure you test with clocks maxed out. This might mean setting an nvpmodel, followed by jetson_clocks (jetson_clocks maximizes clocks for a given model; the model can be set to reduce power consumption, or to turn on all cores with max power consumption possible).
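
As a rough sketch of that clock setup, assuming a JetPack 5.x image where `nvpmodel` and `jetson_clocks` are installed and that mode 0 is the maximum-performance mode on your module (mode IDs can differ between modules, so check before relying on it). The `set_max_clocks` helper name is made up for illustration.

```shell
# set_max_clocks [MODE]: select a power model, then pin clocks to the
# maximum allowed by that model. MODE defaults to 0 (an assumption: verify
# the mode list for your module before relying on it).
set_max_clocks() {
    sudo nvpmodel -m "${1:-0}"   # select the power model
    sudo jetson_clocks           # raise clocks to the max for that model
    sudo nvpmodel -q             # print the active mode to confirm
}
```

Run `set_max_clocks` once on both ends of the link before starting iperf3. The jetson_clocks setting does not persist across a reboot.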

Second, network devices use a hardware interrupt (IRQ), not software. This means that migrating to another core requires an actual wiring capable of reaching the core the IRQ is to be serviced from. One can schedule (set affinity) for a core which is unreachable, but then the process will migrate back to an available core. I am not positive about this, but I think networking is required to be on CPU0 (along with a number of other hardware devices; take a look at “cat /proc/interrupts”, which is for hardware IRQs). So unless NVIDIA suggests networking can run on other cores (I don’t know the specific wiring), then IRQ balancing won’t work.
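
To see where a NIC's interrupts actually land, something like the following may help. It is only a sketch: the device label (for example `eth0`) is an assumption and depends on how the driver names its IRQs in /proc/interrupts, and both helper names are hypothetical.

```shell
# irqs_for DEVICE [FILE]: print the IRQ numbers whose line in
# /proc/interrupts (or FILE) mentions DEVICE.
irqs_for() {
    awk -v dev="$1" 'index($0, dev) { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# show_affinity DEVICE: for each matching IRQ, print the CPUs it is
# currently allowed to run on.
show_affinity() {
    for irq in $(irqs_for "$1"); do
        printf 'IRQ %s -> CPUs %s\n' "$irq" \
            "$(cat "/proc/irq/$irq/smp_affinity_list")"
    done
}
```

Calling `show_affinity eth0` on the target lists one line per IRQ. Comparing that with the per-CPU counters in /proc/interrupts shows whether the interrupts are really being serviced outside CPU0.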

WayneWWW · August 15, 2023, 2:20am

Sorry, I don’t quite understand the situation from your description.

Please clarify the following first:

Please validate this on JP 5.1.2.
Please clarify whether this can be reproduced on the devkit alone. I don’t care about the custom board case for now. If you use just two devkits, are you able to reproduce the problem?

hermes_wu · August 15, 2023, 3:27am

Hi @linuxdev

We have set the power model to 0 (MAXN) with nvpmodel.
I also monitored /proc/interrupts for the Ethernet device: on Jetson AGX Xavier the IRQs are all on CPU0, but on Jetson AGX Orin they are spread across CPU0 ~ CPU7.

Jetson AGX Xavier : interrupts, and affinity_hint, and smp_affinity

Jetson AGX Orin : interrupts, and affinity_hint, and smp_affinity


hermes_wu · August 15, 2023, 3:31am

Hi @WayneWWW

We have tried JP 5.1 and JP 5.1.1 on the NV devkits (Xavier and Orin), and the issue can be reproduced on both.
If validation with JP 5.1.2 on the devkit is necessary, we will try it and reply with the result.

WayneWWW · August 15, 2023, 3:34am

Could you clarify your exact method for reproducing this issue on the devkit?

hermes_wu · August 15, 2023, 4:52am

Hi @WayneWWW

Please refer to the attached picture below. We plugged a 10G Ethernet card into the PCIe slot of each NV devkit and then connected the devkits with an Ethernet cable. We used iperf3 to run the transfer performance test with the 1 process, 1 thread commands. The test results are shown in the picture above.

The 10G Ethernet card can be either of the following:

Marvell AQC113 Ethernet card
Marvell AQC107 Ethernet card

We also connected to the MGBE interface on the Orin devkit as a comparison.

The devkit test setups:

WayneWWW · August 15, 2023, 4:55am

What is the expected result for this 10G card? For example, what value did you get on an x86 Ubuntu host?

What if we don’t have similar cards on our side? Is it okay to use another kind of NIC for the test?

hermes_wu · August 15, 2023, 5:12am

Hi @WayneWWW

Based on the response from the Ethernet chip vendor, the actual transfer performance can reach 9.x Gbits on x86 and some Broadcom platforms.

Yes, you can use another kind of NIC for the tests. We just want to clarify whether the issue is related to the host platform, and whether it can be improved.

hermes_wu · August 15, 2023, 6:20am

Hi @WayneWWW

I changed the smp_affinity of the Ethernet IRQs on Orin so that they all use CPU#0 (the same as on Xavier). The transfer performance then dropped to 4.8x Gbits as well.

Based on this, I tried to change the smp_affinity of the Ethernet IRQs on Xavier, but I encountered an error and the change failed.

Is there any other method to change the affinity value of ethernet on the Xavier platform?
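
For reference, the value written to smp_affinity is a hexadecimal CPU bitmask (bit N set means CPU N is allowed). A minimal sketch of building and applying such a mask follows. The helper names are hypothetical, and the write can be rejected by the kernel, as reported for Xavier in this thread.

```shell
# cpu_mask CPU...: print the hex bitmask for the given CPU numbers,
# e.g. "cpu_mask 1 2" prints 6.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# set_irq_affinity IRQ CPU...: write the mask for one IRQ, then read the
# value back so a silently ignored or rejected write is visible.
set_irq_affinity() {
    irq="$1"; shift
    cpu_mask "$@" | sudo tee "/proc/irq/$irq/smp_affinity" >/dev/null \
        || echo "IRQ $irq: write to smp_affinity failed" >&2
    cat "/proc/irq/$irq/smp_affinity"
}
```

For example, `set_irq_affinity <IRQ> 2 3` would request CPUs 2 and 3 for that IRQ. If the irqbalance daemon is running, it may rewrite these values afterwards, so it is worth stopping it while experimenting.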

WayneWWW · August 15, 2023, 7:57am

Hi,

One question here. When you tested this on Orin, I assume your eth0 was also the PCIe card and not the Orin on-board MGBE port, right?

hermes_wu · August 15, 2023, 8:21am

Hi @WayneWWW

Yes, the eth0 we tested on Orin is a PCIe card.

linuxdev · August 15, 2023, 7:40pm

So it looks like Orin has the ability to change affinity on the ethernet hardware, but Xavier and earlier do not. I wonder if NVIDIA can confirm if Orin has a general purpose IO-APIC equivalent, or if it is just specific hardware (like ethernet) which can change affinity?

Or, if this is an external ethernet NIC (on PCIe), then the affinity does not apply (I’m thinking of the integrated NIC on the dev kit).

hermes_wu · August 16, 2023, 6:15am

Hi @linuxdev

In my case, Xavier and Orin both used the same Ethernet NIC on PCIe (Marvell AQC113), but the affinity can only be changed on the Orin platform.

WayneWWW · August 16, 2023, 8:03am

Hi @hermes_wu

I suggest that you first test this PCIe card in an x86 desktop to establish a baseline.

It is actually not very wise to just say “Based on the response from the ethernet chip vendor, the actual transfer performance can reach 9.x Gbits on x86 and some Broadcom platforms”.

You should run it first.

hermes_wu · August 16, 2023, 8:51am

Hi @WayneWWW

Since my working environment only has notebooks, it is difficult to find a PC with a PCIe slot, but I will try to find one.
In addition, regarding IRQ balance on Xavier, is there any method that lets me change the affinity value (or other settings) to control which CPU each interrupt is serviced on?

WayneWWW · August 16, 2023, 8:52am

Hi,

We can change the affinity of all MSIs to the same core, but we cannot set a different affinity per MSI per controller.

And since your case only has one NIC on the board, I don’t really think that affects the performance much.

hermes_wu · August 16, 2023, 9:49am

Hi @WayneWWW

Can you give me more information on how to change all MSI affinity to other cores for the Xavier platform?

WayneWWW · August 16, 2023, 9:52am

That is also done through smp_affinity. Please try other values and see whether it really cannot be set.

linuxdev · August 16, 2023, 7:15pm

PCIe is plug-n-play. I suspect that some of the integrated components cannot change which core the hardware IRQ runs on. It seems though that PCIe is the first of some of the hardware to be able to run on different cores for an Orin. I think though that much depends on data transfer within the “CPU<->application”, so it might be CPU bound as well. However, it is unlikely you would be able to IRQ balance hardware drivers on Xavier, this is set in silicon and/or module wiring.

FYI, you can set the affinity of anything to some other core. The trick is whether or not the scheduler will migrate it back to CPU0.
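
One way to check whether an affinity change actually took effect is to compare the per-CPU interrupt counters before and after a short traffic run. This is only a sketch, and `irq_counts` is a hypothetical helper name.

```shell
# irq_counts IRQ [FILE]: print the per-CPU counters for one IRQ from
# /proc/interrupts (or FILE), one "cpuN count" pair per line.
irq_counts() {
    awk -v irq="$1:" '
        NR == 1   { ncpu = NF; next }    # header row: CPU0 CPU1 ...
        $1 == irq { for (i = 0; i < ncpu; i++) print "cpu" i, $(i + 2) }
    ' "${2:-/proc/interrupts}"
}
```

Running `irq_counts <IRQ>` twice a few seconds apart while iperf3 is transferring shows which counters grow. If only the cpu0 counter increases after the affinity was set to another core, the interrupt is still being serviced on CPU0.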
