PCIe 10G ethernet transfer performance far from spec


We ran 10G Ethernet tests on the NVIDIA EVB and on our own design platform with an NV SOM attached.
According to the test results, the transfer speed is far below the 10G spec, at only around 5 to 6 Gbit/s.
Our 10G IC solution partner suggested that we check with NVIDIA whether any setting or configuration related to “IRQ balance” could be fine-tuned to achieve a better result (at least 8 or 9 Gbit/s preferred).


SDK Version : Jetpack 5.1
PCIe 10G Ethernet controller on NV EVB : Marvell AQC113 & Marvell AQC107 & Orin MGBE
PCIe 10G Ethernet controller on our design board : Marvell AQC113

Test on NV EVB(only 1 process, 1 thread):

Test on our own design board:

Test Command:

1 process, 1 thread:
  server - 
    iperf3 -s -i 3 -p 5200

  client - 
    iperf3 -c <Server_IP> -i 3 -p 5200    # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -R    # Rx Test

1 process, 4 threads:
  server - 
    iperf3 -s -i 3 -p 5200

  client - 
    iperf3 -c <Server_IP> -i 3 -p 5200 -P 4   # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -P 4 -R    # Rx Test

2 processes, 2 threads:
  server - 
    iperf3 -s -i 3 -p 5200    # Process 1
    iperf3 -s -i 3 -p 5201    # Process 2

  client - 
    # Tx Test
    iperf3 -c <Server_IP> -i 3 -p 5200    # Process 1
    iperf3 -c <Server_IP> -i 3 -p 5201    # Process 2

    # Rx Test
    iperf3 -c <Server_IP> -i 3 -p 5200 -R    # Process 1
    iperf3 -c <Server_IP> -i 3 -p 5201 -R    # Process 2

I cannot answer. Some information and thoughts though…

Always make sure you test with clocks maxed out. This might mean setting an nvpmodel, followed by jetson_clocks (jetson_clocks maximizes clocks for a given model; the model can be set to reduce power consumption, or to turn on all cores with max power consumption possible).
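As a rough sketch of that sequence (the helper names `run` and `max_clocks` are made up for illustration, and mode 0 is only assumed to be the maximum-performance mode on your module, so check `nvpmodel -q` first):

```shell
# Dry-run aware wrapper: prints the command when DRY_RUN=1, runs it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Select a power mode, then pin the clocks to the maximum for that mode.
# $1: nvpmodel mode ID (default 0, assumed here to be the max-performance mode).
max_clocks() {
    mode="${1:-0}"
    run sudo nvpmodel -m "$mode"
    run sudo jetson_clocks
    run sudo nvpmodel -q            # confirm the active power mode
    run sudo jetson_clocks --show   # confirm the resulting clock settings
}
```

Running `DRY_RUN=1 max_clocks` only prints the four commands; plain `max_clocks` executes them. Do this on both ends of the link before starting iperf3.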

Second, network devices use a hardware interrupt (IRQ), not a software one. This means that migrating to another core requires actual wiring capable of reaching the core the IRQ is to be serviced from. One can set affinity to a core which is unreachable, but then the handler will migrate back to an available core. I am not positive about this, but I think networking is required to be on CPU0 (along with a number of other hardware devices; take a look at “cat /proc/interrupts”, which lists hardware IRQs). So unless NVIDIA says networking can run on other cores (I don’t know the specific wiring), IRQ balancing won’t work.
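To see which IRQs belong to the NIC and where they are currently allowed to run, something like the following might help. It is only a sketch: `irq_affinity` is a made-up helper name, and the pattern you pass (for example `eth0` or the driver name) depends on how your interface shows up in /proc/interrupts.

```shell
# Print "<irq> <smp_affinity_list>" for every /proc/interrupts row matching a pattern.
# $1: pattern (interface or driver name, an assumption about your system)
# $2: interrupts table (default /proc/interrupts)
# $3: IRQ directory (default /proc/irq)
irq_affinity() {
    pattern="$1"
    table="${2:-/proc/interrupts}"
    irqdir="${3:-/proc/irq}"
    # Keep only numbered IRQ rows that mention the pattern, and strip the colon.
    awk -v pat="$pattern" '$1 ~ /^[0-9]+:$/ && $0 ~ pat { sub(":", "", $1); print $1 }' "$table" |
    while read -r irq; do
        # smp_affinity_list holds the allowed CPUs as a list such as "0" or "0-7".
        printf '%s %s\n' "$irq" "$(cat "$irqdir/$irq/smp_affinity_list" 2>/dev/null || echo '?')"
    done
}
```

For example, `irq_affinity eth0` prints one line per matching IRQ with the CPUs it may be serviced on.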

Sorry, I don’t quite understand the situation from your description.

Please clarify the following first:

  1. Please validate this on JP 5.1.2.

  2. Please clarify whether this can be reproduced using devkits only. I am not concerned with the custom board case for now. If you use just two devkits, can you reproduce the problem?

Hi @linuxdev

We have set the power model to 0 (MAXN) with nvpmodel.
I also monitored the eth entries in /proc/interrupts: on Jetson AGX Xavier the IRQs are serviced on CPU0 only, but on Jetson AGX Orin they are spread across CPU0 to CPU7.

Jetson AGX Xavier : interrupts, and affinity_hint, and smp_affinity

Jetson AGX Orin : interrupts, and affinity_hint, and smp_affinity


Hi @WayneWWW

We have tried JP 5.1 and JP 5.1.1 on the NV devkits (Xavier and Orin), and the issue can be reproduced there.
If validation with JP 5.1.2 on the devkit is necessary, we will try it and reply with the result.

Could you clarify your exact method for reproducing this issue on the devkit?

Hi @WayneWWW

You can refer to the attached picture below: we plugged the 10G Ethernet card into the PCIe slot of each NV devkit and then connected the two with an Ethernet cable. We ran the transfer performance test with iperf3 using the 1 process, 1 thread command. The test result is shown in the picture above.

The 10G Ethernet card can be one of the following:

  • Marvell AQC113 ethernet card
  • Marvell AQC107 ethernet card
  • We also connected to the MGBE interface on the Orin devkit as a comparison.

The test sets of devkit:

What is the expected result for this 10G card? For example, what value did you get on an x86 Ubuntu host?

What if we don’t have similar cards on our side? Is it okay to use another kind of NIC for the test?

Hi @WayneWWW

Based on the response from the ethernet chip vendor, the actual transfer performance can reach 9.x Gbits on x86 and some Broadcom platforms.
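For reference, the 9.x figure is close to the theoretical ceiling for TCP payload on a 10G link. A small sketch of that arithmetic, assuming IPv4, TCP with the timestamp option, and standard Ethernet framing (the function name `tcp_goodput_gbps` is just for illustration):

```shell
# Theoretical TCP goodput in Gbit/s for a given link rate and MTU.
# Assumptions: IPv4 header 20 B + TCP header 20 B + TCP timestamp option 12 B = 52 B
# inside the MTU, and 38 B of Ethernet overhead per frame on the wire
# (preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12).
# $1: link rate in Gbit/s, $2: MTU in bytes
tcp_goodput_gbps() {
    awk -v rate="$1" -v mtu="$2" 'BEGIN { printf "%.2f\n", rate * (mtu - 52) / (mtu + 38) }'
}
```

With those assumptions, `tcp_goodput_gbps 10 1500` gives 9.41, so a single stream at around 5 to 6 Gbit/s is well short of what the link itself allows.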

Yes, you can use another kind of NIC for the tests. We just want to clarify whether the issue is related to the host platform, and whether it can be improved.

Hi @WayneWWW

I changed the smp_affinity of the Ethernet IRQs on Orin so that they are all serviced on CPU#0 (the same as on Xavier). The transfer performance then dropped to 4.8x Gbit/s as well.

Based on this, I tried to change the smp_affinity of the Ethernet IRQs on Xavier, but I encountered an error and the change failed.

Is there any other method to change the Ethernet affinity value on the Xavier platform?


One question here. When you tested this on Orin, I assume your eth0 was also the PCIe card and not the Orin onboard MGBE port, right?

Hi @WayneWWW

Yes, the eth0 we tested on Orin is a PCIe card.

So it looks like Orin has the ability to change affinity on the ethernet hardware, but Xavier and earlier do not. I wonder if NVIDIA can confirm if Orin has a general purpose IO-APIC equivalent, or if it is just specific hardware (like ethernet) which can change affinity?

Or, if this is an external ethernet NIC (on PCIe), then the affinity does not apply (I’m thinking of the integrated NIC on the dev kit).

Hi @linuxdev

In my case, Xavier and Orin both used the same Ethernet NIC on PCIe (Marvell AQC113), but the affinity can only be changed on the Orin platform.

Hi @hermes_wu

I would suggest trying an x86 desktop with this PCIe card first to establish a baseline.

It is not very wise to rely only on a statement like “Based on the response from the ethernet chip vendor, the actual transfer performance can reach 9.x Gbits on x86 and some Broadcom platforms”.

You should run it yourself first.

Hi @WayneWWW

Since my working environment only has notebooks, it is difficult to find a PC with a PCIe slot, but I will try to find one.
In addition, regarding IRQ balancing on Xavier, is there any method that lets me change the affinity value (or anything else) to control which CPU each interrupt is serviced on?


We can change the affinity of all MSIs to the same core, but we cannot set a different affinity per MSI per controller.

And since your case only has one NIC on the board, I don’t really think that affects the performance much.

Hi @WayneWWW

Can you give me more information on how to change all MSI affinity to other cores for the Xavier platform?

It also uses smp_affinity. Please try other values and see whether it really cannot be set.
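A minimal sketch for trying other values, assuming you already know the IRQ number from /proc/interrupts. The helper names `cpu_mask` and `set_irq_affinity` and the `IRQ_DIR` override are made up for illustration; the write itself needs root.

```shell
# Convert a list of CPU indices into the hex bitmask that smp_affinity expects.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Write the mask for one IRQ and print what the kernel reports afterwards.
# $1: IRQ number, remaining arguments: CPU indices.
# IRQ_DIR is a hypothetical override for dry testing; the default is /proc/irq.
set_irq_affinity() {
    irq="$1"; shift
    dir="${IRQ_DIR:-/proc/irq}/$irq"
    want="$(cpu_mask "$@")"
    if ! echo "$want" > "$dir/smp_affinity"; then
        echo "IRQ $irq: write of mask $want rejected" >&2
        return 1
    fi
    # The kernel may zero-pad the value it reports, so compare by eye.
    echo "IRQ $irq: requested $want, kernel reports $(cat "$dir/smp_affinity")"
}
```

For example, `set_irq_affinity 55 4 5 6 7` (55 being a placeholder IRQ number) writes the mask f0. If the write is rejected, the error message shows up on stderr, which is the case to report back here.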

PCIe is plug-n-play. I suspect that some of the integrated components cannot change which core the hardware IRQ runs on. It seems though that PCIe is the first of some of the hardware to be able to run on different cores for an Orin. I think though that much depends on data transfer within the “CPU<->application”, so it might be CPU bound as well. However, it is unlikely you would be able to IRQ balance hardware drivers on Xavier, this is set in silicon and/or module wiring.

FYI, you can set the affinity of anything to some other core. The trick is whether or not the scheduler will migrate it back to CPU0.
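One way to check whether the IRQ really stays on the new core is to compare the per-CPU counters in /proc/interrupts before and after a test run. This is only a sketch with made-up helper names (`irq_counts`, `irq_delta`), and it assumes all CPUs are online so that the columns map to CPU0, CPU1, and so on in order.

```shell
# Print the per-CPU counters of one IRQ as a single space-separated line.
# $1: IRQ number, $2: interrupts table (default /proc/interrupts)
irq_counts() {
    awk -v irq="$1:" '
        NR == 1 { ncpu = NF; next }          # header row: one column per online CPU
        $1 == irq {
            line = $2
            for (i = 3; i <= ncpu + 1; i++) line = line " " $i
            print line
        }' "${2:-/proc/interrupts}"
}

# Print which CPUs serviced the IRQ between two snapshots from irq_counts.
# $1: counters before, $2: counters after
irq_delta() {
    echo "$1 $2" | awk '{
        n = NF / 2
        for (i = 1; i <= n; i++) {
            d = $(i + n) - $i
            if (d > 0) printf "CPU%d +%d\n", i - 1, d
        }
    }'
}
```

Take `before="$(irq_counts 55)"`, run the iperf3 test, take `after="$(irq_counts 55)"`, then call `irq_delta "$before" "$after"` (55 is a placeholder IRQ number). If only CPU0 shows an increase after you set a different affinity, the change did not stick.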