Hi all,
I’m working on optimising the throughput of the 10GBASE-T1 link between an Orin p3663-a01 board and a separate x86_64 system. This is a 3rd-party dual-Orin system, but the question relates to the nvethernet driver, which is just the standard DRIVE version. I’ve come across an odd issue: a simple iperf3 test between the Orin and the other host produces ~5 Gbit/s when sending from the Orin to the host:
$ iperf3 -c 192.168.1.2
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
---
[  5]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec   87            sender
[  5]   0.00-10.00  sec  5.58 GBytes  4.79 Gbits/sec                 receiver
But it gives ~8.5 Gbit/s when receiving from the host to the Orin:
$ iperf3 -c 192.168.1.2 -R
[ ID] Interval           Transfer     Bitrate         Retr
---
[  5]   0.00-10.00  sec  10.0 GBytes  8.59 Gbits/sec    1            sender
[  5]   0.00-10.00  sec  10.0 GBytes  8.59 Gbits/sec                 receiver
The cause seems to be in the queue allocation. The nvethernet driver creates 10 rx and 10 tx queues for the interface, but only the first tx queue is being used:
$ ethtool -S mgbe0_0 | grep q_.*_pkt
q_tx_pkt_n[0]: 57169958
q_tx_pkt_n[1]: 0
q_tx_pkt_n[2]: 9
q_tx_pkt_n[3]: 0
q_tx_pkt_n[4]: 0
q_tx_pkt_n[5]: 0
q_tx_pkt_n[6]: 4082
q_tx_pkt_n[7]: 22
q_tx_pkt_n[8]: 0
q_tx_pkt_n[9]: 0
q_rx_pkt_n[0]: 5431386
q_rx_pkt_n[1]: 9402526
q_rx_pkt_n[2]: 1139638
q_rx_pkt_n[3]: 25820203
q_rx_pkt_n[4]: 1210313
q_rx_pkt_n[5]: 7718919
q_rx_pkt_n[6]: 16913829
q_rx_pkt_n[7]: 14053080
q_rx_pkt_n[8]: 2834541
q_rx_pkt_n[9]: 18741537
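For what it’s worth, my understanding is that the tx queue a packet lands on is picked either by the driver’s own queue-selection hook or by XPS, so I’ve been poking at the per-queue CPU masks under sysfs. Something along these lines (paths assume the standard Linux sysfs layout for mgbe0_0; the second loop is only a rough sketch of spreading the 10 queues one per CPU, not something I’ve verified on this board):

# List the tx queues the kernel actually exposes and their current XPS CPU masks
for q in /sys/class/net/mgbe0_0/queues/tx-*; do
    echo "$q: $(cat $q/xps_cpus)"
done

# Rough sketch: steer transmits originating on CPU N to tx queue N
# (the value written is a hex CPU bitmask; adjust before trying this)
for i in $(seq 0 9); do
    printf '%x' $((1 << i)) | sudo tee /sys/class/net/mgbe0_0/queues/tx-$i/xps_cpus
done

If the driver overrides queue selection internally, I assume the XPS masks would simply be ignored, which is part of what I’m trying to confirm.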
Running parallel iperf3 streams with -P10 doesn’t appear to make any difference to this behaviour. On an otherwise idle system, top shows a single CPU saturated with system and interrupt load while transmitting:
top - 16:20:18 up 1 day, 2:31, 2 users, load average: 0.74, 1.23, 1.33
Tasks: 724 total, 2 running, 722 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 19.2 sy, 0.0 ni, 12.9 id, 0.0 wa, 1.2 hi, 66.8 si, 0.0 st
%Cpu1 : 0.0 us, 2.7 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
[other 10 cores idle]
So the tx bottleneck appears to be interrupt overload on a single core from that one queue. Is this expected behaviour from the driver and, if not, do I need to change the driver to make full use of the 10 queues?
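I also haven’t ruled out plain IRQ affinity. In case it’s relevant, this is roughly how I’d check which CPUs the ethernet interrupts land on and move one to another core (the IRQ number below is a placeholder, and irqbalance would need to be stopped so it doesn’t rewrite the mask):

# See which IRQ lines belong to the interface and how they are spread across CPUs
# (adjust the pattern if the driver registers its IRQs under a different label)
grep -i mgbe /proc/interrupts

# Rough sketch: pin one IRQ to CPU 2; the value is a hex CPU bitmask
echo 4 | sudo tee /proc/irq/<irq>/smp_affinity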
Some details:
$ ethtool -i mgbe0_0
driver: nvethernet
version: 5.10.120-rt70-tegra
firmware-version:
expansion-rom-version:
bus-info: 6810000.ethernet
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
$ uname -a
Linux hostname 5.10.120-rt70-tegra #1 SMP PREEMPT RT Fri May 26 11:33:37 CST 2023 aarch64 aarch64 aarch64 GNU/Linux
$ ethtool -k mgbe0_0 | grep -v fixed
Features for mgbe0_0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ipv6: on
scatter-gather: on
tx-scatter-gather: on
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-mangleid-segmentation: off
generic-segmentation-offload: on
generic-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
receive-hashing: on
rx-vlan-filter: on
tx-udp-segmentation: on
tx-nocache-copy: off
rx-gro-list: off
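If it helps, I’m happy to post the channel and ring configuration as well; assuming the driver implements the corresponding ethtool ops, these should be the queries:

$ ethtool -l mgbe0_0
$ ethtool -g mgbe0_0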
Please provide the following info (tick the boxes after creating this topic):
Software Version
[/] DRIVE OS 6.0.5
Target Operating System
[/] Linux
Hardware Platform
[/] other
SDK Manager Version
[/] other
Host Machine Version
[/] other (Ubuntu 20.04)