Nvethernet driver queue allocation and usage

Hi all,

I’m working on optimising the throughput of the 10GBASE-T1 link between an Orin p3663-a01 board and a separate x86_64 system. This is a 3rd-party dual-Orin system, but the question concerns the nvethernet driver, which is the standard DRIVE OS version. I’ve come across an odd asymmetry: a simple iperf3 test between the Orin and the other host achieves only ~4.8 Gbit/s when the Orin sends to the host:

$ iperf3 -c 192.168.1.2
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
---
[  5]   0.00-10.00  sec  5.59 GBytes  4.80 Gbits/sec   87             sender
[  5]   0.00-10.00  sec  5.58 GBytes  4.79 Gbits/sec                  receiver

But it achieves ~8.6 Gbit/s when the host sends to the Orin (iperf3 reverse mode, -R):

$ iperf3 -c 192.168.1.2 -R
[ ID] Interval           Transfer     Bitrate         Retr
---
[  5]   0.00-10.00  sec  10.0 GBytes  8.59 Gbits/sec    1             sender
[  5]   0.00-10.00  sec  10.0 GBytes  8.59 Gbits/sec                  receiver
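As a quick sanity check on the two runs above, the Transfer column reproduces the reported bitrates once iperf3's units are taken into account. This is a sketch assuming iperf3's default formatting (transferred bytes in binary units, 1 GByte = 2^30 bytes; bitrates in decimal units, 1 Gbit = 10^9 bits); the helper name `bitrate_gbps` is illustrative only.

```python
# Reproduce iperf3's Bitrate column from its Transfer column.
# Assumption: iperf3 reports bytes in binary units (1 GByte = 2**30 bytes)
# and bitrates in decimal units (1 Gbit = 10**9 bits).
GIBIBYTE = 2**30


def bitrate_gbps(transfer_gbytes: float, seconds: float) -> float:
    """Convert an iperf3 'Transfer' value in GBytes to Gbits/sec."""
    return transfer_gbytes * GIBIBYTE * 8 / seconds / 1e9


tx = bitrate_gbps(5.59, 10.0)  # Orin -> host (iperf3 -c)
rx = bitrate_gbps(10.0, 10.0)  # host -> Orin (iperf3 -c ... -R)

print(f"Orin -> host: {tx:.2f} Gbit/s")
print(f"host -> Orin: {rx:.2f} Gbit/s")
print(f"tx/rx ratio:  {tx / rx:.0%}")
```

With the numbers from the post this gives about 4.80 and 8.59 Gbit/s, so the Orin-to-host direction reaches only roughly 56% of the reverse direction.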

The cause seems to lie in queue allocation. The nvethernet driver creates 10 rx and 10 tx queues; received traffic is spread across all 10 rx queues, but almost every transmitted packet goes through tx queue 0:

$ ethtool -S mgbe0_0 | grep 'q_.*_pkt'
     q_tx_pkt_n[0]: 57169958
     q_tx_pkt_n[1]: 0
     q_tx_pkt_n[2]: 9
     q_tx_pkt_n[3]: 0
     q_tx_pkt_n[4]: 0
     q_tx_pkt_n[5]: 0
     q_tx_pkt_n[6]: 4082
     q_tx_pkt_n[7]: 22
     q_tx_pkt_n[8]: 0
     q_tx_pkt_n[9]: 0
     q_rx_pkt_n[0]: 5431386
     q_rx_pkt_n[1]: 9402526
     q_rx_pkt_n[2]: 1139638
     q_rx_pkt_n[3]: 25820203
     q_rx_pkt_n[4]: 1210313
     q_rx_pkt_n[5]: 7718919
     q_rx_pkt_n[6]: 16913829
     q_rx_pkt_n[7]: 14053080
     q_rx_pkt_n[8]: 2834541
     q_rx_pkt_n[9]: 18741537
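To put numbers on that imbalance, the per-queue counters can be turned into shares of the total per direction. A minimal sketch, assuming the `q_tx_pkt_n[i]` / `q_rx_pkt_n[i]` line format shown above; `queue_shares` and `STATS` are hypothetical names, and the sample is the counter output copied from this post.

```python
import re

# Matches per-queue counter lines from `ethtool -S mgbe0_0`,
# e.g. "     q_tx_pkt_n[0]: 57169958".
COUNTER_RE = re.compile(r"^\s*q_(tx|rx)_pkt_n\[(\d+)\]:\s*(\d+)\s*$")


def queue_shares(ethtool_stats: str) -> dict:
    """Return {"tx": {queue: share}, "rx": {queue: share}} as fractions of each total."""
    counts = {"tx": {}, "rx": {}}
    for line in ethtool_stats.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            counts[m.group(1)][int(m.group(2))] = int(m.group(3))
    shares = {}
    for direction, per_queue in counts.items():
        total = sum(per_queue.values())
        shares[direction] = {q: (n / total if total else 0.0) for q, n in per_queue.items()}
    return shares


# Counters copied from the ethtool -S output above.
STATS = """\
     q_tx_pkt_n[0]: 57169958
     q_tx_pkt_n[1]: 0
     q_tx_pkt_n[2]: 9
     q_tx_pkt_n[3]: 0
     q_tx_pkt_n[4]: 0
     q_tx_pkt_n[5]: 0
     q_tx_pkt_n[6]: 4082
     q_tx_pkt_n[7]: 22
     q_tx_pkt_n[8]: 0
     q_tx_pkt_n[9]: 0
     q_rx_pkt_n[0]: 5431386
     q_rx_pkt_n[1]: 9402526
     q_rx_pkt_n[2]: 1139638
     q_rx_pkt_n[3]: 25820203
     q_rx_pkt_n[4]: 1210313
     q_rx_pkt_n[5]: 7718919
     q_rx_pkt_n[6]: 16913829
     q_rx_pkt_n[7]: 14053080
     q_rx_pkt_n[8]: 2834541
     q_rx_pkt_n[9]: 18741537
"""

shares = queue_shares(STATS)
for direction in ("tx", "rx"):
    busiest = max(shares[direction], key=shares[direction].get)
    active = sum(1 for s in shares[direction].values() if s > 0)
    print(f"{direction}: {active} queues active, busiest is queue {busiest} "
          f"with {shares[direction][busiest]:.2%} of packets")
```

For these counters, tx queue 0 carries more than 99.9% of transmitted packets, while the busiest rx queue (queue 3) carries only about a quarter of received packets.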

Running 10 parallel iperf3 streams with -P 10 makes no apparent difference. On an otherwise idle system, top shows a single CPU saturated while transmitting, mostly with softirq (si) and system (sy) time:

top - 16:20:18 up 1 day,  2:31,  2 users,  load average: 0.74, 1.23, 1.33
Tasks: 724 total,   2 running, 722 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us, 19.2 sy,  0.0 ni, 12.9 id,  0.0 wa,  1.2 hi, 66.8 si,  0.0 st
%Cpu1  :  0.0 us,  2.7 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
[other 10 cores idle]
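The per-CPU lines can also be checked mechanically to confirm which core is saturated and how much of its time goes to kernel-side work. A small sketch, assuming procps top's per-CPU summary format (toggled with "1" in interactive top); `busy_cpus`, `TOP_SAMPLE` and the 50% idle threshold are illustrative choices, and the sample lines are copied from the output above.

```python
import re

# Per-CPU summary lines as printed by top, e.g. "%Cpu0  :  0.0 us, 19.2 sy, ...".
CPU_RE = re.compile(r"^%Cpu(\d+)\s*:\s*(.*)$")
# Each field is "<percent> <name>", e.g. "66.8 si".
FIELD_RE = re.compile(r"([\d.]+)\s+(us|sy|ni|id|wa|hi|si|st)\b")


def busy_cpus(top_output: str, max_idle: float = 50.0) -> dict:
    """Return {cpu: {field: percent}} for every CPU whose idle (id) time is below max_idle."""
    busy = {}
    for line in top_output.splitlines():
        m = CPU_RE.match(line.strip())
        if not m:
            continue
        fields = {name: float(value) for value, name in FIELD_RE.findall(m.group(2))}
        if fields.get("id", 100.0) < max_idle:
            busy[int(m.group(1))] = fields
    return busy


# Two per-CPU lines copied from the top output above.
TOP_SAMPLE = """\
%Cpu0  :  0.0 us, 19.2 sy,  0.0 ni, 12.9 id,  0.0 wa,  1.2 hi, 66.8 si,  0.0 st
%Cpu1  :  0.0 us,  2.7 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
"""

for cpu, f in busy_cpus(TOP_SAMPLE).items():
    kernel = f["sy"] + f["hi"] + f["si"]
    print(f"CPU{cpu}: {f['id']:.1f}% idle, {kernel:.1f}% sy+hi+si ({f['si']:.1f}% softirq)")
```

With the sample above only CPU0 is reported, spending about 87% of its time in sy+hi+si, most of it softirq.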

So the tx bottleneck appears to be softirq load on the single core servicing that one queue. Is this expected behaviour from the driver and, if not, do I need to modify the driver to make use of all 10 tx queues?

Some details:

$ ethtool -i mgbe0_0
driver: nvethernet
version: 5.10.120-rt70-tegra
firmware-version:
expansion-rom-version:
bus-info: 6810000.ethernet
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ uname -a
Linux hostname 5.10.120-rt70-tegra #1 SMP PREEMPT RT Fri May 26 11:33:37 CST 2023 aarch64 aarch64 aarch64 GNU/Linux

$ ethtool -k mgbe0_0 | grep -v fixed
Features for mgbe0_0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ipv6: on
scatter-gather: on
	tx-scatter-gather: on
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-mangleid-segmentation: off
generic-segmentation-offload: on
generic-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
receive-hashing: on
rx-vlan-filter: on
tx-udp-segmentation: on
tx-nocache-copy: off
rx-gro-list: off

Software Version: DRIVE OS 6.0.5
Target Operating System: Linux
Hardware Platform: other
SDK Manager Version: other
Host Machine Version: other (Ubuntu 20.04)

Dear @terry.dooher,
P3663 is not supported via the forum. Could you please reach out to your NVIDIA representative for the right support channel?

Thanks Siva. I’ll take it up with them, but I figured the issue would be common to all boards (at least anything running this driver version) and was hoping to probe the hive mind here :)