Jetson Orin NX iperf3 zero-copy

Hello,
I have a Jetson Orin NX Developer Kit running kernel 5.10.120-tegra, with a 25GbE NVIDIA ConnectX-4 Lx (CX4-LX) NIC on the NX. I have run a couple of iperf3 TCP tests: buffered and zero-copy (the -Z option).

I am seeing odd results with iperf3’s -Z option on the NX. Below are the observed results.

Buffered

iperf3 -c <ip> -t 10 -i 5 -P 1 -f M


[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  28.7 GBytes  2938 MBytes/sec    0            sender
[  5]   0.00-10.00  sec  28.7 GBytes  2938 MBytes/sec                 receiver

Zero-copy

iperf3 -c <ip> -t 10 -i 5 -P 1 -f M -Z


[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  13.4 GBytes  1374 MBytes/sec    0            sender
[  5]   0.00-10.00  sec  13.4 GBytes  1374 MBytes/sec                 receiver

The -Z option makes iperf3 use sendfile(), so splice/sendfile paths appear in the kernel; this is not observed with a buffered send. As a sanity check I ran iperf3 from an x86_64 host to the NX (NX receiving), which performed well at ~2828 MBytes/sec.
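For anyone reproducing this, a quick way to confirm which syscalls iperf3 issues in each mode is a syscall-count trace; a minimal sketch (the <ip> placeholder stands in for the actual server address):

# with -Z the summary should be dominated by sendfile(); without -Z, by write()/sendto()
strace -f -c -e trace=write,sendto,sendfile,splice iperf3 -c <ip> -t 5 -Z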

I have uploaded CPU flamegraphs covering 10-second windows while the NX is sending in buffered and in zero-copy mode. My reading of the zero-copy flamegraph is that the CPU is oddly spending a lot of time in _raw_spin_unlock_irqrestore, reached via arm_smmu_tlb_sync_context.

iperf3-nx-tcp-buffered-send.pdf (656.3 KB)

iperf3-nx-zerocopy-tcp-send.pdf (611.4 KB)
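For reference, flamegraphs like these can be produced with perf plus Brendan Gregg's FlameGraph scripts; a sketch, assuming the FlameGraph repository is cloned to ~/FlameGraph:

# sample all CPUs with call stacks for the 10-second send window
sudo perf record -a -g -F 99 -- sleep 10
sudo perf script | ~/FlameGraph/stackcollapse-perf.pl | ~/FlameGraph/flamegraph.pl > iperf3-nx-send.svg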

I wanted to see if anybody has experienced this behavior, and what approach was used to investigate/resolve it.

Thanks,
vangogh


Hi,
We run the following command for profiling:

iperf3 -c <ip> -b 0 -l 16K -t 120 -i 1

We do not set -Z. Please try this command and see if you can achieve the target performance. Also, please execute sudo nvpmodel -m 0 and sudo jetson_clocks before profiling.
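For clarity, a full sequence could look like the following (<ip> is the iperf3 server address; the mode name reported by nvpmodel varies by module):

sudo nvpmodel -m 0    # select the maximum-performance power model
sudo jetson_clocks    # lock clocks at maximum
sudo nvpmodel -q      # verify the active power mode
iperf3 -c <ip> -b 0 -l 16K -t 120 -i 1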


Hi DaneLLL,
Thank you for your response.

In my NX Developer Kit test configuration, nvpmodel -m 0 and jetson_clocks have already been applied. I am able to get ~2717 MBytes/sec for 16K buffered TCP sends, which looks okay.

Under the same configuration, I am observing low rates when the Jetson Developer Kit operates as an NFS server and performs a TCP transmit in response to a remote NFS client issuing a READ. To remove storage as a potential cause, the NFS server exports a ramdisk. I have NFS 4.2 at both ends; similar results were observed with NFS 3.

NFS server

mount -t tmpfs -o size=12G none /mnt/nfs
fallocate -l 12G /mnt/nfs/testfile
systemctl restart nfs-kernel-server
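For completeness, the ramdisk also needs an entry in /etc/exports before the restart; a minimal sketch (the client subnet below is a hypothetical example):

# /etc/exports
/mnt/nfs 192.168.1.0/24(rw,sync,no_subtree_check)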

I have an NFS client on x86_64 that mounts the NFS export as:
mount -t nfs -o nconnect=8 <ip>:/mnt/nfs /mnt/nv_nfs
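One way to confirm the negotiated version and connection count is to read back the live mount options (nconnect shows up in the option string only on kernels that support it):

# the options column should include vers=4.2 and nconnect=8
grep nv_nfs /proc/mounts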

The NFS client uses “fio” to perform a READ of a testfile that is in the server’s exported file-system.

NFS READ command
fio --name=fio_test --filename=/mnt/nv_nfs/testfile --rw=read --direct=1 --size=100% --bs=1M --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 --time_based --group_reporting

fio_test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64

fio-3.16
Starting 8 processes
Jobs: 8 (f=8): [R(8)][100.0%][r=1604MiB/s][r=1603 IOPS][eta 00m:00s]
fio_test: (groupid=0, jobs=8): err= 0: pid=14423: Fri Mar 15 08:47:54 2024
read: IOPS=1694, BW=1695MiB/s (1777MB/s)(99.9GiB/60324msec)
slat (usec): min=44, max=12465, avg=110.93, stdev=106.21
clat (msec): min=30, max=683, avg=301.74, stdev=27.47
lat (msec): min=30, max=683, avg=301.85, stdev=27.44
clat percentiles (msec):
| 1.00th=[ 268], 5.00th=[ 275], 10.00th=[ 279], 20.00th=[ 288],
| 30.00th=[ 292], 40.00th=[ 296], 50.00th=[ 300], 60.00th=[ 305],
| 70.00th=[ 309], 80.00th=[ 317], 90.00th=[ 326], 95.00th=[ 338],
| 99.00th=[ 376], 99.50th=[ 405], 99.90th=[ 584], 99.95th=[ 617],
| 99.99th=[ 659]
bw ( MiB/s): min= 1540, max= 1810, per=100.00%, avg=1695.39, stdev= 6.25, samples=960
iops : min= 1540, max= 1810, avg=1695.13, stdev= 6.25, samples=960
lat (msec) : 50=0.08%, 100=0.10%, 250=0.27%, 500=99.32%, 750=0.24%
cpu : usr=0.23%, sys=2.48%, ctx=104256, majf=0, minf=131163
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=102247,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: bw=1695MiB/s (1777MB/s), 1695MiB/s-1695MiB/s (1777MB/s-1777MB/s), io=99.9GiB (107GB), run=60324-60324msec

Here the nconnect=8 mount option provides a boost of ~400 MBytes/sec. Without nconnect, NFS uses a single socket and the observed rate is ~1400 MBytes/sec. Oddly, this rate is very similar to the TCP transmit rate observed with iperf3 -Z. At this point a plausible explanation appears to be that the NFS server is taking a kernel transmit path similar to the one exercised by the iperf3 -Z (zero-copy) test, and is therefore hitting a similar limitation on the Orin NX Developer Kit platform.
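One way to test this hypothesis would be to count hits on the kernel's page-based TCP transmit path on the server while the NFS READ runs; a sketch using perf kprobes (assumes kprobes are enabled and that tcp_sendpage exists in this 5.10 kernel, which it should):

# on the NX (NFS server): tcp_sendpage is hit by sendfile()/splice() and by
# in-kernel callers such as sunrpc
sudo perf probe --add tcp_sendpage
sudo perf stat -a -e probe:tcp_sendpage -- sleep 10   # run while the client fio READ is active
sudo perf probe --del tcp_sendpage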

I wanted to see if anybody is observing similar behavior or has any thoughts regarding the NFS test, and whether there is a mechanism to alleviate it.

Thank you.