Hi DaneLLL,
Thank you for your response.
In my Orin NX Developer Kit test configuration, nvpmodel -m 0 and jetson_clocks have already been applied. I am able to get ~2717 MBps for 16K buffered TCP sends, which looks reasonable.
Under the same configuration I am observing low rates when the Jetson Developer Kit operates as an NFS server and performs a TCP transmit in response to a remote NFS client issuing a READ. To rule out storage as a potential cause, the NFS server exports a ramdisk. Both ends run NFS 4.2; similar results were observed with NFS 3.
NFS server setup:
mount -t tmpfs -o size=12G none /mnt/nfs
fallocate -l 12G /mnt/nfs/testfile
systemctl restart nfs-kernel-server
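For completeness, the directory also has to appear in /etc/exports before the restart takes effect. A minimal sketch of the entry I am using (the client subnet and export options here are placeholders; adjust to your network):

```shell
# Hypothetical /etc/exports entry for the tmpfs-backed export.
# 192.168.1.0/24 is a placeholder client subnet; options are illustrative.
/mnt/nfs 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
```

After editing the file, either the systemctl restart above or `exportfs -ra` re-reads the export table.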
I have an NFS client on x86_64 that mounts the NFS export as:
mount -t nfs -o nconnect=8 :/mnt/nfs /mnt/nv_nfs
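One way to confirm that nconnect is actually in effect is to count the established TCP connections to the server's NFS port (2049 is the standard NFS port) from the client side; this is a sketch and assumes a live mount:

```shell
# Count established TCP sockets to any peer's port 2049 on the client;
# with nconnect=8 and active I/O this should report 8.
ss -tn state established '( dport = :2049 )' | tail -n +2 | wc -l
```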
The NFS client uses fio to perform a READ of a test file in the server's exported file system.
NFS READ command:
fio --name=fio_test --filename=/mnt/nv_nfs/testfile --rw=read --direct=1 --size=100% --bs=1M --ioengine=libaio --iodepth=64 --runtime=60 --numjobs=8 --time_based --group_reporting
fio_test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
…
fio-3.16
Starting 8 processes
Jobs: 8 (f=8): [R(8)][100.0%][r=1604MiB/s][r=1603 IOPS][eta 00m:00s]
fio_test: (groupid=0, jobs=8): err= 0: pid=14423: Fri Mar 15 08:47:54 2024
read: IOPS=1694, BW=1695MiB/s (1777MB/s)(99.9GiB/60324msec)
slat (usec): min=44, max=12465, avg=110.93, stdev=106.21
clat (msec): min=30, max=683, avg=301.74, stdev=27.47
lat (msec): min=30, max=683, avg=301.85, stdev=27.44
clat percentiles (msec):
| 1.00th=[ 268], 5.00th=[ 275], 10.00th=[ 279], 20.00th=[ 288],
| 30.00th=[ 292], 40.00th=[ 296], 50.00th=[ 300], 60.00th=[ 305],
| 70.00th=[ 309], 80.00th=[ 317], 90.00th=[ 326], 95.00th=[ 338],
| 99.00th=[ 376], 99.50th=[ 405], 99.90th=[ 584], 99.95th=[ 617],
| 99.99th=[ 659]
bw ( MiB/s): min= 1540, max= 1810, per=100.00%, avg=1695.39, stdev= 6.25, samples=960
iops : min= 1540, max= 1810, avg=1695.13, stdev= 6.25, samples=960
lat (msec) : 50=0.08%, 100=0.10%, 250=0.27%, 500=99.32%, 750=0.24%
cpu : usr=0.23%, sys=2.48%, ctx=104256, majf=0, minf=131163
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=102247,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1695MiB/s (1777MB/s), 1695MiB/s-1695MiB/s (1777MB/s-1777MB/s), io=99.9GiB (107GB), run=60324-60324msec
Here the nconnect=8 mount option provides a boost of ~400 MBps. Without nconnect, NFS uses a single socket and the observed rate is ~1400 MBps. Oddly, this single-socket rate is very close to the TCP transmit rate observed with iperf3 -Z. At this point a plausible explanation is that the NFS server takes a kernel transmit path similar to the one exercised by the iperf3 -Z (zero-copy) test, and therefore hits the same limitation on the Orin NX Developer Kit platform.
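For reference, the comparison I am drawing is against plain iperf3 runs with and without the zero-copy send path, with the Jetson transmitting. A sketch of those runs (the receiver IP is a placeholder):

```shell
# On the x86_64 host (receiver):
iperf3 -s

# On the Jetson (transmitter; 192.168.1.10 is a placeholder receiver IP):
iperf3 -c 192.168.1.10 -t 30       # regular copying send path
iperf3 -c 192.168.1.10 -t 30 -Z    # zero-copy (sendfile) send path
```

The gap between the two runs is what appears to line up with the single-socket NFS READ rate.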
I wanted to see whether anybody else is observing similar behavior, or has any thoughts on the NFS test and whether there is a mechanism to alleviate it.
Thank you.