Hi Forum,
We are developing a camera recording pipeline based on ffmpeg (dual/stereo MIPI CSI-2 camera, 728x544 @ 20 fps) with H.264 encoding on the CPU, and we observed different performance between the following two setups (same carrier board, same application code):
- Jetson Orin NX + Jetpack-5.1.2 + NVME SSD SAMSUNG MZ9LQ1T0HBLB-00B
- Jetson Orin Nano + Jetpack-5.1.5 (as Jetpack-5.1.2 does not support Orin Nano) + NVME SSD SAMSUNG MZ9LQ1T0HBLB-00B
With the 2nd setup I see a significantly higher number of queued frames (between 10 and 20) as well as dropped frames, while the 1st setup works perfectly, with a constant queue of 1 frame and close to zero dropped frames.
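To put these numbers in perspective, here is a small sketch of the timing and data-rate arithmetic. It assumes the queue counts frames of a single 20 fps stream, and it assumes 1.5 bytes per pixel (8-bit YUV 4:2:0) for the raw frames; the pixel format is only an assumption for illustration, and the function names are made up for this example.

```python
# Figures taken from the post: two cameras, 728x544 pixels, 20 fps each.
WIDTH, HEIGHT, FPS, CAMERAS = 728, 544, 20, 2

# ASSUMPTION: 1.5 bytes per pixel (8-bit YUV 4:2:0). The pixel format is not
# stated in the post, so adjust this to the real capture format.
BYTES_PER_PIXEL = 1.5


def frame_interval_ms(fps):
    """Time available to process one frame before the next one arrives."""
    return 1000.0 / fps


def backlog_seconds(queued_frames, fps):
    """Capture time represented by a queue of `queued_frames` frames."""
    return queued_frames / fps


def raw_rate_mib_s(width, height, fps, cameras, bytes_per_pixel):
    """Uncompressed data rate of all cameras combined, in MiB/s."""
    return width * height * bytes_per_pixel * fps * cameras / (1024 * 1024)


if __name__ == "__main__":
    print(f"per-frame budget: {frame_interval_ms(FPS):.0f} ms")
    for queued in (1, 10, 20):
        print(f"{queued:2d} queued frames = {backlog_seconds(queued, FPS):.2f} s behind")
    rate = raw_rate_mib_s(WIDTH, HEIGHT, FPS, CAMERAS, BYTES_PER_PIXEL)
    print(f"raw stereo data rate: {rate:.1f} MiB/s")
```

Under these assumptions, a queue of 10 to 20 frames means the pipeline is running 0.5 to 1 second behind the camera, and the uncompressed stereo stream is roughly 23 MiB/s, so the H.264 output written to disk should be smaller still.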
I followed the instructions in the relevant topic to benchmark the SSD; the results are below:
Setup #1:
nvidia@JORNX-JP512:~$ sudo nvme list
[sudo] password for nvidia:
Node SN Model Namespace Usage Format FW Rev
/dev/nvme0n1 S6HXNE0RB15298 SAMSUNG MZ9LQ1T0HBLB-00B 1 273.89 GB / 1.02 TB 512 B + 0 B FXM7AK1Q
nvidia@JORNX-JP512:~/Workspace$ ffmpeg
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
a. Direct write :
root@JORNX-JP512-GY-3C-6D-66-03-99-3F:/home/nvidia# echo 3 > /proc/sys/vm/drop_caches
root@JORNX-JP512-GY-3C-6D-66-03-99-3F:/home/nvidia# fio --filename=/mnt/testfile2 --direct=1 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
test-seq-write: Laying out IO file (1 file / 61440MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=230MiB/s][w=23 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=6102: Wed Oct 15 16:07:34 2025
write: IOPS=28, BW=283MiB/s (297MB/s)(60.0GiB/217005msec); 0 zone resets
clat (msec): min=4, max=351, avg=34.95, stdev=59.79
lat (msec): min=4, max=351, avg=35.32, stdev=59.79
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 5], 10.00th=[ 5], 20.00th=[ 5],
| 30.00th=[ 5], 40.00th=[ 10], 50.00th=[ 10], 60.00th=[ 11],
| 70.00th=[ 12], 80.00th=[ 14], 90.00th=[ 163], 95.00th=[ 167],
| 99.00th=[ 199], 99.50th=[ 234], 99.90th=[ 317], 99.95th=[ 321],
| 99.99th=[ 351]
bw ( KiB/s): min=20480, max=2068480, per=99.99%, avg=289881.38, stdev=483807.87, samples=434
iops : min= 2, max= 202, avg=28.31, stdev=47.25, samples=434
lat (msec) : 10=53.35%, 20=28.27%, 50=0.76%, 100=1.17%, 250=16.00%
lat (msec) : 500=0.44%
cpu : usr=0.83%, sys=0.67%, ctx=6169, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=283MiB/s (297MB/s), 283MiB/s-283MiB/s (297MB/s-297MB/s), io=60.0GiB (64.4GB), run=217005-217005msec
Disk stats (read/write):
nvme0n1: ios=204/50332, merge=345/1329, ticks=15581/988148, in_queue=1004408, util=99.76%
b. Buffered write :
root@JORNX-JP512:/home/nvidia# fio --filename=/mnt/testfile2 --direct=0 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][98.9%][eta 00m:02s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=7123: Wed Oct 15 16:11:21 2025
write: IOPS=32, BW=329MiB/s (345MB/s)(60.0GiB/186471msec); 0 zone resets
clat (msec): min=7, max=9930, avg=30.10, stdev=160.41
lat (msec): min=7, max=9930, avg=30.34, stdev=160.41
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 8], 10.00th=[ 8], 20.00th=[ 9],
| 30.00th=[ 9], 40.00th=[ 10], 50.00th=[ 11], 60.00th=[ 13],
| 70.00th=[ 21], 80.00th=[ 36], 90.00th=[ 65], 95.00th=[ 81],
| 99.00th=[ 159], 99.50th=[ 213], 99.90th=[ 1217], 99.95th=[ 1435],
| 99.99th=[ 9866]
bw ( KiB/s): min=20480, max=1372160, per=100.00%, avg=391056.24, stdev=371245.49, samples=319
iops : min= 2, max= 134, avg=38.14, stdev=36.25, samples=319
lat (msec) : 10=41.60%, 20=28.14%, 50=16.96%, 100=10.07%, 250=2.83%
lat (msec) : 500=0.10%, 750=0.02%, 1000=0.02%, 2000=0.21%, >=2000=0.05%
cpu : usr=0.77%, sys=29.56%, ctx=6927, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=329MiB/s (345MB/s), 329MiB/s-329MiB/s (345MB/s-345MB/s), io=60.0GiB (64.4GB), run=186471-186471msec
Disk stats (read/write):
nvme0n1: ios=417/51035, merge=415/201, ticks=39812/172662697, in_queue=172714077, util=98.00%
Setup #2:
nvidia@JORNN-JP515:~$ sudo nvme list
[sudo] password for nvidia:
Node SN Model Namespace Usage Format FW Rev
/dev/nvme0n1 S6HXNE0RB15314 SAMSUNG MZ9LQ1T0HBLB-00B 1 44.61 GB / 1.02 TB 512 B + 0 B FXM7AK1Q
nvidia@JORNN-JP515:~$ ffmpeg
ffmpeg version n4.2.7-24-gd796def2ea-1ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
configuration: --prefix=/usr --enable-nvv4l2dec --enable-libv4l2 --enable-shared --extra-libs='-L/usr/lib/aarch64-linux-gnu/tegra -lv4l2 -lnvbufsurface -lnvbufsurftransform' --extra-cflags=-I/usr/src/jetson_multimedia_api/include/ --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
a. Direct write :
root@JORNN-JP515:/home/nvidia# echo 3 > /proc/sys/vm/drop_caches
root@JORNN-JP515:/home/nvidia# fio --filename=/mnt/testfile2 --direct=1 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
test-seq-write: Laying out IO file (1 file / 61440MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=820MiB/s][w=82 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=3351: Wed Oct 15 16:15:00 2025
write: IOPS=67, BW=676MiB/s (709MB/s)(60.0GiB/90909msec); 0 zone resets
clat (msec): min=4, max=261, avg=14.32, stdev= 8.66
lat (msec): min=5, max=261, avg=14.79, stdev= 8.66
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 9],
| 30.00th=[ 9], 40.00th=[ 13], 50.00th=[ 13], 60.00th=[ 14],
| 70.00th=[ 17], 80.00th=[ 18], 90.00th=[ 22], 95.00th=[ 25],
| 99.00th=[ 35], 99.50th=[ 40], 99.90th=[ 114], 99.95th=[ 178],
| 99.99th=[ 262]
bw ( KiB/s): min=327680, max=1085440, per=99.73%, avg=690178.99, stdev=226585.96, samples=181
iops : min= 32, max= 106, avg=67.38, stdev=22.13, samples=181
lat (msec) : 10=32.39%, 20=55.79%, 50=11.49%, 100=0.16%, 250=0.15%
lat (msec) : 500=0.02%
cpu : usr=2.30%, sys=0.55%, ctx=6228, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=676MiB/s (709MB/s), 676MiB/s-676MiB/s (709MB/s-709MB/s), io=60.0GiB (64.4GB), run=90909-90909msec
Disk stats (read/write):
nvme0n1: ios=6404/49880, merge=2291/1497, ticks=82335/562014, in_queue=713478, util=99.74%
b. Buffered write :
root@JORNN-JP515:/home/nvidia# fio --filename=/mnt/testfile2 --direct=0 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=660MiB/s][w=66 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=5230: Wed Oct 15 16:18:28 2025
write: IOPS=75, BW=759MiB/s (796MB/s)(60.0GiB/80942msec); 0 zone resets
clat (msec): min=7, max=325, avg=12.87, stdev= 5.64
lat (msec): min=8, max=325, avg=13.17, stdev= 5.64
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 9], 10.00th=[ 10], 20.00th=[ 11],
| 30.00th=[ 12], 40.00th=[ 12], 50.00th=[ 13], 60.00th=[ 14],
| 70.00th=[ 14], 80.00th=[ 15], 90.00th=[ 16], 95.00th=[ 17],
| 99.00th=[ 21], 99.50th=[ 33], 99.90th=[ 66], 99.95th=[ 73],
| 99.99th=[ 326]
bw ( KiB/s): min=327680, max=1167360, per=100.00%, avg=777380.63, stdev=82327.45, samples=161
iops : min= 32, max= 114, avg=75.88, stdev= 8.05, samples=161
lat (msec) : 10=12.08%, 20=86.90%, 50=0.85%, 100=0.15%, 250=0.02%
lat (msec) : 500=0.02%
cpu : usr=2.47%, sys=94.92%, ctx=301, majf=0, minf=211
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=759MiB/s (796MB/s), 759MiB/s-759MiB/s (796MB/s-796MB/s), io=60.0GiB (64.4GB), run=80942-80942msec
Disk stats (read/write):
nvme0n1: ios=126/54779, merge=0/111, ticks=6323/4683266, in_queue=4690641, util=79.55%
It seems to me that the 2nd setup has better disk throughput, yet it is the one with the high number of queued/dropped frames. Could this be due to the different ffmpeg version? Could you please share your opinion?
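For convenience, the four fio runs can be summarized with a short script. This is only a sketch that recomputes bandwidth from the file size and run times printed in the logs above; the dictionary keys and function names are made up for this example.

```python
# Values copied from the fio summaries in this post.
IO_SIZE_MIB = 61440  # "Laying out IO file (1 file / 61440MiB)"
RUNTIME_MS = {
    ("setup1", "direct"): 217005,
    ("setup1", "buffered"): 186471,
    ("setup2", "direct"): 90909,
    ("setup2", "buffered"): 80942,
}


def bandwidth_mib_s(size_mib, runtime_ms):
    """Average sequential write bandwidth in MiB/s."""
    return size_mib / (runtime_ms / 1000.0)


def speedup(mode):
    """Ratio of setup #2 to setup #1 bandwidth for the given write mode."""
    bw1 = bandwidth_mib_s(IO_SIZE_MIB, RUNTIME_MS[("setup1", mode)])
    bw2 = bandwidth_mib_s(IO_SIZE_MIB, RUNTIME_MS[("setup2", mode)])
    return bw2 / bw1


if __name__ == "__main__":
    for (setup, mode), ms in RUNTIME_MS.items():
        print(f"{setup} {mode:8s}: {bandwidth_mib_s(IO_SIZE_MIB, ms):6.1f} MiB/s")
    for mode in ("direct", "buffered"):
        print(f"setup2 / setup1 ({mode}): {speedup(mode):.2f}x")
```

The recomputed values match the fio output (283, 329, 676 and 759 MiB/s), and the 2nd setup comes out about 2.3x to 2.4x faster in both write modes.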
Thanks and best regards,
Khang

