NVME SSD write speed on Jetpack-5.1.5

Hi Forum,

We are developing a camera recording pipeline that uses ffmpeg to record a dual (stereo) MIPI CSI-2 camera (728x544 @ 20 fps) with CPU-based H.264 encoding. We observed different performance on the following two setups (same carrier board, same application code):

  1. Jetson Orin NX + Jetpack-5.1.2 + NVME SSD SAMSUNG MZ9LQ1T0HBLB-00B
  2. Jetson Orin Nano + Jetpack-5.1.5 (as the Jetpack-5.1.2 does not support Orin Nano) + NVME SSD SAMSUNG MZ9LQ1T0HBLB-00B

On the second setup, I see a significantly higher number of queued frames (between 10 and 20) as well as dropped frames, while the first setup works perfectly, with a constant queue of 1 frame and close to zero dropped frames.

I followed the instructions in the relevant topic to run the benchmark, and the results are below:

Setup #1:

nvidia@JORNX-JP512:~$ sudo  nvme list
[sudo] password for nvidia:
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
/dev/nvme0n1     S6HXNE0RB15298       SAMSUNG MZ9LQ1T0HBLB-00B                 1         273.89  GB /   1.02  TB    512   B +  0 B   FXM7AK1Q
nvidia@JORNX-JP512:~/Workspace$ ffmpeg 
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

a. Direct write :

root@JORNX-JP512-GY-3C-6D-66-03-99-3F:/home/nvidia# echo 3 > /proc/sys/vm/drop_caches
root@JORNX-JP512-GY-3C-6D-66-03-99-3F:/home/nvidia# fio --filename=/mnt/testfile2 --direct=1 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
test-seq-write: Laying out IO file (1 file / 61440MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=230MiB/s][w=23 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=6102: Wed Oct 15 16:07:34 2025
write: IOPS=28, BW=283MiB/s (297MB/s)(60.0GiB/217005msec); 0 zone resets
clat (msec): min=4, max=351, avg=34.95, stdev=59.79
lat (msec): min=4, max=351, avg=35.32, stdev=59.79
clat percentiles (msec):
|  1.00th=[    5],  5.00th=[    5], 10.00th=[    5], 20.00th=[    5],
| 30.00th=[    5], 40.00th=[   10], 50.00th=[   10], 60.00th=[   11],
| 70.00th=[   12], 80.00th=[   14], 90.00th=[  163], 95.00th=[  167],
| 99.00th=[  199], 99.50th=[  234], 99.90th=[  317], 99.95th=[  321],
| 99.99th=[  351]
bw (  KiB/s): min=20480, max=2068480, per=99.99%, avg=289881.38, stdev=483807.87, samples=434
iops        : min=    2, max=  202, avg=28.31, stdev=47.25, samples=434
lat (msec)   : 10=53.35%, 20=28.27%, 50=0.76%, 100=1.17%, 250=16.00%
lat (msec)   : 500=0.44%
cpu          : usr=0.83%, sys=0.67%, ctx=6169, majf=0, minf=23
IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=283MiB/s (297MB/s), 283MiB/s-283MiB/s (297MB/s-297MB/s), io=60.0GiB (64.4GB), run=217005-217005msec

Disk stats (read/write):
nvme0n1: ios=204/50332, merge=345/1329, ticks=15581/988148, in_queue=1004408, util=99.76%

b. Buffered write :

root@JORNX-JP512:/home/nvidia# fio --filename=/mnt/testfile2 --direct=0 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][98.9%][eta 00m:02s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=7123: Wed Oct 15 16:11:21 2025
write: IOPS=32, BW=329MiB/s (345MB/s)(60.0GiB/186471msec); 0 zone resets
clat (msec): min=7, max=9930, avg=30.10, stdev=160.41
lat (msec): min=7, max=9930, avg=30.34, stdev=160.41
clat percentiles (msec):
|  1.00th=[    8],  5.00th=[    8], 10.00th=[    8], 20.00th=[    9],
| 30.00th=[    9], 40.00th=[   10], 50.00th=[   11], 60.00th=[   13],
| 70.00th=[   21], 80.00th=[   36], 90.00th=[   65], 95.00th=[   81],
| 99.00th=[  159], 99.50th=[  213], 99.90th=[ 1217], 99.95th=[ 1435],
| 99.99th=[ 9866]
bw (  KiB/s): min=20480, max=1372160, per=100.00%, avg=391056.24, stdev=371245.49, samples=319
iops        : min=    2, max=  134, avg=38.14, stdev=36.25, samples=319
lat (msec)   : 10=41.60%, 20=28.14%, 50=16.96%, 100=10.07%, 250=2.83%
lat (msec)   : 500=0.10%, 750=0.02%, 1000=0.02%, 2000=0.21%, >=2000=0.05%
cpu          : usr=0.77%, sys=29.56%, ctx=6927, majf=0, minf=23
IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=329MiB/s (345MB/s), 329MiB/s-329MiB/s (345MB/s-345MB/s), io=60.0GiB (64.4GB), run=186471-186471msec

Disk stats (read/write):
nvme0n1: ios=417/51035, merge=415/201, ticks=39812/172662697, in_queue=172714077, util=98.00%

Setup #2:

nvidia@JORNN-JP515:~$ sudo nvme list
[sudo] password for nvidia:
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
/dev/nvme0n1     S6HXNE0RB15314       SAMSUNG MZ9LQ1T0HBLB-00B                 1          44.61  GB /   1.02  TB    512   B +  0 B   FXM7AK1Q


nvidia@JORNN-JP515:~$ ffmpeg 
ffmpeg version n4.2.7-24-gd796def2ea-1ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
  configuration: --prefix=/usr --enable-nvv4l2dec --enable-libv4l2 --enable-shared --extra-libs='-L/usr/lib/aarch64-linux-gnu/tegra -lv4l2 -lnvbufsurface -lnvbufsurftransform' --extra-cflags=-I/usr/src/jetson_multimedia_api/include/ --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --arch=arm64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

a. Direct write :

root@JORNN-JP515:/home/nvidia# echo 3 > /proc/sys/vm/drop_caches
root@JORNN-JP515:/home/nvidia# fio --filename=/mnt/testfile2 --direct=1 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
test-seq-write: Laying out IO file (1 file / 61440MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=820MiB/s][w=82 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=3351: Wed Oct 15 16:15:00 2025
write: IOPS=67, BW=676MiB/s (709MB/s)(60.0GiB/90909msec); 0 zone resets
clat (msec): min=4, max=261, avg=14.32, stdev= 8.66
lat (msec): min=5, max=261, avg=14.79, stdev= 8.66
clat percentiles (msec):
|  1.00th=[    9],  5.00th=[    9], 10.00th=[    9], 20.00th=[    9],
| 30.00th=[    9], 40.00th=[   13], 50.00th=[   13], 60.00th=[   14],
| 70.00th=[   17], 80.00th=[   18], 90.00th=[   22], 95.00th=[   25],
| 99.00th=[   35], 99.50th=[   40], 99.90th=[  114], 99.95th=[  178],
| 99.99th=[  262]
bw (  KiB/s): min=327680, max=1085440, per=99.73%, avg=690178.99, stdev=226585.96, samples=181
iops        : min=   32, max=  106, avg=67.38, stdev=22.13, samples=181
lat (msec)   : 10=32.39%, 20=55.79%, 50=11.49%, 100=0.16%, 250=0.15%
lat (msec)   : 500=0.02%
cpu          : usr=2.30%, sys=0.55%, ctx=6228, majf=0, minf=23
IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=676MiB/s (709MB/s), 676MiB/s-676MiB/s (709MB/s-709MB/s), io=60.0GiB (64.4GB), run=90909-90909msec

Disk stats (read/write):
nvme0n1: ios=6404/49880, merge=2291/1497, ticks=82335/562014, in_queue=713478, util=99.74%

b. Buffered write :

root@JORNN-JP515:/home/nvidia# fio --filename=/mnt/testfile2 --direct=0 --rw=write --bs=10m --iodepth=64 --size=60G --numjobs=1 --time_base=1 --group_reporting --name=test-seq-write --ioengine=sync
fio: time_based requires a runtime/timeout setting
test-seq-write: (g=0): rw=write, bs=(R) 10.0MiB-10.0MiB, (W) 10.0MiB-10.0MiB, (T) 10.0MiB-10.0MiB, ioengine=sync, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=660MiB/s][w=66 IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=1): err= 0: pid=5230: Wed Oct 15 16:18:28 2025
write: IOPS=75, BW=759MiB/s (796MB/s)(60.0GiB/80942msec); 0 zone resets
clat (msec): min=7, max=325, avg=12.87, stdev= 5.64
lat (msec): min=8, max=325, avg=13.17, stdev= 5.64
clat percentiles (msec):
|  1.00th=[    9],  5.00th=[    9], 10.00th=[   10], 20.00th=[   11],
| 30.00th=[   12], 40.00th=[   12], 50.00th=[   13], 60.00th=[   14],
| 70.00th=[   14], 80.00th=[   15], 90.00th=[   16], 95.00th=[   17],
| 99.00th=[   21], 99.50th=[   33], 99.90th=[   66], 99.95th=[   73],
| 99.99th=[  326]
bw (  KiB/s): min=327680, max=1167360, per=100.00%, avg=777380.63, stdev=82327.45, samples=161
iops        : min=   32, max=  114, avg=75.88, stdev= 8.05, samples=161
lat (msec)   : 10=12.08%, 20=86.90%, 50=0.85%, 100=0.15%, 250=0.02%
lat (msec)   : 500=0.02%
cpu          : usr=2.47%, sys=94.92%, ctx=301, majf=0, minf=211
IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6144,0,0 short=0,0,0,0 dropped=0,0,0,0
latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=759MiB/s (796MB/s), 759MiB/s-759MiB/s (796MB/s-796MB/s), io=60.0GiB (64.4GB), run=80942-80942msec

Disk stats (read/write):
nvme0n1: ios=126/54779, merge=0/111, ticks=6323/4683266, in_queue=4690641, util=79.55%

It seems to me that the second setup has better disk throughput, yet it is the one with the high number of queued and dropped frames. Could this be due to the different ffmpeg version? Could you please share your opinion?
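As a rough sanity check on whether the disk could be the bottleneck at all, the raw data rate of the stereo stream can be estimated. This is only a sketch: the pixel format is not stated here, so 2 bytes per pixel is an assumption, and the H.264 output actually written to disk should be much smaller than the raw rate.

```shell
# Estimate the raw (unencoded) data rate of the stereo camera stream.
# Assumption: 2 bytes per pixel; the real pixel format is not given in this thread.
width=728
height=544
fps=20
cameras=2
bytes_per_pixel=2

raw_bytes_per_sec=$(( width * height * fps * cameras * bytes_per_pixel ))
raw_mib_per_sec=$(( raw_bytes_per_sec / 1024 / 1024 ))

echo "raw stereo stream: ${raw_bytes_per_sec} B/s (~${raw_mib_per_sec} MiB/s)"
```

Under that assumption the raw stream is roughly 30 MiB/s, well below the slowest sequential write result above (283 MiB/s), which suggests looking beyond SSD throughput.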

Thanks and best regards,
Khang

Hi,
It looks related to the difference in CPU capability between Orin NX and Orin Nano. We would suggest running the following commands on both setups to profile the system status:

$ sudo jetson_clocks
$ sudo tegrastats
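One possible way to capture tegrastats while the recording runs is sketched below. The helper name `profile_with_tegrastats` and the `TEGRASTATS` override variable are hypothetical; the sketch relies on the `--interval`, `--logfile`, `--start` and `--stop` options of tegrastats, so please check `tegrastats --help` on your release first.

```shell
# Hypothetical helper: log tegrastats to a file while a command runs.
# TEGRASTATS may be overridden; by default "sudo tegrastats" is used.
profile_with_tegrastats() {
  local logfile="$1"
  shift
  local status=0

  # Start tegrastats in the background, sampling once per second.
  ${TEGRASTATS:-sudo tegrastats} --interval 1000 --logfile "$logfile" --start

  # Run the workload (for example the recording application) and keep its exit code.
  "$@" || status=$?

  # Stop the background tegrastats instance.
  ${TEGRASTATS:-sudo tegrastats} --stop

  return "$status"
}
```

It could then be called as `profile_with_tegrastats tegrastats.log <your recording command>` on each setup, and the two log files compared for CPU load and clock frequencies.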

Hi @DaneLLL ,

jetson_clocks is enabled from system startup on both setups.

The only difference is that with Jetpack-5.1.5 I was able to enable Super Mode and select MAXN_SUPER on the Jetson Orin Nano, while the Orin NX is still in the 25W NVP mode.

For tegrastats, do you mean that I should run the command and capture its output during the recording?

By the way, I checked the ffmpeg version (in my opinion the most suspicious element) on both setups. The versions are similar but not identical:

Setup #1 :

nvidia@JORNX-JP512:~$ dpkg -l | grep ffmpeg
ii ffmpeg 7:4.2.7-0ubuntu0.1 arm64 Tools for transcoding, streaming and playing of multimedia files

nvidia@JORNX-JP512:~$ apt-get download --print-uris ffmpeg
'http://ports.ubuntu.com/ubuntu-ports/pool/universe/f/ffmpeg/ffmpeg_4.2.7-0ubuntu0.1_arm64.deb' ffmpeg_7%3a4.2.7-0ubuntu0.1_arm64.deb 1439820 SHA512:ce0424f95a5b34cf3f803d4c136bc8f2de35aa1fc508fe11a9877f1309e484600234a60f948037c3dd94e01cd4937295ee427fa68c81f659be401dd71a5bff24

Setup #2 :

nvidia@JORNN-JP515:~$ dpkg -l | grep ffmpeg
ii ffmpeg 7:4.2.7-nvidia arm64 Tools for transcoding, streaming and playing of multimedia files

nvidia@JORNN-JP515:~$ apt-get download --print-uris ffmpeg
'https://repo.download.nvidia.com/jetson/ffmpeg/pool/main/f/ffmpeg/ffmpeg_4.2.7-nvidia_arm64.deb' ffmpeg_7%3a4.2.7-nvidia_arm64.deb 14320024 SHA256:8502e4ce2d83d25753d72bd0ebd3fec89a9b9c43b7a7dff4277b0d7ecd1d7d57

Best Regards,
Khang

Hi @DaneLLL ,

I tested and confirmed that the issue was caused by the NVIDIA ffmpeg package (https://repo.download.nvidia.com/jetson/ffmpeg/pool/main/f/ffmpeg/ffmpeg_4.2.7-nvidia_arm64.deb). I disabled this ffmpeg source in /etc/apt/sources.list.d/nvidia-l4t-apt-source.list and reinstalled ffmpeg from http://ports.ubuntu.com/ubuntu-ports/pool/universe/f/ffmpeg/ffmpeg_4.2.7-0ubuntu0.1_arm64.deb. This resolved the queued-frames issue in our application, which now behaves the same as it did with Jetpack-5.1.2.
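For anyone wanting to reproduce the workaround, the repository step might look roughly like the sketch below. The helper name `disable_nvidia_ffmpeg_repo` is hypothetical, and it assumes the ffmpeg entry is an active `deb` line whose URL contains `jetson/ffmpeg`, so check the contents of the list file on your system before applying it.

```shell
# Hypothetical helper: comment out the Jetson ffmpeg repository entry in an APT source list.
# Assumption: the entry is an active "deb" line whose URL contains "jetson/ffmpeg".
disable_nvidia_ffmpeg_repo() {
  local list_file="$1"
  # Keep a .bak copy of the file, then prefix matching "deb" lines with "# ".
  sed -i.bak -E '/^deb .*jetson\/ffmpeg/ s/^/# /' "$list_file"
}
```

On the device, the same sed expression would be run with sudo against /etc/apt/sources.list.d/nvidia-l4t-apt-source.list, followed by `sudo apt update` and a reinstall of ffmpeg at version 7:4.2.7-0ubuntu0.1. The related libav* packages may need the same downgrade for apt to resolve the dependencies.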
