Why does the video_decode sample show a significant performance drop in JetPack 5.1?

Here are the details of the benchmark (can anyone reproduce it on Orin?):

Devices (MAXN, jetson_clocks):

  • AGX JetPack 5.1 L4T 35.2.1
  • Nano JetPack 4.6.3 L4T 32.7.3

Commands:

  • nvv4l2dec: ffmpeg -y -benchmark -c:v hevc_nvv4l2dec -i $input -f null -
  • gstreamer: gst-launch-1.0 filesrc location=$input ! h265parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
  • 00_video_decode: video_decode H265 --disable-rendering --stats $input
  • jetson_ffmpeg: ffmpeg -y -benchmark -c:v hevc_nvmpi -i $input -f null -

sample_3840x2160.hevc:

Stream #0:0: Video: hevc (Main), yuv420p(tv), 3840x2160, 23.98 fps, 23.98 tbr, 1200k tbn, 23.98 tbc
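
| Decode path     | AGX (fps) | Nano (fps) |
|-----------------|-----------|------------|
| 00_video_decode | 32.7      | 97.96      |
| gstreamer       | 232.61    | 97.71      |
| nvv4l2dec       | 24        | 63         |
| nv_mpi          | 28        | 70         |
| ffmpeg cpu      | 25        | 7.3        |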

sample_4k.h264:

Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 3840x2160, 25 fps, 25 tbr, 1200k tbn, 50 tbc
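
| Decode path     | AGX (fps) | Nano (fps) |
|-----------------|-----------|------------|
| 00_video_decode | 25.7      | 88.10      |
| gstreamer       | 132.84    | 87.29      |
| nvv4l2dec       | 25        | 63         |
| nv_mpi          | 23        | 69         |
| ffmpeg cpu      | 64        | 19         |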

sample_720.h264:

Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1280x544, 24.08 fps, 23.98 tbr, 1200k tbn, 47.95 tbc
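
| Decode path     | AGX (fps) | Nano (fps) |
|-----------------|-----------|------------|
| 00_video_decode | 71.18     | 770.58     |
| gstreamer       | 733.70    | 764.39     |
| nvv4l2dec       | 47        | 409        |
| nv_mpi          | 46        | 490        |
| ffmpeg cpu      | 676       | 270        |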

For the same 4K HEVC video, GStreamer with nvv4l2decoder gives 232.61 fps on AGX and 97.71 fps on Nano, which is as expected. However, the 00_video_decode sample reaches only 32.7 fps on AGX versus 97.96 fps on Nano!
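To make the anomaly concrete, here is a small Python sketch that computes the AGX-to-Nano throughput ratio for every decode path, using only the fps figures from the three tables above. A ratio below 1 means the AGX is slower than the Nano. The names `RESULTS` and `agx_to_nano_ratios` are illustrative helpers, not part of any NVIDIA tool.

```python
# fps figures copied from the benchmark tables: (AGX, Nano) per decode path.
RESULTS = {
    "sample_3840x2160.hevc": {
        "00_video_decode": (32.7, 97.96),
        "gstreamer": (232.61, 97.71),
        "nvv4l2dec": (24, 63),
        "nv_mpi": (28, 70),
        "ffmpeg cpu": (25, 7.3),
    },
    "sample_4k.h264": {
        "00_video_decode": (25.7, 88.10),
        "gstreamer": (132.84, 87.29),
        "nvv4l2dec": (25, 63),
        "nv_mpi": (23, 69),
        "ffmpeg cpu": (64, 19),
    },
    "sample_720.h264": {
        "00_video_decode": (71.18, 770.58),
        "gstreamer": (733.70, 764.39),
        "nvv4l2dec": (47, 409),
        "nv_mpi": (46, 490),
        "ffmpeg cpu": (676, 270),
    },
}


def agx_to_nano_ratios(results):
    """Return {clip: {path: AGX fps / Nano fps}}; a value > 1 means AGX is faster."""
    return {
        clip: {path: agx / nano for path, (agx, nano) in paths.items()}
        for clip, paths in results.items()
    }


if __name__ == "__main__":
    for clip, ratios in agx_to_nano_ratios(RESULTS).items():
        print(clip)
        for path, ratio in ratios.items():
            # Flag every path where the newer board is slower than the Nano.
            flag = "  <-- AGX slower than Nano" if ratio < 1 else ""
            print(f"  {path:16s} {ratio:5.2f}x{flag}")
```

Run against the posted numbers, 00_video_decode is flagged on all three clips, while GStreamer is not flagged on the HEVC and 4K H.264 clips.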

Version of MMAPI:

$ apt show `dpkg -S /usr/src/jetson_multimedia_api | cut -d ':' -f1`
Package: nvidia-l4t-jetson-multimedia-api
Version: 35.2.1-20230124153320
Priority: standard
Section: Utils
Maintainer: NVIDIA Corporation
Installed-Size: 96.4 MB
Pre-Depends: nvidia-l4t-core (>> 35.2-0), nvidia-l4t-core (<< 35.3-0)
Depends: cuda-cudart-11-4, cuda-cudart-dev-11-4, libc6-dev, libglvnd-dev, libx11-dev, nvidia-l4t-camera (= 35.2.1-20230124153320), nvidia-l4t-multimedia (= 35.2.1-20230124153320), nvidia-l4t-multimedia-utils (= 35.2.1-20230124153320)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 75.3 MB
APT-Manual-Installed: no
APT-Sources: https://repo.download.nvidia.com/jetson/common r35.2/main arm64 Packages
Description: NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support flexible application development.

Hi,
To clarify: are you comparing AGX Orin with Jetson Nano for 4K decoding through 00_video_decode and seeing unexpected performance?

Sorry for the confusion. I don’t have an AGX Orin; "AGX" refers to AGX Xavier. I’m sure it will reproduce on Orin as well.

@dourokinga

FWIW, I am experiencing a similar performance issue on AGX Orin.

Notably, when I drop enable-max-performance=1 from the GStreamer pipeline, it performs at the same level as:

  • nvv4l2dec from Nvidia’s FFmpeg build
  • nvmpi from your variant of jetson_ffmpeg

I am testing with a 4K H.264 file.

Raw results

Nvidia Jetson FFmpeg build (decoding only)

ffmpeg -y -benchmark -c:v h264_nvv4l2dec -i ~/Downloads/iphone6s_4k.mov -f null -

# ...

frame=  540 fps= 73 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.52x 

jetson-ffmpeg mpi build (ported to new API)

./ffmpeg -y -benchmark -c:v h264_nvmpi -i ~/Downloads/iphone6s_4k.mov -f null -

# ...
frame=  549 fps= 78 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.65x 

Nvidia Jetson FFmpeg build without specifying a hardware decoder

ffmpeg -y -benchmark -c:v h264 -i ~/Downloads/iphone6s_4k.mov -f null -

# ...

frame=  556 fps=127 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=4.24x

# faster!

Jetson system FFmpeg (not the Nvidia build), software decoding

ffmpeg -y -benchmark -c:v h264 -i ~/Downloads/iphone6s_4k.mov -f null -

# ...

frame=  556 fps=128 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=4.26x 

gstreamer with hardware decoder

gst-launch-1.0 filesrc location=$input ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

# ...

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 542, dropped: 0, current: 119,99, average: 119,32

gstreamer with hardware decoder, without forcing performance (no enable-max-performance=1)

gst-launch-1.0 filesrc location=$input ! qtdemux ! h264parse ! nvv4l2decoder  ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v

# ...

/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 548, dropped: 0, current: 72,94, average: 73,98
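To compare the two GStreamer runs numerically, the fpsdisplaysink `last-message` lines can be parsed. Note that this output uses decimal commas ("119,32"), which presumably comes from the system locale, so the minimal Python sketch below accepts either separator. The function name `parse_fpsdisplaysink` is hypothetical, and the regex assumes the exact "rendered/dropped/current/average" wording shown in the output above.

```python
import re

# Matches the fpsdisplaysink "last-message" text. The decimal separator may be
# "," or "." depending on locale, so each fps value accepts either.
_NUM = r"(\d+(?:[.,]\d+)?)"
_FPS_RE = re.compile(
    r"rendered:\s*(\d+),\s*dropped:\s*(\d+),\s*"
    r"current:\s*" + _NUM + r",\s*average:\s*" + _NUM
)


def _to_float(text):
    # Normalise a locale decimal comma to a dot before converting.
    return float(text.replace(",", "."))


def parse_fpsdisplaysink(line):
    """Return (rendered, dropped, current_fps, average_fps), or None if no match."""
    m = _FPS_RE.search(line)
    if m is None:
        return None
    rendered, dropped, current, average = m.groups()
    return int(rendered), int(dropped), _to_float(current), _to_float(average)


if __name__ == "__main__":
    # Final messages from the two runs (with / without enable-max-performance=1).
    with_max = parse_fpsdisplaysink(
        "last-message = rendered: 542, dropped: 0, current: 119,99, average: 119,32")
    without_max = parse_fpsdisplaysink(
        "last-message = rendered: 548, dropped: 0, current: 72,94, average: 73,98")
    ratio = with_max[3] / without_max[3]
    print(f"average fps gain from enable-max-performance=1: {ratio:.2f}x")
```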

My 2017 laptop without specifying hardware

ffmpeg -y -benchmark -c:v h264 -i ~/Downloads/iphone6s_4k.mov -f null -

# ...

frame=  556 fps= 86 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.88x

My 2017 laptop with NVDEC

ffmpeg -y -benchmark -hwaccel cuda  -i ~/Downloads/iphone6s_4k.mov -f null -

# ...

frame=  556 fps=174 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=5.81x 
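The FFmpeg runs can be tabulated the same way by parsing the final progress line that FFmpeg prints. This is a minimal Python sketch that assumes the `frame= ... fps= ... speed=...x` layout shown in the outputs above; `parse_ffmpeg_progress` is a hypothetical helper, not an FFmpeg API. It also makes it easy to see that the two Jetson hardware-decoder runs report fewer frames (540 and 549) than the software runs (556), which may be worth checking before comparing fps directly.

```python
import re

# Matches the final progress line FFmpeg prints, e.g.
# "frame=  556 fps=174 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=5.81x"
_PROGRESS_RE = re.compile(
    r"frame=\s*(\d+)\s+fps=\s*([\d.]+).*?speed=\s*([\d.]+)x"
)


def parse_ffmpeg_progress(line):
    """Return (frames, fps, speed) from an FFmpeg progress line, or None."""
    m = _PROGRESS_RE.search(line)
    if m is None:
        return None
    frames, fps, speed = m.groups()
    return int(frames), float(fps), float(speed)


if __name__ == "__main__":
    # Progress lines copied from the Jetson runs above.
    runs = {
        "h264_nvv4l2dec (Nvidia build)":
            "frame=  540 fps= 73 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.52x",
        "h264_nvmpi (jetson-ffmpeg)":
            "frame=  549 fps= 78 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=2.65x",
        "h264 software (Nvidia build)":
            "frame=  556 fps=127 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=4.24x",
        "h264 software (system build)":
            "frame=  556 fps=128 q=-0.0 Lsize=N/A time=00:00:18.55 bitrate=N/A speed=4.26x",
    }
    for name, line in runs.items():
        frames, fps, speed = parse_ffmpeg_progress(line)
        print(f"{name:30s} frames={frames:4d} fps={fps:6.1f} speed={speed:.2f}x")
```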

I can confirm the performance loss with JetPack 5.1 (L4T 35.2.1) on AGX Orin 32 GB.

My previous tests were made with

dpkg-query --show nvidia-l4t-core
nvidia-l4t-core	34.1.1-20220516211757
apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 5.0.1-b118

After flashing JetPack 5.1 (L4T 35.2.1):

cat /etc/nv_tegra_release 
# R35 (release), REVISION: 2.1, GCID: 32413640, BOARD: t186ref, EABI: aarch64, DATE: Tue Jan 24 23:38:33 UTC 2023
apt-cache show nvidia-l4t-core

Package: nvidia-l4t-core
Version: 35.2.1-20230124153320

Performance dropped from 111 to 39 fps (same hardware)

So far I have only tested this with community code (the jetson-ffmpeg fork).

But given the tests by @dourokinga, I expect the problem affects other decode paths as well.

Hi,
For clarity, please create a new topic for AGX Orin. We would like to keep this topic specific to AGX Xavier.

Also, please try JetPack 5.1.1.
