Decoder performance: gstreamer (nvv4l2decoder) vs ffmpeg (h264_nvv4l2dec)


What magic tricks or settings allow gstreamer’s nvv4l2decoder to outperform ffmpeg’s h264_nvv4l2dec by more than 2x in h264 1080p decoding?

The tests:

  • gst-launch-1.0 filesrc location=jellyfish-5-mbps-hd-h264.mkv ! matroskademux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
    • 260+ fps
  • ffmpeg -c:v h264_nvv4l2dec -i jellyfish-5-mbps-hd-h264.mkv -c:v rawvideo -f null -
    • 120+ fps
  • ffmpeg -c:v h264 -threads 4 -i jellyfish-5-mbps-hd-h264.mkv -c:v rawvideo -f null -
    • 110+ fps (software decoder!)

And some observations:

  • four Cortex-A57 cores are able to reach almost the same decoding speed as the NVDEC hardware unit with an ffmpeg implementation
  • two ffmpeg decoding processes can be started simultaneously without the fps dropping: each still decodes at 120+ fps
    • this means plenty of hardware resources remain available when only one ffmpeg process is running
  • changing the output pixel format from YUV420 to NV12 improves ffmpeg’s h264_nvv4l2dec performance a lot, but it is still far from gstreamer’s:
    • nvv4l2dec_create_decoder(avctx, nv_codec_type, V4L2_PIX_FMT_NV12M /*V4L2_PIX_FMT_YUV420M*/);
    • after this change ffmpeg’s decoding framerate jumps to 160+ fps
  • enabling V4L2_CID_MPEG_VIDEO_MAX_PERFORMANCE does nothing
    • ret = set_ext_controls(ctx->fd, V4L2_CID_MPEG_VIDEO_MAX_PERFORMANCE, 1);
    • framerate is the same
    • disabling the enable-max-performance option in gstreamer likewise doesn’t change its performance
  • there is GitHub - jocover/jetson-ffmpeg: ffmpeg support on jetson nano, an implementation of the codec on top of the Jetson Multimedia API
    • it uses NvVideoDecoder class from Video Decoder API
    • its performance is slightly lower than h264_nvv4l2dec’s, but close
  • maximizing Jetson Nano performance with ‘nvpmodel’ and ‘jetson_clocks’ increases the framerate for all implementations, but ffmpeg’s h264_nvv4l2dec still remains two times slower than gstreamer’s nvv4l2decoder

Right now I think the main culprit is NvBufferTransform. If it is removed, the framerate rises to roughly the same level as gstreamer’s. Naturally, merely disabling NvBufferTransform discards the decoder’s output entirely, so by itself that is useless. But I have tried to change the code in nvv4l2_dec.c so that it extracts data from the source buffers instead of the destination buffer, like this:

NvBuffer2Raw(decoded_buffer->planes[0].fd, 0, parm.width[0], parm.height[0], ...

Strangely, after that the framerate dropped dramatically, to about 60 fps. That I don’t understand.

Actual questions:

  • Which nuances of the gstreamer implementation allow it to achieve 260 fps in h264 decoding?
    • What can be done to the ffmpeg h264_nvv4l2dec implementation to achieve performance similar to gstreamer’s?
  • Is it possible to drop NvBufferTransform in favor of retrieving frame data directly from the source buffers?
    • Why, in your opinion, did my attempt to retrieve data directly from the source buffer instead of via NvBufferTransform drop performance?

Thank you in advance.

The hardware-decoded frame data is in an NvBuffer, and to work with the ffmpeg framework the frame data has to be copied from the NvBuffer to a CPU buffer. For an optimal solution, we would suggest using gstreamer or jetson_multimedia_api. You can access the buffers directly through the NvBuffer APIs and keep the data in NvBuffer from head to tail, eliminating the memory copy.

By ‘use jetson_multimedia_api’ did you mean an NvEGLImageFromFd call with glEGLImageTargetTexture2DOES? If so, please clarify:

  • Do I have to call NvBufferTransform before the NvEGLImageFromFd call, or is it possible to bind the v4l2 device’s capture buffer directly?
    • If no call to NvBufferTransform is needed, would it really be much faster than copying the frame to a CPU buffer? I mean, samplerExternalOES in a fragment shader would probably (implicitly) perform the same transformations as NvBufferTransform. Considering that NvBufferTransform is one of the most time-consuming operations here, isn’t it just going to increase the draw-call time correspondingly?

For video decoding, please refer to this sample:


All samples are in the folder. More information is in the document:
Jetson Linux API Reference: Main Page

NvBufferTransform() is done via the hardware converter, so it does not consume CPU and does not add much latency.

Thanks for the answers!

Meanwhile I’ve built the 00_video_decode sample with Easy Profiler (GitHub - yse/easy_profiler: Lightweight profiler library for c++) and have some pictures to share:

No conclusions, just a couple of tests in the hope it might be useful for someone.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.