Hi!
What magic tricks or settings allow GStreamer’s nvv4l2decoder to outperform ffmpeg’s h264_nvv4l2dec by more than 2x in H.264 1080p decoding?
The tests:
- gst-launch-1.0 filesrc location=jellyfish-5-mbps-hd-h264.mkv ! matroskademux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0 -v
  - 260+ fps
- ffmpeg -c:v h264_nvv4l2dec -i jellyfish-5-mbps-hd-h264.mkv -c:v rawvideo -f null -
  - 120+ fps
- ffmpeg -c:v h264 -threads 4 -i jellyfish-5-mbps-hd-h264.mkv -c:v rawvideo -f null -
  - 110+ fps (software decoder!)
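To put the three results on a common scale, the frame rates can be converted into an average time budget per frame. The following is a minimal C sketch of that arithmetic. The function names are made up for illustration, and the inputs are the lower bounds reported above (260+, 120+, 110+ fps), so the results are approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Average time spent per frame, in milliseconds, for a measured frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* How many times faster `fast_fps` is than `slow_fps`. */
double speedup(double fast_fps, double slow_fps)
{
    assert(fast_fps > 0.0 && slow_fps > 0.0);
    return fast_fps / slow_fps;
}

/* Print one measurement as "label: fps -> ms per frame". */
void print_frame_budget(const char *label, double fps)
{
    printf("%-28s %6.1f fps -> %5.2f ms/frame\n",
           label, fps, frame_time_ms(fps));
}
```

For example, frame_time_ms(260.0) is about 3.8 ms and frame_time_ms(120.0) is about 8.3 ms, so the ffmpeg hardware path spends roughly 4.5 ms more per frame than the GStreamer pipeline.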
And some observations:
- four Cortex-A57 cores achieve almost the same decoding speed as NVDEC does with the ffmpeg implementation
- two ffmpeg decoding processes can be started simultaneously and the fps won’t drop: each will still decode at 120 fps
- this means plenty of hardware resources remain available when only one ffmpeg process is running
- changing the output pixel format from YUV420 to NV12 increases the performance of ffmpeg’s h264_nvv4l2dec a lot, but it is still far from GStreamer’s:
nvv4l2dec_create_decoder(avctx, nv_codec_type, V4L2_PIX_FMT_NV12M /*V4L2_PIX_FMT_YUV420M*/);
- after that, ffmpeg’s decoding framerate jumps to 160+ fps
- enabling V4L2_CID_MPEG_VIDEO_MAX_PERFORMANCE does nothing
ret = set_ext_controls(ctx->fd, V4L2_CID_MPEG_VIDEO_MAX_PERFORMANCE, 1);
- framerate is the same
- disabling the enable-max-performance option in GStreamer also doesn’t change its performance
- there is also jocover/jetson-ffmpeg on GitHub ("ffmpeg support on jetson nano"), an implementation of the codec built on the Jetson Multimedia API
- it uses the NvVideoDecoder class from the Video Decoder API
- its performance is slightly lower than that of h264_nvv4l2dec, but close
- maximizing Jetson Nano performance with ‘nvpmodel’ and ‘jetson_clocks’ increases performance for all implementations, but ffmpeg’s h264_nvv4l2dec still remains 2 times slower than GStreamer’s nvv4l2decoder
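On the pixel-format observation above: NV12 and planar YUV420 carry the same 4:2:0 data and differ only in chroma layout. NV12 stores U and V interleaved in one plane, while planar YUV420 stores them in two separate planes. The C sketch below shows the extra chroma split needed to go from NV12 to planar YUV420 on the CPU. It is only an illustration of the layout difference; that a similar conversion step accounts for the 120 to 160 fps gap is an assumption, not something confirmed here. The function name is hypothetical, and tightly packed rows are assumed, whereas real buffers usually have a pitch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split an NV12 chroma plane (bytes interleaved as U0 V0 U1 V1 ...)
 * into the two separate U and V planes used by planar YUV420.
 *
 * `chroma_samples` is the number of U (or V) samples, i.e.
 * (width / 2) * (height / 2) for 4:2:0 video with even dimensions.
 * The luma plane is identical in both formats and needs no conversion.
 */
void nv12_chroma_to_planar(const uint8_t *uv, uint8_t *u, uint8_t *v,
                           size_t chroma_samples)
{
    assert(uv != NULL && u != NULL && v != NULL);

    for (size_t i = 0; i < chroma_samples; i++) {
        u[i] = uv[2 * i];      /* even bytes hold U */
        v[i] = uv[2 * i + 1];  /* odd bytes hold V  */
    }
}
```

For a 1920x1080 frame this touches about 1 MB of chroma data per frame, which is small on its own but adds up at several hundred frames per second.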
Right now I think the main culprit is NvBufferTransform. If it is removed, the framerate rises to roughly the same level as GStreamer’s. Naturally, simply disabling NvBufferTransform discards the decoder’s output completely, so it is useless by itself. But I’ve tried changing the code in nvv4l2_dec.c so that it extracts data from the source buffer instead of the destination buffer. Like this:
NvBuffer2Raw(decoded_buffer->planes[0].fd, 0, parm.width[0], parm.height[0], ...
Strangely, after that the framerate dropped dramatically, to about 60 fps. I don’t understand why.
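To narrow down where the time goes, each step (the NvBufferTransform call, the NvBuffer2Raw call) can be timed in isolation. Below is a minimal C sketch of such a timing helper. The stage_fn callback type and both function names are hypothetical, and the dummy stage only counts calls; in nvv4l2_dec.c the callback would wrap the real call being measured.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical callback: one unit of work, e.g. a wrapper around a
 * single NvBufferTransform or NvBuffer2Raw call supplied by the caller. */
typedef void (*stage_fn)(void *user);

/* Monotonic clock reading in milliseconds. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
}

/* Run `stage` `iterations` times; return the average time per call in ms. */
double average_stage_ms(stage_fn stage, void *user, int iterations)
{
    assert(stage != NULL && iterations > 0);

    double start = now_ms();
    for (int i = 0; i < iterations; i++)
        stage(user);
    return (now_ms() - start) / iterations;
}

/* Placeholder stage for trying the helper out: counts its invocations. */
void dummy_stage(void *user)
{
    ++*(int *)user;
}
```

Comparing the average per call against the per-frame budget (about 3.8 ms at 260 fps) should show whether the copy out of the decoder buffer alone is enough to explain the drop to 60 fps.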
Actual questions:
- Which details of the GStreamer implementation allow it to achieve 260 fps in H.264 decoding?
- What can be done to the ffmpeg h264_nvv4l2dec implementation to achieve performance similar to GStreamer’s?
- Is it possible to drop NvBufferTransform in favor of retrieving frame data directly from the source buffers?
- Why, in your opinion, did my attempt to retrieve data directly from the source buffer instead of using NvBufferTransform reduce performance?
Thank you in advance.