GPU transcoding performance comparison using FFmpeg 4 and 5 branches


I recently set up some test runs that transcode MP4 files using FFmpeg with GPU acceleration.

When testing the recent FFmpeg 5.1 release with a simple full-hardware transcode using h264_nvenc, I noticed a drop in performance compared with FFmpeg 4.3/4.4.

Are there any comparable issues? Any hints to diagnose this?

Are there benchmark results for GPU encoding/transcoding that I could compare against?


The old FFmpeg v4.4.4 build is called with these parameters:

ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -c:v h264_nvenc -b:v 2M output.mp4

The new FFmpeg v5.1.3 build is called with these parameters:

ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -extra_hw_frames 8 -i input.mp4 -c:v h264_nvenc -b:v 2M output.mp4

To measure the difference, I ran both commands as time ffmpeg -y -benchmark ...


For FFmpeg 4.4.4:

frame=32782 fps=344 q=19.0 Lsize=  343834kB time=00:21:51.32 bitrate=2148.0kbits/s speed=13.8x
video:322022kB audio:20521kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.377177%
bench: utime=30.184s stime=21.803s rtime=95.442s
bench: maxrss=135840kB
[aac @ 0x5601c60686c0] Qavg: 222.746

real    1m35.543s
user    0m30.184s
sys     0m21.845s

For FFmpeg 5.1.2:

frame=32782 fps=241 q=19.0 Lsize=  343956kB time=00:21:51.34 bitrate=2148.7kbits/s speed=9.63x
video:322022kB audio:20643kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.377043%
bench: utime=66.753s stime=27.008s rtime=136.567s
bench: maxrss=161032kB
[aac @ 0x55d7f93bd3c0] Qavg: 579.972

real    2m16.669s
user    1m6.777s
sys     0m27.032s
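
To put numbers on the gap, here is a small sketch that computes the ratios between the two runs from the figures above. The ratio helper and the variable names are only for illustration; the values are copied from the fps, utime and rtime fields of the two outputs.

```shell
#!/bin/sh
# Ratio of two numbers, printed with two decimals.
# awk is used because POSIX sh has no floating-point arithmetic.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

# Values copied from the two benchmark outputs above.
fps_old=344;      fps_new=241
utime_old=30.184; utime_new=66.753
rtime_old=95.442; rtime_new=136.567

echo "throughput (5.1 / 4.4): $(ratio "$fps_new" "$fps_old")"
echo "user CPU   (5.1 / 4.4): $(ratio "$utime_new" "$utime_old")"
echo "wall clock (5.1 / 4.4): $(ratio "$rtime_new" "$rtime_old")"
```

This gives about 0.70 for throughput, 2.21 for user CPU time and 1.43 for wall-clock time. So the 5.1 run reaches roughly 70% of the old frame rate while user CPU time more than doubles.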


The input.mp4 is an MP4 container with one video track of ~21 min, H.264 (High@L3.1), 720p, 25 fps, 892 kbps bitrate, yuv420p, and one audio track: AAC, 128 kbps.

The FFmpeg builds were created in a Docker container, using nvidia/cuda:12.0.1-devel-ubuntu22.04 for the build and nvidia/cuda:12.0.1-base-ubuntu22.04 with libnpp-12-0 installed as the runtime image.

With headers from this tag:

FFmpeg was built with the following configuration:

RUN cd /tmp/ffmpeg-${FFMPEG_VERSION} && \
    ./configure \
    --prefix=${PREFIX} \
    --disable-debug \
    --disable-doc \
    --disable-ffplay \
    --enable-version3 \
    --enable-gpl \
    --enable-nonfree \
    --enable-small \
    --enable-libfdk-aac \
    --enable-openssl \
    --enable-cuda \
    --enable-cuvid \
    --enable-nvenc \
    --enable-libnpp \
    --enable-shared \
    --extra-cflags="-I${PREFIX}/include -I${PREFIX}/include/ffnvcodec -I/usr/local/cuda/include/" \
    --extra-ldflags="-L${PREFIX}/lib -L/usr/local/cuda/lib64/" \
    --extra-libs=-ldl  && \
    make && \
    make install && \
    make distclean && \
    hash -r

Hardware: NVIDIA GeForce RTX 2080, Driver Version: 536.23, Docker 4.24.1 on Windows 10.

Thanks in advance for your comments and feedback on this issue.

Where is -hwaccel_output_format cuda -i .\test.mp4?

It will be slow otherwise.