GPU performance using ffmpeg

Hello,

This is my first post on this forum. I hope I’m posting in the right place!
I am trying to do hardware-based HEVC transcoding on a single NVIDIA V100 PCIe GPU.
While transcoding a short movie, I see:
- ffmpeg transcoding speed at 0.5x
- GPU ENC = 5%
- GPU DEC = 30%
- top shows 102% CPU utilization for ffmpeg, out of 800% available as I have 8 cores

My concern is performance.
I’m currently using nvidia-smi/nvtop to monitor the GPU and nmon/top to monitor system metrics.
I can’t identify the performance bottleneck here.
Are there specific tools to monitor the GPU in more depth (for example PCIe RX/TX throughput or congestion)?
Also, how does the V100 handle this kind of hardware processing internally? (Is there one hardware block for nvenc and one for nvdec, separate from the rest of the GPU?)

You’ll find more technical information in the attachment.

Maybe someone has an idea of how I can make progress on this.
Have a pleasant day.

Hello Julien,
Can you provide a reproducer with the command line you used?
Regards

Hello,

Thanks for your feedback.

A quick scenario to reproduce the problem:

  • take an 8K HEVC mp4 movie as the input (uzon-hevc.mp4)

  • transcode it on a VM with a V100 PCIe GPU on Flexible Engine p2s.2xlarge.8 (more information in the previously attached picture):
    [root@ecs-gpu test]# date && cd /data/www/test1 && /home/local/ffmpeg_build/bin/ffmpeg -hwaccel cuda -i /testjulien/uzon-hevc.mp4 -maxrate 16M -bufsize 200M -b:v 15000k -c:v hevc_nvenc -s:v 3840x1920 -c:a aac -b:a 64k -ac 2 -f hls -hls_time 4 -g 25 -sc_threshold 0 -hls_flags independent_segments -hls_list_size 5 -strftime 1 -hls_segment_filename /data/www/test2/file_%m-%d_%H-%M-%S.m4s master.m3u8 && date

  • watch the performance during the transcoding process. You should see that the GPU is not used at 100%. Something is limiting the transcoding speed, but I don’t know what.

We can discuss this internally via Christophe’s stream.
You may need more details, and I can’t show some of the information or share the content here.

BR
Julien

Hi Julien,

You should use this command line to make better use of the GPU:

$ ffmpeg -hwaccel cuvid -c:v hevc_cuvid -vsync 0 -y -i $input -maxrate 64M -bufsize 400M -b:v 15000k -c:v hevc_nvenc -vf scale_npp=3840:1920 -c:a aac -b:a 64k -ac 2 -f hls -hls_time 4 -g 25 -sc_threshold 0 -hls_flags independent_segments -hls_list_size 5 -strftime 1 -hls_segment_filename file_%m-%d_%H-%M-%S.m4s master.m3u8

Make sure ffmpeg is configured at compile time with the following:
./configure --enable-nonfree --enable-cuda-nvcc --enable-libnpp --enable-nvenc --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64

We use the NPP scale filter (scale_npp) to reduce CPU/GPU traffic.
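
To put a rough number on that traffic, here is a back-of-the-envelope sketch. It assumes 8-bit 4:2:0 frames (1.5 bytes per pixel) and a 7680x3840 source, which is only a guess based on the 3840x1920 target, since the thread just says "8K". The helper name raw_bytes_per_sec is made up for illustration. As far as I understand, when scaling is done on the CPU each decoded frame has to be copied to system memory and the scaled frame uploaded again for encoding, whereas with scale_npp the frames can stay in GPU memory.

```shell
#!/bin/sh
# Rough size of raw (decoded) video in bytes per second.
# Assumption: 8-bit 4:2:0 (NV12), i.e. 1.5 bytes per pixel.
raw_bytes_per_sec() {
    # $1 = width, $2 = height, $3 = frames per second
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

# Assumed source size 7680x3840 (not stated in the thread), target 3840x1920.
echo "decoded source: $(raw_bytes_per_sec 7680 3840 30) bytes/s"
echo "scaled output:  $(raw_bytes_per_sec 3840 1920 30) bytes/s"
```

Under those assumptions the decoded source is about 1.3 GB/s and the scaled output about 0.33 GB/s, so keeping both on the GPU avoids well over 1 GB/s of copies in each second of video.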

To monitor GPU usage, I recommend the following command:
nvidia-smi dmon -i 0 -s tu
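
If you want a summary instead of watching the numbers scroll by, the dmon output can be piped through a small filter. The following is only a sketch: dmon_peaks is a hypothetical helper name, and it looks up the enc and dec columns by name from the header line because the exact column layout can differ between driver versions.

```shell
#!/bin/sh
# Read `nvidia-smi dmon` output on stdin and report peak enc/dec utilisation.
dmon_peaks() {
    awk '
        /^#/ {
            # Header lines start with "#"; the one beginning with "gpu" names the columns.
            sub(/^#[ \t]*/, "")
            if ($1 == "gpu") for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("enc" in col) && ("dec" in col) {
            e = $(col["enc"]) + 0
            d = $(col["dec"]) + 0
            if (e > enc) enc = e
            if (d > dec) dec = d
            n++
        }
        END { printf "samples=%d enc_peak=%d dec_peak=%d\n", n, enc, dec }
    '
}

# Usage (needs an NVIDIA GPU): take 30 samples while ffmpeg is running.
# nvidia-smi dmon -i 0 -s tu -c 30 | dmon_peaks
```

The -c option limits the number of samples so the pipeline terminates and the summary line is printed.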

Regards

Hi,

I have recompiled ffmpeg with --enable-cuda-nvcc, plus HEVC support in RTMP (flv.c, flvenc.c, flvdec.c).

ffmpeg version 4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-39)
configuration: --prefix=/home/local/ffmpeg_build --pkg-config-flags=--static --extra-cflags='-I /home/local/ffmpeg_build/include -I/usr/local/cuda/include' --extra-ldflags='-L /home/local/ffmpeg_build/lib -L/usr/local/cuda/lib64' --extra-libs=-lpthread --extra-libs=-lm --bindir=/home/local/ffmpeg_build/bin --enable-gpl --enable-libfdk_aac --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-cuda --enable-cuvid --enable-nvenc --enable-libnpp --enable-nvdec --enable-nonfree --enable-cuda-nvcc

Thanks to your support, we were able to transcode 8K HEVC at 30 fps and 60 Mbps.
Our fiber access and camera cannot stream at a higher bitrate, so transcoding is no longer the limiting factor.

Thanks a lot
Julien

Hi Julien,

If you don’t mind me asking, what was the rendering fps speed you were able to achieve on the v100? I’m not sure if you are listing that or the actual fps of the video in your last comment.

Hello,

src = 8K hevc_nvdec 30 fps 83 Mbps RTMP → dest = 6K hevc_nvenc 30 fps 35 Mbps HLS was OK.
There were a few peaks at 100% nvdec utilization.
We didn’t test higher quality, but there is still some margin.
It also depends on your customers’ devices and player capabilities.