Difference in performance for parallel decode/encode with ffmpeg h264_cuvid and h264_nvenc on Tesla P100

Hi,

I have made performance measurements for decoding and encoding h264 video with up to 8 parallel threads on a Tesla P100.
The test runs on Ubuntu 17.04 with NVIDIA driver 384.90, using ffmpeg with h264_cuvid for decoding and h264_nvenc for encoding (average time per frame in milliseconds):

By number of threads:

  • 1 : decode 0.5 encode 0.4
  • 2 : decode 0.9 encode 0.4
  • 3 : decode 1.5 encode 0.4
  • 4 : decode 2.0 encode 0.4
  • 5 : decode 2.6 encode 0.4
  • 6 : decode 3.2 encode 0.5
  • 7 : decode 3.7 encode 0.6
  • 8 : decode 4.3 encode 0.6

So the decoding time increases linearly with the number of threads, while the encoding time stays more or less stable.
With 9 threads, I get the error ‘No NVENC capable devices found’.
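A quick sanity check on the decode numbers above: the per-thread slowdown corresponds to a roughly constant aggregate throughput, which this small calculation shows:

```python
# Per-thread average decode time in ms, taken from the list above.
decode_ms = {1: 0.5, 2: 0.9, 3: 1.5, 4: 2.0, 5: 2.6, 6: 3.2, 7: 3.7, 8: 4.3}

for threads, ms in decode_ms.items():
    # Aggregate throughput across all threads, in frames per second.
    fps = threads / (ms / 1000.0)
    print(f"{threads} thread(s): {fps:.0f} fps aggregate")
```

The aggregate stays in the roughly 1860–2220 fps range regardless of thread count, which would be consistent with all threads sharing a single decode engine.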

Output of nvidia-smi during encoding with 8 threads:

Timestamp : Tue Nov 14 10:30:25 2017
Driver Version : 384.90

Attached GPUs : 1
GPU 00000000:00:08.0
    FB Memory Usage
        Total : 16276 MiB
        Used : 5307 MiB
        Free : 10969 MiB
    BAR1 Memory Usage
        Total : 16384 MiB
        Used : 2 MiB
        Free : 16382 MiB
    Compute Mode : Default
    Utilization
        Gpu : 47 %
        Memory : 26 %
        Encoder : 51 %
        Decoder : 100 %
    GPU Utilization Samples
        Duration : 18446744073709.22 sec
        Number of Samples : 99
        Max : 51 %
        Min : 0 %
        Avg : 0 %
    Memory Utilization Samples
        Duration : 18446744073709.22 sec
        Number of Samples : 99
        Max : 0 %
        Min : 0 %
        Avg : 0 %
    ENC Utilization Samples
        Duration : 18446744073709.22 sec
        Number of Samples : 99
        Max : 51 %
        Min : 0 %
        Avg : 0 %
    DEC Utilization Samples
        Duration : 18446744073709.22 sec
        Number of Samples : 99
        Max : 99 %
        Min : 0 %
        Avg : 0 %
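For monitoring longer runs, the percentages in the Utilization section can be scraped from `nvidia-smi -q`-style output with a small parser like this. It is only a sketch and assumes the "Name : NN %" field layout shown above:

```python
import re

def parse_utilization(text):
    """Extract the Utilization section percentages from `nvidia-smi -q` output."""
    util = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Utilization":
            in_section = True
            continue
        if in_section:
            m = re.match(r"(\w+)\s*:\s*(\d+)\s*%", stripped)
            if m:
                util[m.group(1)] = int(m.group(2))
            else:
                break  # first non-matching line ends the section
    return util
```

Feeding it the dump above would return the Gpu/Memory/Encoder/Decoder percentages as a dict.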

Question: Is this decoding performance normal behaviour on a Tesla P100?

Thanks and regards, Haye