FFmpeg: hardware-accelerated H.264 encode (nvenc_h264) is dead slow on Amazon G2 instance

Hello There,

FFmpeg, when compiled with the options "--enable-nonfree --enable-gpl --enable-version3 --enable-shared --enable-pthreads --enable-nvenc --enable-runtime-cpudetect --disable-doc --enable-libmp3lame", gives abysmal performance when run on an Amazon G2 instance (g2.2xlarge, GRID K520).

For example, transcoding the 5.22 min file BVE_Localize.mp4 took about 67 seconds:
time ffmpeg -y -i BVE_Localize.mp4 -strict -2 -vcodec nvenc_h264 -b 5000k -acodec aac -ab 256k -f mpegts BVELocalize.ts

real 1m6.990s, user 2m56.040s, sys 0m1.944s

The same command, run on a Dell Precision T1700 workstation (dual-core Xeon, Quadro K620), takes:

real 0m41.572s, user 1m53.621s, sys 0m1.434s
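To put the two runs side by side, here is a small sketch (not part of the original measurements) that converts the `time` values quoted above to seconds and computes the wall-clock ratio and the user/real ratio. The function names are illustrative. A user/real ratio well above 1 means several CPU threads were busy during the run, which hints that part of the pipeline (presumably decoding and AAC encoding) runs on the CPU rather than on the NVENC hardware; that reading is an interpretation, not something measured here.

```python
import re

def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '1m6.990s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings copied from the two runs quoted in the post.
RUNS = {
    "g2.2xlarge (GRID K520)": {"real": "1m6.990s", "user": "2m56.040s"},
    "Precision T1700 (Quadro K620)": {"real": "0m41.572s", "user": "1m53.621s"},
}

def summarize(runs):
    """Return {name: (real_seconds, user/real ratio)} for each run."""
    summary = {}
    for name, timings in runs.items():
        real = parse_time(timings["real"])
        user = parse_time(timings["user"])
        summary[name] = (real, user / real)
    return summary

if __name__ == "__main__":
    summary = summarize(RUNS)
    for name, (real, cpu_ratio) in summary.items():
        print(f"{name}: real {real:.1f}s, user/real {cpu_ratio:.2f}")
    g2_real, ws_real = (summary[name][0] for name in RUNS)
    print(f"G2 wall time is {g2_real / ws_real:.2f}x the workstation's")
```

Both machines show a similar user/real ratio, so the gap between them is not obviously explained by one of them leaving CPU cores idle.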

Please help me improve ffmpeg performance on the Amazon G2 instance. What might I be missing?
I am using a G2 instance (Ubuntu 14.04 64-bit, CUDA 7.0, driver 352.55, MSI disabled).


The GRID K520 has two Kepler GPUs; most likely only one of them is used by ffmpeg. The Quadro K620 is a first-generation Maxwell GPU, and the hardware encoder in a Maxwell GPU is simply faster than the one in a Kepler GPU; see also this application note, table 5 on page 10. The only way to get more performance out of your G2 instance is to run multiple instances (e.g. 2) of ffmpeg simultaneously, each using a different GPU.
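The "one ffmpeg process per GPU" idea could be sketched as follows, for an instance that exposes more than one GPU. This is a minimal sketch under assumptions: it pins each process with the `CUDA_VISIBLE_DEVICES` environment variable, on the assumption that this ffmpeg build's NVENC encoder selects its device through the CUDA driver; the file names and the helper names `build_job` and `run_parallel` are hypothetical. The encoder arguments are copied from the command in the question.

```python
import os
import shlex
import subprocess

def build_job(gpu_index, src, dst):
    """Return (argv, env) for one ffmpeg encode pinned to one GPU.

    Assumption: the NVENC encoder picks its device via the CUDA driver,
    so CUDA_VISIBLE_DEVICES restricts the process to the chosen GPU.
    """
    argv = [
        "ffmpeg", "-y", "-i", src,
        "-strict", "-2",
        "-vcodec", "nvenc_h264", "-b", "5000k",
        "-acodec", "aac", "-ab", "256k",
        "-f", "mpegts", dst,
    ]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    return argv, env

def run_parallel(jobs):
    """Start all jobs at once and return their exit codes in order."""
    procs = [subprocess.Popen(argv, env=env) for argv, env in jobs]
    return [proc.wait() for proc in procs]

if __name__ == "__main__":
    # Hypothetical inputs: one file per GPU.
    jobs = [
        build_job(0, "input_a.mp4", "output_a.ts"),
        build_job(1, "input_b.mp4", "output_b.ts"),
    ]
    # Dry run: print what would be launched. Call run_parallel(jobs)
    # instead to actually start the encodes.
    for argv, env in jobs:
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", shlex.join(argv))
```

Note that this raises aggregate throughput across several files; it does not make a single encode finish sooner.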

The GRID K520 does indeed have two Kepler GPUs; however, an Amazon g2.2xlarge instance exposes only one of them. I believe there is also a g2.8xlarge instance type, which makes multiple GPUs available.
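One way to confirm how many GPUs a given instance actually exposes is to list them with `nvidia-smi -L`. The sketch below parses that listing; the sample text is illustrative (the UUID is a placeholder, not real output from a G2 instance), and `parse_gpu_list` and `list_gpus` are hypothetical helper names.

```python
import subprocess

def parse_gpu_list(text):
    """Parse `nvidia-smi -L` style output into (index, name) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("GPU "):
            continue
        head, _, rest = line.partition(":")
        index = int(head.split()[1])
        # Drop the trailing "(UUID: ...)" part, if present.
        name = rest.split("(UUID")[0].strip()
        gpus.append((index, name))
    return gpus

def list_gpus():
    """Run `nvidia-smi -L` and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)

if __name__ == "__main__":
    # Illustrative sample with a placeholder UUID; on a real machine,
    # call list_gpus() instead.
    sample = "GPU 0: GRID K520 (UUID: GPU-placeholder)\n"
    print(parse_gpu_list(sample))
```

If only one entry is listed, running a second ffmpeg process on "another GPU" is not an option on that instance.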

Thank you, Gert-Jan and Txbob, for the quick response.
I also tried an ffmpeg build following the steps in the document below.

That build includes the NVIDIA acceleration patch and the CUDA utils, but it did not bring any performance improvement on G2.

My current configuration is a g2.2xlarge (Ubuntu 14.04 64-bit, Kepler GRID K520, CUDA 7.0, NVIDIA SDK 5.0.1, driver version 352.55, MSI disabled).

Will changing to CUDA 7.5 improve performance? Is there anything else you would suggest?
I will also try a g2.8xlarge and see if it helps.

Did CUDA 7.5 fix this for you?