ffmpeg failed at encoding on Tesla T4 card

Hi

I’m trying to build ffmpeg with NVENC on an AWS g4dn instance, which has a Tesla T4. The OS is Ubuntu 18.04. Here is my build script:

[code]
#!/bin/bash

mkdir -p /usr/local/src
cd /usr/local/src

# Install Dependencies
apt-get update -qq && sudo apt-get -y install autoconf automake build-essential cmake git-core libsdl2-dev libtool libva-dev libvdpau-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo wget zlib1g-dev libx264-dev libx265-dev libfdk-aac-dev yasm nasm openssl libssl-dev gcc make

# Build Cuda Header
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers && sudo make install
cd ../

# Install CUDA Toolkit
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
chmod +x cuda_10.2.89_440.33.01_linux.run
sh cuda_10.2.89_440.33.01_linux.run

# Path

export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Build ffmpeg

git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --extra-cflags="-I/usr/local/cuda/include" --extra-ldflags="-L/usr/local/cuda/lib64" --enable-gpl --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-nonfree --enable-openssl --enable-nvenc --enable-cuda-nvcc --enable-cuvid --enable-libnpp --enable-nvdec --enable-filter=scale_cuda --enable-filter=thumbnail_cuda --enable-filter=yadif_cuda --enable-libfdk_aac --disable-ffplay --bindir="/usr/local/bin"
make -j $(nproc)

[/code]
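
Before benchmarking, it can help to confirm that the resulting binary really exposes the NVIDIA components the configure line asked for. The following is a minimal sketch and not part of the build script above: `check_build`, `has_component`, and the `FFMPEG_BIN` variable are names made up for this example, while `-encoders`, `-decoders`, `-hwaccels`, and `-filters` are standard ffmpeg listing options.

```shell
#!/bin/sh
# FFMPEG_BIN is a hypothetical override; it defaults to the ffmpeg found on PATH.
FFMPEG_BIN="${FFMPEG_BIN:-ffmpeg}"

# has_component LIST_FLAG NAME
# Succeeds if NAME appears as a whole word in the output of "ffmpeg LIST_FLAG".
has_component() {
    "$FFMPEG_BIN" -hide_banner "$1" 2>/dev/null | grep -qw -- "$2"
}

# check_build
# Prints one line per missing component, or a single confirmation line.
# Returns 0 only if every component is present.
check_build() {
    if ! command -v "$FFMPEG_BIN" >/dev/null 2>&1; then
        echo "ffmpeg not found"
        return 1
    fi
    missing=0
    has_component -encoders h264_nvenc || { echo "missing encoder: h264_nvenc"; missing=1; }
    has_component -decoders h264_cuvid || { echo "missing decoder: h264_cuvid"; missing=1; }
    has_component -hwaccels cuda       || { echo "missing hwaccel: cuda"; missing=1; }
    has_component -filters scale_cuda  || { echo "missing filter: scale_cuda"; missing=1; }
    if [ "$missing" -eq 0 ]; then
        echo "all NVIDIA components present"
    fi
    return "$missing"
}
```

Calling `check_build` once the binary is installed gives a quick yes/no before any timing runs. It only checks what the binary advertises, not whether the driver and GPU behave correctly at run time.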

Everything builds fine, but when I use ffmpeg to encode a sample H.264 file, it only encodes at about 0.1 fps:

ffmpeg -v verbose -hwaccel nvdec -hwaccel_output_format cuda -i den.mp4 -c:v h264_nvenc -b:v 2M -y output.mp4
ffmpeg version N-96097-g99f505d2df Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
  configuration: --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --enable-gpl --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-nonfree --enable-openssl --enable-nvenc --enable-cuda-nvcc --enable-cuvid --enable-libnpp --enable-nvdec --enable-filter=scale_cuda --enable-filter=thumbnail_cuda --enable-filter=yadif_cuda --enable-libfdk_aac --disable-ffplay --bindir=/usr/local/bin
  libavutil      56. 36.101 / 56. 36.101
  libavcodec     58. 65.100 / 58. 65.100
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.101 / 58.  9.101
  libavfilter     7. 69.101 /  7. 69.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[h264 @ 0x55fb298f78c0] Reinit context to 1920x816, pix_fmt: yuv420p
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'den.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    creation_time   : 2019-11-26T09:10:49.000000Z
  Duration: 00:04:56.88, start: 0.000000, bitrate: 1407 kb/s
    Stream #0:0(und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(tv, bt709, progressive, left), 1920x804 (1920x816) [SAR 1:1 DAR 160:67], 44 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2019-11-26T09:10:49.000000Z
      handler_name    : ISO Media file produced by Google Inc.
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
[h264 @ 0x55fb2a2310c0] NVDEC capabilities:
[h264 @ 0x55fb2a2310c0] format supported: yes, max_mb_count: 65536
[h264 @ 0x55fb2a2310c0] min_width: 48, max_width: 4096
[h264 @ 0x55fb2a2310c0] min_height: 16, max_height: 4096
[h264 @ 0x55fb2a2310c0] Reinit context to 1920x816, pix_fmt: cuda
frame=    0 fps=0.0 q=0.0 size=       0kB time=-577014:32:22.77 bitrate=  -0.0kbits/s speed=N/A
[graph 0 input from stream 0:0 @ 0x55fb2a5514c0] w:1920 h:804 pixfmt:cuda tb:1/12800 fr:25/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 0x55fb29920640] Loaded Nvenc version 9.1
[h264_nvenc @ 0x55fb29920640] Nvenc initialized successfully
Output #0, mp4, to 'output.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    encoder         : Lavf58.35.101
    Stream #0:0(und): Video: h264 (h264_nvenc) (Main), 1 reference frame (avc1 / 0x31637661), cuda(left), 1920x804 [SAR 1:1 DAR 160:67], q=-1--1, 2000 kb/s, 25 fps, 12800 tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2019-11-26T09:10:49.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      encoder         : Lavc58.65.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: N/A
frame=    3 fps=0.2 q=25.0 size=       0kB time=00:00:00.00 bitrate=4923.1kbits/s speed=6.28e-06x
frame=    4 fps=0.2 q=25.0 size=       0kB time=00:00:00.04 bitrate=   9.6kbits/s speed=0.00174x
frame=    5 fps=0.1 q=24.0 size=       0kB time=00:00:00.08 bitrate=   4.8kbits/s speed=0.00239x
frame=    6 fps=0.1 q=23.0 size=       0kB time=00:00:00.12 bitrate=   3.2kbits/s speed=0.00272x
frame=    7 fps=0.1 q=22.0 size=       0kB time=00:00:00.16 bitrate=   2.4kbits/s speed=0.00293x
frame=    8 fps=0.1 q=21.0 size=       0kB time=00:00:00.20 bitrate=   1.9kbits/s speed=0.00306x
frame=    9 fps=0.1 q=20.0 size=       0kB time=00:00:00.24 bitrate=   1.6kbits/s speed=0.00316x
frame=   10 fps=0.1 q=19.0 size=       0kB time=00:00:00.28 bitrate=   1.4kbits/s speed=0.00324x
frame=   11 fps=0.1 q=18.0 size=       0kB time=00:00:00.32 bitrate=   1.2kbits/s speed=0.0033x
frame=   12 fps=0.1 q=17.0 size=       0kB time=00:00:00.36 bitrate=   1.1kbits/s speed=0.00335x
frame=   13 fps=0.1 q=16.0 size=       0kB time=00:00:00.40 bitrate=   1.0kbits/s speed=0.00339x
frame=   14 fps=0.1 q=15.0 size=       0kB time=00:00:00.44 bitrate=   0.9kbits/s speed=0.00342x
frame=   15 fps=0.1 q=14.0 size=       0kB time=00:00:00.48 bitrate=   0.8kbits/s speed=0.00345x
frame=   16 fps=0.1 q=13.0 size=       0kB time=00:00:00.52 bitrate=   0.7kbits/s speed=0.00347x
frame=   17 fps=0.1 q=12.0 size=       0kB time=00:00:00.56 bitrate=   0.7kbits/s speed=0.00349x
frame=   18 fps=0.1 q=11.0 size=       0kB time=00:00:00.60 bitrate=   0.6kbits/s speed=0.00351x
frame=   19 fps=0.1 q=11.0 size=       0kB time=00:00:00.64 bitrate=   0.6kbits/s speed=0.00352x
frame=   20 fps=0.1 q=10.0 size=       0kB time=00:00:00.68 bitrate=   0.6kbits/s speed=0.00354x
frame=   21 fps=0.1 q=10.0 size=       0kB time=00:00:00.72 bitrate=   0.5kbits/s speed=0.00355x
frame=   22 fps=0.1 q=10.0 size=       0kB time=00:00:00.76 bitrate=   0.5kbits/s speed=0.00356x
frame=   23 fps=0.1 q=10.0 size=       0kB time=00:00:00.80 bitrate=   0.5kbits/s speed=0.00357x
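
To compare runs like this one without eyeballing the status line, the last reported value of a field can be pulled out of ffmpeg's progress text. This is a small sketch, assuming the status format shown in the log above (`key=value` fields, with records separated by carriage returns or newlines); `last_stat` is a hypothetical helper name, not an ffmpeg feature.

```shell
#!/bin/sh
# last_stat KEY
# Reads ffmpeg status text on stdin and prints the last value reported for KEY
# (for example fps, frame, or speed). Carriage returns are treated as line
# breaks because ffmpeg redraws its status line in place.
last_stat() {
    tr '\r' '\n' | sed -n "s/.*$1= *\([^ ]*\).*/\1/p" | tail -n 1
}

# Example with the final status record from the log above.
sample='frame=   23 fps=0.1 q=10.0 size=       0kB time=00:00:00.80 bitrate=   0.5kbits/s speed=0.00357x'
printf '%s\n' "$sample" | last_stat fps      # prints 0.1
printf '%s\n' "$sample" | last_stat speed    # prints 0.00357x
```

On a real run, ffmpeg writes the status line to stderr, so the pipeline would look something like `ffmpeg ... 2>&1 | last_stat speed`.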

It works well on other instances that have a Tesla M60 or Tesla V100, though.

I just found out that it only happens if I put -hwaccel nvdec -hwaccel_output_format cuda or -hwaccel cuvid -c:v h264_cuvid before the input. With software decoding it works fine. However, that defeats the purpose, as I want the entire encode process to stay in the GPU pipeline.
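
For anyone who wants to reproduce the comparison, the three decode paths mentioned here can be generated from one template so that only the decode options differ. This is a rough sketch: `transcode_cmd` and the mode labels `sw`, `nvdec`, and `cuvid` are invented for the example, and the function only prints the command lines rather than running them. The ffmpeg options themselves are the ones already used in this thread.

```shell
#!/bin/sh
# transcode_cmd MODE INPUT OUTPUT
# Prints (does not run) an ffmpeg command line for one decode path.
# MODE is a label made up for this sketch: sw, nvdec, or cuvid.
transcode_cmd() {
    case "$1" in
        sw)    decode="" ;;
        nvdec) decode="-hwaccel nvdec -hwaccel_output_format cuda" ;;
        cuvid) decode="-hwaccel cuvid -c:v h264_cuvid" ;;
        *)     echo "unknown mode: $1" >&2; return 1 ;;
    esac
    # tr -s squeezes the double space left behind when decode is empty.
    echo "ffmpeg -v verbose $decode -i $2 -c:v h264_nvenc -b:v 3M -y $3" | tr -s ' '
}

# Print one command per decode path for the same input file.
for mode in sw nvdec cuvid; do
    transcode_cmd "$mode" input.mp4 "output_$mode.mp4"
done
```

Piping a printed line into `sh` runs it. File names containing spaces would need quoting, which this sketch does not handle.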

I also tried compiling the 4.2 and 4.1 branches, but they have the same issue. The OS for these builds is Ubuntu 16.04.

With CUVID

ffmpeg -v verbose -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -c:v h264_nvenc -b:v 3M -y output.mp4
ffmpeg version n4.1.4-22-g08d3cc2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --enable-gpl --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-nonfree --enable-openssl --enable-nvenc --enable-cuda-sdk --enable-cuvid --enable-libnpp --enable-nvdec --enable-filter=scale_cuda --enable-filter=thumbnail_cuda --enable-filter=yadif_cuda --enable-libfdk_aac --disable-ffplay --bindir=/usr/local/bin
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
[h264 @ 0x3ded340] Reinit context to 1920x1088, pix_fmt: yuv420p
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    creation_time   : 2019-12-19T06:29:54.000000Z
  Duration: 00:04:56.21, start: 0.000000, bitrate: 3834 kb/s
    Stream #0:0(und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(tv, bt709, progressive, left), 1920x1080 (1920x1088) [SAR 1:1 DAR 16:9], 29 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time   : 2019-12-19T06:29:54.000000Z
      handler_name    : ISO Media file produced by Google Inc.
[h264_cuvid @ 0x3e13880] Initializing cuvid hwaccel
[h264_cuvid @ 0x3e13880] CUVID capabilities for h264_cuvid:
[h264_cuvid @ 0x3e13880] 8 bit: supported: 1, min_width: 48, max_width: 4096, min_height: 16, max_height: 4096
[h264_cuvid @ 0x3e13880] 10 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
[h264_cuvid @ 0x3e13880] 12 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
[h264_cuvid @ 0x3e13880] Initializing cuvid hwaccel
[h264_cuvid @ 0x3e13880] Formats: Original: cuda | HW: cuda | SW: nv12
frame=    0 fps=0.0 q=0.0 size=       0kB time=-577014:32:22.77 bitrate=  -0.0kbits/s speed=N/A
[graph 0 input from stream 0:0 @ 0x4ada6c0] w:1920 h:1080 pixfmt:cuda tb:1/24000 fr:24000/1001 sar:1/1 sws_param:flags=2
[h264_nvenc @ 0x3e13280] Loaded Nvenc version 9.1
[h264_nvenc @ 0x3e13280] Nvenc initialized successfully
Output #0, mp4, to 'output.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    encoder         : Lavf58.20.100
    Stream #0:0(und): Video: h264 (h264_nvenc) (Main), 1 reference frame (avc1 / 0x31637661), cuda(left), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 3000 kb/s, 23.98 fps, 24k tbn, 23.98 tbc (default)
    Metadata:
      creation_time   : 2019-12-19T06:29:54.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      encoder         : Lavc58.35.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/3000000 buffer size: 6000000 vbv_delay: -1
frame=    3 fps=0.1 q=24.0 size=       0kB time=00:00:00.00 bitrate=9142.9kbits/s speed=1.9e-06x
frame=    4 fps=0.1 q=33.0 size=       0kB time=00:00:00.04 bitrate=   9.2kbits/s speed=0.000965x
frame=    5 fps=0.1 q=31.0 size=       0kB time=00:00:00.08 bitrate=   4.6kbits/s speed=0.00155x
frame=    6 fps=0.1 q=30.0 size=       0kB time=00:00:00.12 bitrate=   3.1kbits/s speed=0.00167x
frame=    7 fps=0.1 q=30.0 size=       0kB time=00:00:00.16 bitrate=   2.3kbits/s speed=0.00195x
frame=    8 fps=0.1 q=29.0 size=       0kB time=00:00:00.20 bitrate=   1.8kbits/s speed=0.00217x

With Software Decode

ffmpeg -v verbose -i input.mp4 -c:v h264_nvenc -b:v 3M -y output.mp4
ffmpeg version n4.1.4-22-g08d3cc2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --enable-gpl --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-nonfree --enable-openssl --enable-nvenc --enable-cuda-sdk --enable-cuvid --enable-libnpp --enable-nvdec --enable-filter=scale_cuda --enable-filter=thumbnail_cuda --enable-filter=yadif_cuda --enable-libfdk_aac --disable-ffplay --bindir=/usr/local/bin
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
[h264 @ 0x406c240] Reinit context to 1920x1088, pix_fmt: yuv420p
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    creation_time   : 2019-12-19T06:29:54.000000Z
  Duration: 00:04:56.21, start: 0.000000, bitrate: 3834 kb/s
    Stream #0:0(und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(tv, bt709, progressive, left), 1920x1080 (1920x1088) [SAR 1:1 DAR 16:9], 29 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time   : 2019-12-19T06:29:54.000000Z
      handler_name    : ISO Media file produced by Google Inc.
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
[h264 @ 0x4093c00] Reinit context to 1920x1088, pix_fmt: yuv420p
[graph 0 input from stream 0:0 @ 0x44e5440] w:1920 h:1080 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:1/1 sws_param:flags=2
[h264_nvenc @ 0x4091d00] Loaded Nvenc version 9.1
[h264_nvenc @ 0x4091d00] Nvenc initialized successfully
[h264_nvenc @ 0x4091d00] 1 CUDA capable devices found
[h264_nvenc @ 0x4091d00] [ GPU #0 - < Tesla T4 > has Compute SM 7.5 ]
[h264_nvenc @ 0x4091d00] supports NVENC
Output #0, mp4, to 'output.mp4':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6avc1mp41
    encoder         : Lavf58.20.100
    Stream #0:0(und): Video: h264 (h264_nvenc) (Main), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 3000 kb/s, 23.98 fps, 24k tbn, 23.98 tbc (default)
    Metadata:
      creation_time   : 2019-12-19T06:29:54.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      encoder         : Lavc58.35.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/3000000 buffer size: 6000000 vbv_delay: -1
No more output streams to write to, finishing.
frame= 7102 fps=278 q=22.0 Lsize=  109408kB time=00:04:56.17 bitrate=3026.2kbits/s speed=11.6x    
video:109378kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.027704%
Input file #0 (input.mp4):
  Input stream #0:0 (video): 7102 packets read (141882400 bytes); 7102 frames decoded; 
  Total: 7102 packets (141882400 bytes) demuxed
Output file #0 (output.mp4):
  Output stream #0:0 (video): 7102 frames encoded; 7102 packets muxed (112002911 bytes); 
  Total: 7102 packets (112002911 bytes) muxed
[AVIOContext @ 0x4092180] Statistics: 2 seeks, 431 writeouts
[h264_nvenc @ 0x4091d00] Nvenc unloaded
[AVIOContext @ 0x40734c0] Statistics: 141974669 bytes read, 0 seeks

I’ve been able to replicate the same really poor performance as above. This is the command I ran:

ffmpeg -v verbose -y -hwaccel cuda -c:v vp8_cuvid -vsync 0 -i INPUT.webm -c:v h264_nvenc -preset medium -profile:v high -maxrate 10M -qmin 0 -g 250 -bf 2 -i_qfactor 0.75 -b_qfactor 1.1 OUTPUT.mp4

I think the problem might be with the h264 encoder since I am using a different decoder.

My input video is around 480p
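
One way to test the encoder-versus-decoder guess is to time each stage on its own: decode to ffmpeg's null muxer with no encoder involved, then encode a synthetic source with no hardware decoder involved. A hedged sketch follows. `decode_only_cmd` and `encode_only_cmd` are hypothetical helper names, the 640x480 size matches the input described here, and the 30 fps rate and 10 second duration are arbitrary choices for illustration. The helpers only print the commands.

```shell
#!/bin/sh
# decode_only_cmd DECODER INPUT
# Prints a command that decodes INPUT with the given cuvid decoder and
# discards the frames (null muxer, audio dropped), so no encoder runs.
decode_only_cmd() {
    echo "ffmpeg -v verbose -hwaccel cuda -c:v $1 -i $2 -an -f null -"
}

# encode_only_cmd SIZE RATE SECONDS
# Prints a command that feeds h264_nvenc from the lavfi testsrc2 generator,
# so no hardware decoder runs. The encoded output is discarded.
encode_only_cmd() {
    echo "ffmpeg -v verbose -f lavfi -i testsrc2=size=$1:rate=$2 -t $3 -c:v h264_nvenc -f null -"
}

decode_only_cmd vp8_cuvid INPUT.webm
encode_only_cmd 640x480 30 10
```

If the encode-only run reports a normal speed while the decode-only run crawls, that would point away from the encoder, and the reverse would support the guess above.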

Here is my output if it helps someone debug.

ffmpeg version git-2019-12-28-2736dc0 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
  configuration: --enable-cuda-nvcc --enable-cuvid --enable-nvenc --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --enable-libvpx
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 65.100 / 58. 65.100
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.101 / 58.  9.101
  libavfilter     7. 69.101 /  7. 69.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
Input #0, matroska,webm, from './INPUT.webm':
  Metadata:
    encoder         : Chrome
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
    Stream #0:1(eng): Video: vp8, 1 reference frame, yuv420p(progressive), 640x480, SAR 1:1 DAR 4:3, 1k tbr, 1k tbn, 1k tbc (default)
    Metadata:
      alpha_mode      : 1
[vp8_cuvid @ 0x560ae3255ac0] CUVID capabilities for vp8_cuvid:
[vp8_cuvid @ 0x560ae3255ac0] 8 bit: supported: 1, min_width: 48, max_width: 4096, min_height: 16, max_height: 4096
[vp8_cuvid @ 0x560ae3255ac0] 10 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
[vp8_cuvid @ 0x560ae3255ac0] 12 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
Stream mapping:
  Stream #0:1 -> #0:0 (vp8 (vp8_cuvid) -> h264 (h264_nvenc))
  Stream #0:0 -> #0:1 (opus (native) -> aac (native))
Press [q] to stop, [?] for help
[vp8_cuvid @ 0x560ae3255ac0] Formats: Original: cuda | HW: cuda | SW: nv12
[vp8_cuvid @ 0x560ae3255ac0] ignoring invalid SAR: 0/0
[graph_1_in_0_0 @ 0x560ae387e380] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x4ed=N/A
[graph 0 input from stream 0:1 @ 0x560ae3870080] w:640 h:480 pixfmt:nv12 tb:1/1000 fr:1000/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 0x560ae32544c0] Loaded Nvenc version 9.1
[h264_nvenc @ 0x560ae32544c0] Nvenc initialized successfully
[h264_nvenc @ 0x560ae32544c0] 1 CUDA capable devices found
[h264_nvenc @ 0x560ae32544c0] [ GPU #0 - < Tesla T4 > has Compute SM 7.5 ]
[h264_nvenc @ 0x560ae32544c0] supports NVENC
Output #0, mp4, to 'out.mp4':
  Metadata:
    encoder         : Lavf58.35.101
    Stream #0:0(eng): Video: h264 (h264_nvenc) (High), 1 reference frame (avc1 / 0x31637661), nv12, 640x480 [SAR 1:1 DAR 4:3], q=0--1, 2000 kb/s, 1k fps, 16k tbn, 1k tbc (default)
    Metadata:
      alpha_mode      : 1
      encoder         : Lavc58.65.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 10000000/0/2000000 buffer size: 4000000 vbv_delay: N/A
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, delay 1024, 69 kb/s (default)
    Metadata:
      encoder         : Lavc58.65.100 aac
frame=    4 fps=0.1 q=0.0 size=       0kB time=00:00:00.25 bitrate=   1.5kbits/s speed=0.00241x