Details about NVENC in Turing?

NVIDIA VIDEO CODEC SDK | NVIDIA Developer

Video_Codec_SDK_9.0.18 Release_notes.txt

What’s new in Video Codec SDK 9.0:

In NVIDIA Video Codec SDK release 9.0, the following features have been added:

Encode Features:

  1. Improved encoded quality for Turing GPUs
  2. HEVC B-frame support (Turing GPUs only)
  3. Encoded output in video memory
  4. H.264 ME-only mode output in video memory
  5. Non-reference P frames
  6. Support for accepting CUArray as input

Decode Features:

  1. HEVC YUV 444 decoding (Turing GPUs only)
  2. Multiple NVDEC engines (Turing GPUs only)

In SDK 9, HEVC encoding can use “B-frames as reference”.
HEVC can use NV_ENC_BFRAME_REF_MODE_MIDDLE and NV_ENC_BFRAME_REF_MODE_EACH.
(H.264 can use NV_ENC_BFRAME_REF_MODE_MIDDLE only.)

This feature may improve Turing’s HEVC encoding quality.
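
For reference, here is a minimal sketch of how this is enabled through the API (the function name is my own, not from the SDK samples; it assumes nvEncodeAPI.h from SDK 9.x and an NV_ENC_CONFIG being prepared for HEVC):

    #include <nvEncodeAPI.h>

    /* Sketch: enable B-frames-as-reference on an HEVC encoder config.
     * frameIntervalP = 4 means 3 B-frames between consecutive anchor
     * frames; MIDDLE marks the middle B-frame as a reference, EACH
     * (HEVC only) marks all of them. */
    static void enable_hevc_bframes_as_ref(NV_ENC_CONFIG *cfg)
    {
        cfg->frameIntervalP = 4;
        cfg->encodeCodecConfig.hevcConfig.useBFramesAsRef =
            NV_ENC_BFRAME_REF_MODE_MIDDLE;
    }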

According to Table 1 of Video_Codec_SDK_9.0.18 NVENC_Application_Note.pdf,
Turing GPUs other than TU117 do not support H.264 field (interlaced) encoding.

So, using the latest Zeranoe Windows ffmpeg build, I see the following:

b_ref_mode each is not supported for HEVC.
b_ref_mode middle produces tons of warnings in the output about invalid DTS and PTS. I’ve tried with a few different sources.

Also, in my own ffmpeg-based code I am finding that avcodec_receive_packet returns an empty/invalid packet, even though it returns zero, which should mean the packet is OK.
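
For reference, this is roughly the receive loop I am using (simplified sketch; drain_packets is my own name), with an explicit pkt->size check added since a zero return is supposed to mean a valid packet:

    #include <stdio.h>
    #include <libavcodec/avcodec.h>

    /* Drain all pending packets from the encoder after a successful
     * avcodec_send_frame(). */
    static int drain_packets(AVCodecContext *enc, AVPacket *pkt, FILE *out)
    {
        int ret;
        while ((ret = avcodec_receive_packet(enc, pkt)) == 0) {
            if (!pkt->data || pkt->size <= 0) { /* the case I am seeing */
                av_packet_unref(pkt);
                return AVERROR_BUG;
            }
            fwrite(pkt->data, 1, pkt->size, out);
            av_packet_unref(pkt);
        }
        /* AVERROR(EAGAIN): encoder wants more input; AVERROR_EOF: flushed. */
        return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
    }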

Is anybody else seeing issues with Windows ffmpeg with b_ref_mode middle under SDK9 and latest Windows driver?

My opinion: the Tesla T4 is unusable for VDI due to the NVENC problem. Turing is an unbalanced chip for VDI: NVIDIA added RT cores and boosted the CUDA cores and memory, but dropped one NVENC engine. Here is a comparison of VDI usage with NVENC-assisted stream encoding (H.264, low-latency high-performance single pass; reference NVENC speeds taken from NVIDIA Video Codec SDK 9.0 NVENC_Application_Note.pdf, GPU clocks from Wikipedia):

(see also https://gridforums.nvidia.com/default/topic/8934/post/14482/#14482)

Has anyone tested NVENC/NVDEC on GTX 1660 and GTX 1660 Ti?

Almost inexplicable.
Somehow it feels as if the T4 is either a downgrade for NVENC with high-density workloads, or NVIDIA needs to work on their device drivers.

So, still no interlaced encoding with a 1660 card?
I use it almost every day, but now I cannot upgrade my GPU because of this? :(

oviano, all the issues about invalid DTS are closed as duplicates of this issue: https://trac.ffmpeg.org/ticket/7303
Any help will be appreciated.

So after reading this, do I use my Quadro RTX 4000 or my Quadro P5000 in my streaming media server? I own both cards and only stream H.264 out. It seems the P5000 might be my better choice due to its dual NVENC chips? Please correct me if I’m wrong.

It depends on the content, the requested output quality, and the generated bandwidth… so test it on your specific use case.

  • realtime transcoding (latency-sensitive) of many streams -> P5000
  • quality and bandwidth are the keys -> RTX4000

Expected maximum performance (see https://developer.nvidia.com/video_codec_sdk/documentation/v9.1/NVENC_Application_Note.pdf “Table 4. NVENC encoding performance”, H.264, Low latency High Performance, Single Pass):

  • P5000 core clock 1607-1733 (boost) -> 1733/1683 × 528 × 2 = 1087 1080p FPS = 36 1080p30 streams
  • RTX4000 core clock 1005-1545 (boost) -> 1545/1755 × 695 = 611 1080p FPS = 20 1080p30 streams
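
In other words: expected 1080p FPS ≈ (boost clock / reference clock) × Table 4 reference FPS × number of NVENC engines (2 on the P5000, 1 on the RTX4000; the 1683 and 1755 MHz denominators are presumably the clocks at which the Table 4 numbers were measured). A throwaway sketch of that arithmetic:

    /* Throwaway sketch of the clock-scaling estimate above; the inputs
     * are the Table 4 figures and clocks quoted in this thread. */
    #include <stdio.h>

    static double est_1080p_fps(double boost_mhz, double ref_mhz,
                                double ref_fps, int nvenc_engines)
    {
        return boost_mhz / ref_mhz * ref_fps * nvenc_engines;
    }

    int main(void)
    {
        double p5000 = est_1080p_fps(1733, 1683, 528, 2);  /* ~1087 */
        double rtx4k = est_1080p_fps(1545, 1755, 695, 1);  /* ~611  */
        printf("P5000:   %4.0f 1080p FPS = %2.0f 1080p30 streams\n", p5000, p5000 / 30);
        printf("RTX4000: %4.0f 1080p FPS = %2.0f 1080p30 streams\n", rtx4k, rtx4k / 30);
        return 0;
    }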

Go to https://developer.nvidia.com/nvidia-video-codec-sdk, click “Additional Performance Results”, compare the “T*” vs “P*” encoder(s), and read the x-axis comments very carefully (the selection is “marketing”):

So this is the part I don’t understand. As far as I know, I don’t have control over these metrics. Maybe I do with the very simple quality setting of Plex Media Server? I always choose the highest quality setting, which says “Make My CPU Hurt”.

But I’ve been having issues in general with hardware transcoding, to the point that I don’t really know what to do: it stalls or doesn’t work at all, and I’ve had to fall back on software transcoding.

The verified way is to use FFmpeg or libav from the command line (ffmpeg or avconv) or via the library API (or the NVIDIA Video Codec SDK API directly). Please check the documentation https://developer.nvidia.com/designworks/dl/Using_FFmpeg_with_NVIDIA_GPU_Hardware_Acceleration-pdf, the examples https://developer.nvidia.com/ffmpeg, blogs… “Plex Media Server” should be using FFmpeg with these features.

Yes, same issue with latest Windows ffmpeg from March 11, 2020 (ffmpeg-20200311-36aaee2-win64-static), under Windows 7 x64, GeForce Game Ready Driver 442.59 from March 10, 2020, GIGABYTE GeForce RTX 2080 SUPER WINDFORCE OC 8G, model # GV-N208SWF3OC-8GD.

I see that “b_ref_mode each” is not supported by the GPU, and that “b_ref_mode middle” produces tons of warnings in the output about invalid DTS and PTS.

My command line is:

ffmpeg -v verbose -threads auto -probesize 9000000000 -analyzeduration 9000000000 -hwaccel cuda -hwaccel_output_format cuda -f concat -safe 0 -i ffmpeginputs.txt -filter_complex "concat=n=1:v=1:a=1 [vout] [aout]" -map "[vout]" -map "[aout]" -vcodec hevc_nvenc -bf 4 -temporal_aq 1 -rc-lookahead 20 -g 250 -vsync 0 -b_ref_mode 2 -dpb_size 3 "%FOLDERNAME%.mp4"

Not sure if “-hwaccel_output_format cuda” is needed, but I get the same DTS/PTS errors whether I have it or not. I haven’t found a combination of -bf and -dpb_size that works. Anyone have any suggestions? I do see that it’s an open bug at https://trac.ffmpeg.org/ticket/7303

The SDK header nvEncodeAPI.h in v9.2 seems to indicate that B-frame “each” mode is unsupported only for H.264. I guess it will depend on the card as well, but it’s not quite true to say Turing supports B-frames as reference if you can’t use the feature fully. It’s like half an implementation. Is that something they can enhance?

    /**
     * B-frame used as reference modes
     */
    typedef enum _NV_ENC_BFRAME_REF_MODE
    {
        NV_ENC_BFRAME_REF_MODE_DISABLED = 0x0,   /**< B frame is not used for reference */
        NV_ENC_BFRAME_REF_MODE_EACH     = 0x1,   /**< Each B-frame will be used for reference. currently not supported for H.264 */
        NV_ENC_BFRAME_REF_MODE_MIDDLE   = 0x2,   /**< Only(Number of B-frame)/2 th B-frame will be used for reference */
    } NV_ENC_BFRAME_REF_MODE;
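
If anyone wants to check what their own card actually reports, here is a hedged sketch (it assumes an already-opened NVENC session: nvenc is the NV_ENCODE_API_FUNCTION_LIST and enc the session handle, both obtained elsewhere):

    #include <stdio.h>
    #include <nvEncodeAPI.h>

    /* Query NV_ENC_CAPS_SUPPORT_BFRAME_REF_MODE separately for H.264
     * and HEVC and print what the driver reports for this GPU. */
    static void report_bframe_ref_cap(NV_ENCODE_API_FUNCTION_LIST *nvenc,
                                      void *enc, GUID codec, const char *name)
    {
        NV_ENC_CAPS_PARAM caps = { NV_ENC_CAPS_PARAM_VER };
        int val = 0;
        caps.capsToQuery = NV_ENC_CAPS_SUPPORT_BFRAME_REF_MODE;
        if (nvenc->nvEncGetEncodeCaps(enc, codec, &caps, &val) == NV_ENC_SUCCESS)
            printf("%s: NV_ENC_CAPS_SUPPORT_BFRAME_REF_MODE = %d\n", name, val);
    }

    /* Usage, e.g.:
     *   report_bframe_ref_cap(&nvenc, enc, NV_ENC_CODEC_H264_GUID, "H.264");
     *   report_bframe_ref_cap(&nvenc, enc, NV_ENC_CODEC_HEVC_GUID, "HEVC");
     */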

The invalid DTS issue with b_ref_mode middle was fixed in https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=aaadf0dce8fa7b3b5073089498a84e758ceb975a

Everybody please go here and spam )))) That’s how you can say thank you to the ffmpeg devs (me and others): https://github.com/obsproject/obs-studio/issues/2374

So, using the latest Zeranoe Windows ffmpeg build, I see the following:

b_ref_mode each is not supported for HEVC.

I see that “b_ref_mode each” is not supported by the GPU

#8809 (hevc_nvenc / b_ref_mode each is not working) – FFmpeg
https://trac.ffmpeg.org/ticket/8809

Can an NVIDIA dev clarify whether “each” is indeed supported for HEVC? (It looks like it, judging from rigaya’s nice data https://github.com/rigaya/NVEnc/blob/master/GPUFeatures/rtx2070.txt and my 2080 Ti.) Also, we need to know whether ffmpeg even has everything in place for “each”. Well, at least middle is fixed.

#8809 is IMHO fixed in https://trac.ffmpeg.org/ticket/8809#comment:3

P.S. I tested it, and it is fixed.