Details about NVENC in Turing?

It means that you only need the new header files, where all the new structs and functions are defined; then you can implement them in ffmpeg and recompile it. The new .h files have not been published yet.

But from my tests I can’t imagine that the new functions will make any significant quality improvement possible; I think they will be for special cases only. You can use the new driver as it is with an old ffmpeg and it will work like a charm.

I did all my tests with ffmpeg 4.1, but there is no difference when you use ffmpeg 3.2 or any other tool.

No, I don’t have access to SDK 9 yet.

The latest driver also increased quality with the fast presets to almost the same as hq for H264.

@Thunderm: Do the new drivers offer any improvements/changes on Pascal hardware?

No, I didn’t see any improvement at all for GTX cards. We did a lot of tests, and actually we are selling all of our Quadro P5000 GPUs.

More news:

  1. I found that with SDK 9.0 and the new drivers, -b_ref_mode middle is actually on by default for HEVC, and the new drivers also support mode each. Mode middle brings 0.2 dB of PSNR, which is the main reason why 418.30 is better than 415.27 when B-frames are used.

  2. It looks like 64x64 matrices are still not supported, but actually we don’t want to use them, because they could degrade quality in many cases.

  3. It is very interesting that the main performance impact on NVENC comes from using 8x8 or 16x16 CUs. NVENC speed for 1080p content is: 8x8 - 150 fps (slow/hq preset), 16x16 - 300 fps (default preset), 32x32 - 600 fps (fast preset). So generally I think this version of NVENC is not yet fine-tuned, and performance for slow/hq/default could increase in the future to almost 600 fps (we hope!!!). That could eliminate the need for a second NVENC engine; just to mention, a Quadro P5000 with dual NVENC has a total speed of 700 fps (350 per engine at the default preset).
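
To make the scaling in point 3 explicit, here is a small sketch that only redoes the arithmetic on the figures quoted above. The variable names are illustrative, and the numbers are the poster's measurements, not official specifications.

```python
# 1080p throughput quoted in the post, keyed by minimum CU size (NxN).
fps_by_min_cu = {8: 150, 16: 300, 32: 600}

# Total quoted for a Quadro P5000 with two NVENC engines (default preset).
P5000_DUAL_NVENC_FPS = 700

# Speed-up when moving from one minimum CU size to the next larger one.
sizes = sorted(fps_by_min_cu)
ratios = [fps_by_min_cu[large] / fps_by_min_cu[small]
          for small, large in zip(sizes, sizes[1:])]

for (small, large), ratio in zip(zip(sizes, sizes[1:]), ratios):
    print(f"{small}x{small} -> {large}x{large}: {ratio:.1f}x faster")

# How close one Turing engine at 32x32 gets to the dual-engine P5000 total.
share = fps_by_min_cu[32] / P5000_DUAL_NVENC_FPS
print(f"single engine at 32x32: {share:.0%} of the dual-engine P5000")
```

Each doubling of the minimum CU size doubles the quoted throughput, and a single engine at 600 fps would reach roughly 86% of the dual-engine P5000 figure.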

Tips:

For realtime encoding of 4kp60 content at the best possible quality, you could try these settings for ffmpeg (custom build):
-preset slow
-bf 4
-b_ref_mode middle
-rc_lookahead 32
-gop 600
-r 60
-min_cu_size 32x32

This will give you a performance of 95 fps for 4k content, with quality very near to the Default preset, which has a performance of 69 fps and is not stable for long realtime runs. Also, when you use only 32x32 matrices, decoding is much simpler.
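
To put the 95 fps and 69 fps figures in context for a 60 fps source, this sketch computes the headroom over realtime for both settings. The function name is made up for illustration, and the fps values are simply the ones quoted above.

```python
def realtime_headroom(encode_fps: float, target_fps: float) -> float:
    """Return encode speed as a multiple of the realtime target rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return encode_fps / target_fps


TARGET_FPS = 60  # 4kp60 source

# Figures quoted in the post for 4k content.
tuned = realtime_headroom(95, TARGET_FPS)    # slow preset, 32x32 minimum CU
default = realtime_headroom(69, TARGET_FPS)  # Default preset

print(f"tuned settings: {tuned:.2f}x realtime")
print(f"default preset: {default:.2f}x realtime")
```

The Default preset leaves only about 15% of margin over 60 fps, which fits the observation that it is not stable for long realtime runs, while the tuned settings leave close to 60%.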

So Windows driver 418.81 was released yesterday, a later version than the Linux beta with the SDK 9 improvements… but there is nothing in the release notes about SDK 9 or NVENC Turing support.

We are going to use ffmpeg to encode thousands of MXF video files, around 1 PB in total, to H264 or H265 to save storage space. What is the best hardware for that, an RTX 2080 Ti? How important is the other HW, like CPU & RAM?

From reading the thread I understand that the version of the Nvidia driver is important and that it gets faster with each release. Right?

What’s New in Version 418 U1

  • Video HEVC Support
    • Added H.265 B-frame support for NVIDIA Turing GPUs
    • Added H.265 444 Decode support for NVIDIA Turing GPUs

NVIDIA VIDEO CODEC SDK | NVIDIA Developer

Video_Codec_SDK_9.0.18 Release_notes.txt

What’s new in Video Codec SDK 9.0:

In NVIDIA Video Codec SDK release 9.0, following features have been added:

Encode Features::

  1. Improved encoded quality for Turing GPUs
  2. HEVC B-frame support (Turing GPUs only)
  3. Encoded output in video memory
  4. H.264 ME only mode output in video memory.
  5. Non-reference P frames
  6. Support for accepting CUArray as input

Decode Features::

  1. HEVC YUV 444 decoding (Turing GPUs only)
  2. Multiple NVDEC engines (Turing GPUs only)

In SDK9, HEVC can use “B-FRAMES AS REFERENCE”.
HEVC can use NV_ENC_BFRAME_REF_MODE_MIDDLE and NV_ENC_BFRAME_REF_MODE_EACH.
(H.264 can use NV_ENC_BFRAME_REF_MODE_MIDDLE only.)

This feature may improve Turing’s HEVC encoding quality.

According to “Table 1” of Video_Codec_SDK_9.0.18 NVENC_Application_Note.pdf,
Turing GPUs except TU117 do not support “H.264 field encoding”.

So using latest Zeranoe Windows ffmpeg build I see the following:

b_ref_mode each not supported for HEVC
b_ref_mode middle produces tons of warnings in the output about invalid dts and pts. I’ve tried with a few different sources

Also on my own ffmpeg-based code I am finding that avcodec_receive_packet is returning an empty/invalid packet, even though the function is returning a value of zero which means it should be ok.

Is anybody else seeing issues with Windows ffmpeg with b_ref_mode middle under SDK9 and latest Windows driver?

My opinion: the Tesla T4 is unusable for VDI due to the NVENC problem. Turing is an unbalanced chip for VDI: NVIDIA added RT cores and boosted the CUDA cores and memory, but dropped one NVENC. Here is a comparison of VDI usage with NVENC-assisted stream encoding (H.264, Low latency High Performance single pass; reference NVENC speeds are taken from the NVIDIA Video Codec SDK 9.0 NVENC_Application_Note.pdf, GPU clocks from Wikipedia):

(see also https://gridforums.nvidia.com/default/topic/8934/post/14482/#14482)

Has anyone tested NVENC/NVDEC on GTX 1660 and GTX 1660 Ti?

Almost inexplicable.
Somehow it feels as if the T4 is either a down-grade for NVENC with high density workloads OR NVIDIA needs to work on their device drivers.

So, still no interlaced encoding with a 1660 card?
I use it almost every day, but now I cannot upgrade my GPU because of this? :(

oviano, all the issues about Invalid DTS are closed as duplicates of this issue: https://trac.ffmpeg.org/ticket/7303
Any help will be appreciated.

So after reading this, do I use my Quadro RTX 4000 or my Quadro P5000 in my streaming media server? I own both cards and only stream H.264 out. It seems the P5000 might be my better choice due to its dual NVENC chips? Please correct me if I’m wrong.

It depends on content, requested output quality and generated bandwidth… so test it on your specific use case.

  • realtime transcoding (latency sensitive) of many streams -> P5000
  • quality and bandwidth are the keys -> RTX4000

Expected maximum performance (see https://developer.nvidia.com/video_codec_sdk/documentation/v9.1/NVENC_Application_Note.pdf “Table 4. NVENC encoding performance”, H.264, Low latency High Performance, Single Pass):

  • P5000 core clock 1607-1733(boost) -> 1733/1683*528*2 = 1087 1080pFPS = 36 1080p30 streams
  • RTX4000 core clock 1005-1545(boost) -> 1545/1755*695 = 611 1080pFPS = 20 1080p30 streams
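
The estimate above can be sketched as a small calculation: scale the Application Note throughput by the ratio of the card's boost clock to the clock used as the divisor in the post, multiply by the number of NVENC engines, and divide by 30 fps per stream. The function names are illustrative, and treating 1683 MHz and 1755 MHz as the reference clocks behind the Application Note figures is an assumption taken from the formulas above.

```python
def estimate_nvenc_fps(boost_mhz: float, reference_mhz: float,
                       reference_fps: float, engines: int = 1) -> float:
    """Scale a reference 1080p throughput by clock ratio and engine count."""
    return boost_mhz / reference_mhz * reference_fps * engines


def realtime_streams(total_fps: float, stream_fps: int = 30) -> int:
    """Number of whole realtime streams that fit into the total throughput."""
    return int(total_fps // stream_fps)


# Inputs are the numbers from the post (H.264, Low latency High Performance,
# single pass). The P5000 has two NVENC engines, the RTX 4000 has one.
p5000_fps = estimate_nvenc_fps(1733, 1683, 528, engines=2)
rtx4000_fps = estimate_nvenc_fps(1545, 1755, 695)

for name, fps in (("P5000", p5000_fps), ("RTX4000", rtx4000_fps)):
    print(f"{name}: {int(fps)} 1080p fps, "
          f"{realtime_streams(fps)} 1080p30 streams")
```

This is a linear clock-scaling estimate of maximum throughput only; it says nothing about quality or bitrate at equal settings, which is why testing on your own content is still the advice.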

See https://developer.nvidia.com/nvidia-video-codec-sdk, click “Additional Performance Results”, compare the “T*” vs “P*” encoder(s), and read the x-axis comments very carefully (“marketing” selection):

So this is the part I don’t understand. As far as I know I don’t have control over these metrics. Maybe I do with the very simple “quality setting” of Plex Media Server? I always choose the highest quality setting which says “Make My CPU Hurt”.

But I’ve been having so many issues in general with hardware transcoding that I don’t really know what to do at this point. Hardware transcoding stalls or doesn’t work at all, and I’ve had to fall back on software transcoding.