Details about NVENC in Turing?

Thunderm · February 2, 2019, 2:20pm

It means that you only need the new header files, in which all the new structs and functions are defined; then you can wire them into ffmpeg and recompile it. The new .h files have not been published yet.

But from my tests I can’t imagine that the new functions will bring any significant quality improvement; I think they will be for special cases only. You can use the new driver as it is with an old ffmpeg and it will work like a charm.

I did all my tests with ffmpeg 4.1 but there is no difference when you use ffmpeg 3.2 or any other tool.

No, I don’t have access to SDK 9 yet.

Thunderm · February 2, 2019, 4:28pm

The latest driver also increased quality with the fast presets to almost the same level as hq for H.264.

malakudi · February 5, 2019, 6:02pm

@Thunderm: Do the new drivers offer any improvements/changes on Pascal hardware?

Thunderm · February 5, 2019, 7:56pm

No, I didn’t see any improvement at all for GTX cards. We did a lot of tests, and we are actually selling all of our Quadro P5000 GPUs.

More news:

I found that with SDK 9.0 and the new drivers, -b_ref_mode middle is actually on by default for HEVC, and the new drivers also support mode each. Mode middle brings 0.2 dB of PSNR, which is the main reason why 418.30 is better than 415.27 when B-frames are used.
It looks like 64x64 matrices are still not supported, but we don’t actually want to use them, because they could degrade quality in many cases.
It is very interesting that the main performance impact on NVENC comes from using 8x8 or 16x16 CUs. NVENC speed for 1080p content is: 8x8 - 150 fps (slow/hq preset), 16x16 - 300 fps (default preset), 32x32 - 600 fps (fast preset). So generally I think this version of NVENC is not yet fine-tuned, and performance for slow/hq/default could increase in the future to almost 600 fps (we hope!!!). That could eliminate the need for a second NVENC engine. For comparison, the Quadro P5000 with dual NVENC has a total speed of 700 fps (350 per engine at the default preset).

Tips:

For real-time encoding of 4Kp60 content at the best possible quality, you could try these settings for ffmpeg (custom build):
-preset slow
-bf 4
-b_ref_mode middle
-rc_lookahead 32
-gop 600
-r 60
-min_cu_size 32x32

This will give you a performance of 95 fps for 4K content, with quality very near to the default preset, which has a performance of 69 fps and is not stable for long real-time runs. Also, when you use only 32x32 matrices, decoding is much simpler.
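To put the two throughput figures above in perspective, here is a minimal sketch of the real-time margin each one leaves for a 60 fps source. The helper name `realtime_headroom` is made up for illustration, and the calculation assumes the quoted fps values are sustained averages, which the post does not state.

```python
def realtime_headroom(encode_fps, source_fps=60):
    """Ratio of encoder throughput to source frame rate (1.0 means exactly real time)."""
    return encode_fps / source_fps


# Throughput figures quoted in the post for 4K content.
measurements = (
    ("slow preset, 32x32 CU only", 95),
    ("default preset", 69),
)

for label, fps in measurements:
    # Margin above real time, as a percentage of the source frame rate.
    margin = (realtime_headroom(fps) - 1) * 100
    print(f"{label}: {fps} fps, about {margin:.0f}% above real time")
```

On these numbers the default preset has only a small margin over 60 fps, which would be consistent with the instability reported for long real-time runs, while the 32x32-only setting leaves considerably more room.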

oviano · February 5, 2019, 8:43pm

So Windows driver 418.81 was released yesterday, so a later version than the Linux beta with the SDK 9 improvements…but nothing in the release notes about SDK 9 or NVENC Turing support.

Christian75 · February 8, 2019, 4:55pm

We are going to use ffmpeg to encode thousands of MXF video files, around 1 PB in total, to H.264 or H.265 to save storage space. What is the best hardware for that, an RTX 2080 Ti? How important is the other hardware, like CPU and RAM?

From reading the thread I understand that the version of the NVIDIA driver is important and that it gets faster with each release. Right?

EwoutH · February 11, 2019, 8:34am

What’s New in Version 418 U1

Video HEVC Support
- Added H.265 B-frame support for NVIDIA Turing GPUs
- Added H.265 444 Decode support for NVIDIA Turing GPUs

enctac · February 12, 2019, 8:30am

NVIDIA VIDEO CODEC SDK | NVIDIA Developer

Video_Codec_SDK_9.0.18 Release_notes.txt

What’s new in Video Codec SDK 9.0:

In NVIDIA Video Codec SDK release 9.0, following features have been added:

Encode Features::

Improved encoded quality for Turing GPUs
HEVC B-frame support (Turing GPUs only)
Encoded output in video memory
H.264 ME only mode output in video memory.
Non-reference P frames
Support for accepting CUArray as input

Decode Features::

HEVC YUV 444 decoding (Turing GPUs only)
Multiple NVDEC engines (Turing GPUs only)

enctac · February 12, 2019, 8:40am

In SDK9, HEVC can use “B-FRAMES AS REFERENCE”.
HEVC can use NV_ENC_BFRAME_REF_MODE_MIDDLE and NV_ENC_BFRAME_REF_MODE_EACH.
(H.264 can use NV_ENC_BFRAME_REF_MODE_MIDDLE only.)

This feature may improve Turing’s HEVC encoding quality.

enctac · February 12, 2019, 9:20am

According to “Table 1” of Video_Codec_SDK_9.0.18 NVENC_Application_Note.pdf,
Turing GPUs except TU117 do not support “H.264 field encoding”.

oviano · February 16, 2019, 7:05am

So using latest Zeranoe Windows ffmpeg build I see the following:

b_ref_mode each not supported for HEVC
b_ref_mode middle produces tons of warnings in the output about invalid dts and pts. I’ve tried with a few different sources

Also, in my own ffmpeg-based code I am finding that avcodec_receive_packet returns an empty/invalid packet, even though the function returns zero, which means it should be OK.

Is anybody else seeing issues with Windows ffmpeg with b_ref_mode middle under SDK9 and latest Windows driver?

oviano · February 16, 2019, 7:09am

anon56509511 · March 28, 2019, 1:42pm

My opinion: the Tesla T4 is unusable for VDI due to the NVENC problem. Turing is an unbalanced chip for VDI: NVIDIA added RT cores and boosted the CUDA cores and memory, but dropped one NVENC. Here is a comparison of VDI usage with NVENC-assisted stream encoding (H.264, Low latency High Performance, single pass; reference NVENC speeds are taken from the NVIDIA Video Codec SDK 9.0 NVENC_Application_Note.pdf, GPU clocks from Wikipedia):

(see also https://gridforums.nvidia.com/default/topic/8934/post/14482/#14482)

malakudi · April 15, 2019, 3:54pm

Has anyone tested NVENC/NVDEC on GTX 1660 and GTX 1660 Ti?

brainiarc7 · August 9, 2019, 10:56pm

Almost inexplicable.
Somehow it feels as if either the T4 is a downgrade for NVENC with high-density workloads or NVIDIA needs to work on their device drivers.

mrtt8m3k · August 31, 2019, 11:14am

So, still no interlaced encoding with a 1660 card?
I use it almost every day, but now I cannot upgrade my GPU because of this? :(

val.zapod.vz · November 6, 2019, 4:50am

oviano, all the issues about invalid DTS are closed as duplicates of this issue: #7303 (h264_nvenc (and hevc_nvenc) with b_ref_mode middle creates invalid video while streaming) – FFmpeg
Any help will be appreciated.

kd6icz · January 4, 2020, 12:08am

So after reading this, do I use my Quadro RTX 4000 or my Quadro P5000 in my streaming media server? I own both cards and only stream H.264 out. It seems the P5000 might be my better choice due to its dual NVENC chips? Please correct me if I’m wrong.

anon56509511 · January 4, 2020, 9:26am

It depends on content, requested output quality and generated bandwidth… so test it on your specific use case.

realtime transcoding (latency sensitive) of many streams -> P5000
quality and bandwidth are the keys -> RTX4000

Expected maximum performance (see https://developer.nvidia.com/video_codec_sdk/documentation/v9.1/NVENC_Application_Note.pdf “Table 4. NVENC encoding performance”, H.264, Low latency High Performance, Single Pass):

P5000 core clock 1607-1733(boost) -> 1733/1683*528*2 = 1087 1080pFPS = 36 1080p30 streams
RTX4000 core clock 1005-1545(boost) -> 1545/1755*695 = 611 1080pFPS = 20 1080p30 streams
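The two estimates above can be reproduced with a short sketch. It assumes, as the arithmetic in the post implies, that NVENC throughput scales linearly with core clock and with the number of engines; the function names are hypothetical, and the reference clocks and fps values are simply the ones quoted in the post.

```python
def scaled_fps(card_clock_mhz, ref_clock_mhz, ref_fps, engines=1):
    """Scale a reference 1080p fps figure linearly by clock ratio and engine count.

    Linear scaling is an assumption taken from the post's arithmetic,
    not a documented property of NVENC.
    """
    return card_clock_mhz / ref_clock_mhz * ref_fps * engines


def stream_count(total_fps, stream_fps=30):
    """Number of whole streams at stream_fps that fit into total_fps."""
    return int(total_fps // stream_fps)


# P5000: boost clock 1733 MHz, reference 528 fps at 1683 MHz, two NVENC engines.
p5000 = scaled_fps(1733, 1683, 528, engines=2)
# RTX 4000: boost clock 1545 MHz, reference 695 fps at 1755 MHz, one NVENC engine.
rtx4000 = scaled_fps(1545, 1755, 695)

print(int(p5000), stream_count(p5000))      # 1087 36
print(int(rtx4000), stream_count(rtx4000))  # 611 20
```

These are theoretical ceilings at boost clock; real throughput depends on content and settings, as noted above.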

See https://developer.nvidia.com/nvidia-video-codec-sdk, click “Additional Performance Results”, compare the “T*” vs “P*” encoder(s), and read the x-axis comments very carefully (“marketing” selection):

kd6icz · January 4, 2020, 10:20am

So this is the part I don’t understand. As far as I know I don’t have control over these metrics. Maybe I do with the very simple “quality setting” of Plex Media Server? I always choose the highest quality setting which says “Make My CPU Hurt”.

But I’ve been having issues in general with hardware transcoding, and I don’t really know what to do at this point. Hardware transcoding stalls or doesn’t work at all, and I’ve had to fall back on software transcoding.
