Details about NVENC in Turing?

The speed I posted is for an SD channel at 720x576. To transcode one 4K channel at 60 fps the encoder would need the equivalent of 1200 FPS at that SD resolution (3840x2160 has 20 times the pixels of 720x576), and according to my table those GPUs are not capable of transcoding it to H.265 in real time! They can transcode only one 4K channel at 30 fps.
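For anyone who wants to check the 1200 FPS figure, here is a minimal Python sketch of the arithmetic. It assumes throughput scales purely with pixel count; the function names are made up for illustration and "4K" is taken to mean 3840x2160.

```python
# Sketch: convert a target frame rate at one resolution into the
# equivalent frame rate at a reference resolution, assuming encoder
# throughput scales only with the number of pixels per frame.

SD = (720, 576)      # resolution the posted speeds were measured at
UHD = (3840, 2160)   # assumed meaning of "4K" in this thread


def pixel_ratio(src, dst):
    """How many times more pixels dst has than src."""
    return (dst[0] * dst[1]) / (src[0] * src[1])


def sd_equivalent_fps(target_fps, target_res=UHD, reference_res=SD):
    """Frame rate needed at reference_res to match target_fps at target_res."""
    return target_fps * pixel_ratio(reference_res, target_res)


print(pixel_ratio(SD, UHD))     # 20.0
print(sd_equivalent_fps(60))    # 1200.0
print(sd_equivalent_fps(30))    # 600.0
```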

The previous generation was able to transcode two 4K channels at 60 fps.

According to the white paper the new chip is meant to encode 8K @ 30 fps, so it would seem you’d get more than 30 fps at 4K.

Have you actually tried encoding 4K @ 60 fps?

It’s not clear what you are testing. What is your source and what software are you using? Perhaps you could post your command line.

I am sure about what I am doing; we have been using NVIDIA cards for more than 4 years for encoding, decoding and video processing.

I am measuring encoder speed with a simple ffmpeg command; ffmpeg is compiled against the latest Video SDK 8.2. It doesn’t matter what is on the command line: for NVENC, most parameters you set have very little or no impact on performance.

The measurement is simple: how long does it take to encode a 1080p video to a given format.
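A rough Python sketch of that kind of timing run follows. It is not the command used for the numbers above. It assumes an ffmpeg build with NVENC on the PATH, that the caller already knows the frame count of the source, and it measures wall-clock time for the whole process (decode and start-up included). The function names are hypothetical; the ffmpeg options used (-c:v hevc_nvenc, -preset:v, -bf, -f null) are standard ones.

```python
import subprocess
import time


def build_nvenc_cmd(src, codec="hevc_nvenc", preset=None, bframes=None):
    """Build an ffmpeg command that encodes src and discards the output."""
    cmd = ["ffmpeg", "-hide_banner", "-i", src, "-an", "-c:v", codec]
    if preset is not None:
        cmd += ["-preset:v", preset]
    if bframes is not None:
        cmd += ["-bf", str(bframes)]
    # The null muxer throws the encoded stream away, so disk speed
    # does not affect the measurement.
    cmd += ["-f", "null", "-"]
    return cmd


def average_fps(frame_count, elapsed_seconds):
    """Average throughput over the whole run."""
    return frame_count / elapsed_seconds


def time_encode(src, frame_count, **options):
    """Run the encode and return the average frames per second."""
    start = time.perf_counter()
    subprocess.run(build_nvenc_cmd(src, **options), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return average_fps(frame_count, time.perf_counter() - start)


# Example call (file name and frame count are placeholders):
# print(time_encode("sample_1080p.mkv", frame_count=7000, preset="fast"))
```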

Because NVIDIA transcoding performance depends only on decoder speed (NVDEC is much faster) and encoder output resolution, I can work out the approximate frame rate for 4K content.

If I encode 1080p content at 140 FPS, I have to divide this rate by 4 (4K has 4 times as many pixels as 1080p), which gives 140/4 = 35 FPS. That is the maximum speed of the current encoder implementation for 4K video in H.265 format.

Yes it is that simple…

I don’t necessarily doubt your theory, and I’m sure you know what you are doing (sorry if I doubted that - I was just asking for more info so I could maybe try a similar test and back up your results), but I’m unsure why you don’t just encode your sample at 4K to get a 100% proper result.

So I did a bit of testing and yes, it seems as you say.

Firstly I captured a 1080p50 source as lossless yuv422p huffyuv using FFmpeg.

I got about 130fps encoding this to yuv420p using NVENC HEVC at around 3600k bitrate.

I then used FFmpeg to scale the source to 3840x2160 keeping it as huffyuv.

I then got just over 50fps when encoding to yuv420p, but if I leave it as the default yuv444p then I get about 40fps.

Thanks for retesting :) I thought I was the only one disappointed by NVENC in Turing :)

  1. From your results there is no plain linear correlation between speed and resolution at 4K. I found the same result just now. It is odd: it looks like the NVIDIA driver somehow increases the NVENC clock when encoding to 4K or 8K resolution, or NVENC can encode faster at high resolutions.

  2. I just found that using -preset:v fast triples the encoder speed.
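To put rough numbers on point 1, here is a small Python sketch comparing the simple divide-by-4 estimate with the 4K rates reported earlier in the thread. The inputs are the approximate figures from that post ("about 130fps", "just over 50fps", "about 40fps"), so the ratios are only indicative.

```python
def linear_estimate(fps_1080p, pixel_factor=4):
    """Expected 4K rate if speed scaled purely with pixel count."""
    return fps_1080p / pixel_factor


# Approximate figures quoted earlier in the thread.
measured_1080p = 130.0
measured_4k = {"yuv420p": 50.0, "yuv444p": 40.0}

predicted = linear_estimate(measured_1080p)
for pix_fmt, fps in measured_4k.items():
    print(f"{pix_fmt}: measured {fps:.0f} fps, "
          f"linear estimate {predicted:.1f} fps, "
          f"ratio {fps / predicted:.2f}")
# yuv420p: measured 50 fps, linear estimate 32.5 fps, ratio 1.54
# yuv444p: measured 40 fps, linear estimate 32.5 fps, ratio 1.23
```

Both measured rates sit above the linear estimate, which is the non-linear behaviour described in point 1.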

It’s strange that NVIDIA is so silent about this: no new Video Codec SDK, no updated Video Encode and Decode GPU Support Matrix and no sign of life in this topic, even though faster and more efficient video encoding was a big feature at launch.

I wonder if we will see a Maxwell situation where later cards (GTX 960) have more capabilities than earlier cards.

Yeah it’s all gone a bit quiet. The NVIDIA guy in the other thread has said it’ll be months before there is a new SDK or they update anything, which is disappointing as they were the ones going on about this mythical 25% improvement in bitrate for the same quality.

Good news.
The RTX 2070 supports HEVC B-frames.
This is the result of “NVEncC64.exe --check-features”.

Difference:

Codec: H.264/AVC
Field Encoding     yes (GTX 1080) -> no (RTX 2070)

Codec: H.265/HEVC
Max Bframes        0 (GTX 1080) -> 5 (RTX 2070)

Add:
SSIM-Bitrate(HEVC main10,1080p): (GTX1060-B0 vs RTX2070-B0 vs RTX2070-B3)

NVEncC64.exe -i inputfile --codec hevc --output-depth 10 --vbrhq 0 --vbr-quality N --bframes B -o outputfile
N: 24/28/32/36 for Film, 20/24/28/32 for Anime.
B: 0 or 3(Turing only)
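For anyone wanting to repeat the sweep, here is a hedged Python sketch that expands the command template above into one command per (N, B) combination. The NVEncC options are copied from the template; the input and output file names are placeholders invented for the example.

```python
from itertools import product

# Quality values and B-frame counts as listed in the post.
FILM_QUALITIES = (24, 28, 32, 36)
ANIME_QUALITIES = (20, 24, 28, 32)
BFRAMES = (0, 3)  # 3 is only usable on Turing according to the post


def nvencc_commands(input_file, qualities, bframes=BFRAMES):
    """Return one NVEncC argument list per (quality, B-frame) pair."""
    commands = []
    for quality, b in product(qualities, bframes):
        output = f"out_q{quality}_b{b}.hevc"  # hypothetical naming scheme
        commands.append([
            "NVEncC64.exe", "-i", input_file, "--codec", "hevc",
            "--output-depth", "10", "--vbrhq", "0",
            "--vbr-quality", str(quality), "--bframes", str(b),
            "-o", output,
        ])
    return commands


for cmd in nvencc_commands("film.mkv", FILM_QUALITIES):
    print(" ".join(cmd))
```

On a pre-Turing card the B value of 3 would have to be dropped, for example by passing bframes=(0,).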

Questions

  1. How about the RTX 2080 and RTX 2080 Ti? Please try "NVEncC64.exe --check-features" and paste the result here or on Pastebin.com.
  2. For Maxwell / Pascal / Volta, is NVIDIA planning to support HEVC B-frames with a new SDK and driver?

NVEncC (x64) 4.22 (r930) by rigaya, Nov 3 2018 23:33:47 (VC 1900/Win/avx2)
[NVENC API v8.1, CUDA 8.0]
reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]

Environment Info
OS : Windows 10 x64 (17134)
CPU: Intel Core i7-6700K @ 4.00GHz (4C/8T)
RAM: Used 5165 MB, Total 32706 MB
GPU: #0: GeForce RTX 2080 Ti (8704 cores, 1545 MHz)[416.34]

List of available features.
Codec: H.264/AVC
Max Bframes 4
B Ref Mode yes
RC Modes 63
Field Encoding no
MonoChrome no
FMO no
Quater-Pel MV yes
B Direct Mode yes
CABAC yes
Adaptive Transform yes
Max Temporal Layers 0
Hierarchial P Frames no
Hierarchial B Frames no
Max Level 51
Min Level 1
4:4:4 yes
Max Width 4096
Max Height 4096
Dynamic Resolution Change yes
Dynamic Bitrate Change yes
Forced constant QP yes
Dynamic RC Mode Change no
Subframe Readback yes
Constrained Encoding yes
Intra Refresh yes
Custom VBV Bufsize yes
Dynamic Slice Mode yes
Ref Pic Invalidiation yes
PreProcess no
Async Encoding yes
Max MBs 65536
Lossless yes
SAO no
Me Only Mode yes
Lookahead yes
AQ (temporal) yes
Weighted Prediction yes
Max LTR Frames 8
10bit depth no

Codec: H.265/HEVC
Max Bframes 5
RC Modes 63
Field Encoding no
MonoChrome no
Quater-Pel MV yes
B Direct Mode no
Max Temporal Layers 0
Hierarchial P Frames no
Hierarchial B Frames no
Max Level 62
Min Level 1
4:4:4 yes
Max Width 8192
Max Height 8192
Dynamic Resolution Change yes
Dynamic Bitrate Change yes
Forced constant QP yes
Dynamic RC Mode Change no
Subframe Readback yes
Constrained Encoding no
Intra Refresh yes
Custom VBV Bufsize yes
Dynamic Slice Mode yes
Ref Pic Invalidiation yes
PreProcess no
Async Encoding yes
Max MBs 262144
Lossless yes
SAO yes
Me Only Mode yes
Lookahead yes
AQ (temporal) no
Weighted Prediction yes
Max LTR Frames 7
10bit depth yes
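To compare listings like the one above without reading them line by line, here is a minimal Python sketch that parses this kind of text into a dictionary per codec and prints the entries that differ. It assumes each feature line ends with a single value after the last space, which holds for the output pasted here; the function names are made up, and the sample is a short excerpt of the listing above.

```python
def parse_features(text):
    """Parse '--check-features'-style text into {codec: {feature: value}}."""
    codecs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Codec:"):
            current = codecs.setdefault(line.split(":", 1)[1].strip(), {})
        elif current is not None and line:
            # The value is whatever follows the last space on the line.
            name, _, value = line.rpartition(" ")
            if name:
                current[name.strip()] = int(value) if value.isdigit() else value
    return codecs


def diff_features(a, b):
    """Return {feature: (value_in_a, value_in_b)} for entries that differ."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


sample = """\
Codec: H.264/AVC
Max Bframes 4
SAO no
10bit depth no
Max Width 4096

Codec: H.265/HEVC
Max Bframes 5
SAO yes
10bit depth yes
Max Width 8192
"""

features = parse_features(sample)
changes = diff_features(features["H.264/AVC"], features["H.265/HEVC"])
for name, (h264, hevc) in changes.items():
    print(f"{name}: H.264 {h264} -> HEVC {hevc}")
```

The same diff_features call works for two GPUs instead of two codecs, given a parsed listing from each card.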

For those like me using FFmpeg, I’ve confirmed B-Frames also work for HEVC with FFmpeg 4.0.

i.e. -bf 5 succeeds and produces better-quality files (by the SSIM metric) than -bf 0, whereas -bf 6 correctly produces an error message.

So at least B-Frames don’t need a new SDK.

Which operating system did you test on? With Linux driver 410.73 and ffmpeg 4.0 I am getting no B-frame support on the RTX 2080 Ti for the hevc_nvenc codec. It gives me:

[hevc_nvenc @ 0x55a84be486c0] Provided device doesn't support required NVENC features
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

Windows - full details of the build, and the Nvidia driver etc are at the top of my post showing the output from NVencC.

FFmpeg 4.0.

Hmm, I see, there is no support for B-frames in Linux yet…

Sounds like it. Assume you have the latest Linux driver?

Also, just to let you know I was using FFmpeg 4.0.2, from the Zeranoe downloads page. I’m fairly sure NVENC b-frames have been in FFmpeg for a while (for H264) so I expect your version is ok, and it’s just the case that you need an updated Linux driver.

OK, I did a detailed comparison of encoders. NVENC H.265 is still far behind software transcoding, but it is much improved.

Yep, though it also depends on the x265 settings. What preset are you using for x265 in the above, and is that with B-frames on Turing?

For offline encoding, yes, software is better by far, as you can use something like x265 medium or better if you can spare the time. For realtime encoding, though, on my i7-6700K I can just about manage an x265 720p50 ultrafast encode with a couple of enhancements (I lowered the ctu and min ctu), and that’s marginally better than NVENC HEVC at the same resolution.

But this CPU can’t do 1080p50 in real time with x265. Possibly Intel’s newest would be able to, though; I’m not sure.

Yes, true, I am using hevc_nvenc -bf 4 and -preset hq

You can use -bf 5, seems like 5 is supported for NVENC HEVC.

How about your x265 preset?

Wow, this means no more interlaced encoding for H.264? Why? Interlaced encoding is a much-needed feature in many cases. If you want to transcode a 1080i50 source without losing full motion quality, you either have to deinterlace to 50p or do interlaced encoding.