Details about NVENC in Turing?

I noticed that weighted prediction isn’t working with FFmpeg on my RTX 2080 Ti. FFmpeg accepts the argument (-weighted_pred 1), but initialising the encoder then fails. If you change it to -weighted_pred 2, i.e. an invalid value, FFmpeg instead correctly reports that the value is out of range. So FFmpeg is parsing the argument and passing it to NVENC, but something fails further on.

Maybe FFmpeg needs the new SDK or something.

Thanks for your information.

malakudi:

Wow, this means no more interlaced encoding for H264?

Unfortunately, the answer is “Yes”…I don’t know why…

oviano:

I noticed that weighted prediction isn’t working

  1. According to NVENC_VideoEncoder_API_ProgGuide.pdf, "Weighted prediction is not supported if the encode session is configured with B frames." If you use B-frames, you should try "-bf 0 -weighted_pred 1".
  2. "HEVC + weightp" is unstable. Are you using h264_nvenc, hevc_nvenc, or both? See NVEncC Issue #15, NVEncC Issue #18, NVEncC Issue #34.

Yes, I’d forgotten you can’t use weighted prediction with B-frames. Thanks.
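
The rule quoted above can be checked before FFmpeg is launched. This is a minimal sketch, not part of FFmpeg or the NVENC SDK: the helper name nvenc_weighted_pred_args is hypothetical, and it only encodes the one constraint from the programming guide (weighted prediction needs zero B-frames).

```python
from typing import List


def nvenc_weighted_pred_args(codec: str, bframes: int, weighted_pred: bool) -> List[str]:
    """Build encoder arguments, refusing weighted prediction together with B-frames.

    Hypothetical helper: it mirrors the programming-guide rule quoted in the
    thread and does not query the driver or the GPU.
    """
    if weighted_pred and bframes > 0:
        raise ValueError("weighted prediction is not supported with B-frames; use -bf 0")
    args = ["-c:v", codec, "-bf", str(bframes)]
    if weighted_pred:
        args += ["-weighted_pred", "1"]
    return args


# Valid combination suggested in the thread: -bf 0 -weighted_pred 1
print(nvenc_weighted_pred_args("hevc_nvenc", 0, True))
```

Failing early in the wrapper gives a clearer message than the encoder initialisation error described in the first post.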

I am only using HEVC. Just trying to push the settings to the highest quality limit. The addition of B-frames has made a visually noticeable as well as a measurable difference, which is great.

According to the x265 presets (https://x265.readthedocs.io/en/default/presets.html), x265 uses a maximum of 4 B-frames from preset veryfast to slow.

From our previous tests NVENC supports up to 4 B-frames for H264, but quality was better when we used 3 B-frames. It is similar for H265: in my internal tests quality is better when I use 4 B-frames instead of 5.

What is not working on NVENC H265 is B-frames as reference. It will require a new API/SDK, because there is no such setting in struct _NV_ENC_CONFIG_HEVC. It could improve quality by about 5%.

Also, I am not sure whether current NVENC on RTX is using maxCUSize 64; this could improve quality by 5-10%.

Yeah I saw x265 were using 4 b frames - I didn’t know that 4 might improve quality over 5. That’s great feedback and I will see if I can get the same result.

Wouldn’t it be nice if Nvidia updated the SDK so we could actually properly use the new hardware…

Tested against my 60 s 1080p50 sample, I also find that bf=4 is the optimum, based on an SSIM analysis:

bf=0 : 0.965553
bf=1 : 0.967545
bf=2 : 0.968899
bf=3 : 0.969047
bf=4 : 0.969632
bf=5 : 0.968897
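
For anyone repeating the sweep, the optimum can be picked programmatically. A minimal sketch using only the six values listed above; the variable names are illustrative.

```python
# SSIM per B-frame count, copied from the measurements above.
ssim_by_bf = {
    0: 0.965553,
    1: 0.967545,
    2: 0.968899,
    3: 0.969047,
    4: 0.969632,
    5: 0.968897,
}

# The B-frame count with the highest SSIM.
best_bf = max(ssim_by_bf, key=ssim_by_bf.get)

# Improvement of the best setting over encoding without B-frames.
gain_over_bf0 = ssim_by_bf[best_bf] - ssim_by_bf[0]

print(f"best bf = {best_bf}, SSIM gain over bf=0 = {gain_over_bf0:.6f}")
```

With these numbers the script selects bf=4, and bf=5 falls back to roughly the bf=2 level.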

Something else I have noticed with Pascal vs Turing, at least with the two samples I have tried, is that on Pascal cbr_hq produces a marginally better result than vbr_hq, but this trend is reversed on Turing.

Sample #1

Pascal cbr_hq 0.962293
Pascal vbr_hq 0.961952
Turing cbr_hq 0.969287
Turing vbr_hq 0.969632

Sample #2

Pascal cbr_hq 0.956651
Pascal vbr_hq 0.956239
Turing cbr_hq 0.962946
Turing vbr_hq 0.963394

rc-lookahead 32
bf 4 (for Turing)
VBR: vb 3600k bufsize 3600k maxrate 5400k
CBR: vb 3600k bufsize 3600k maxrate 3600k
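
The settings above can be turned into FFmpeg command lines as follows. This is a sketch under assumptions: the file names are hypothetical, -b:v is used as the long form of vb, and the rate-control names cbr_hq and vbr_hq are the ones used in this thread, so they may differ in other FFmpeg builds.

```python
import shlex
from typing import List

# Peak bitrate per rate-control mode, taken from the settings listed above.
MAXRATE = {"cbr_hq": "3600k", "vbr_hq": "5400k"}


def hevc_nvenc_command(src: str, dst: str, rc_mode: str, bframes: int = 4) -> List[str]:
    """Build an FFmpeg command for the CBR or VBR test case.

    bframes defaults to 4, the value used for Turing in the thread.
    """
    if rc_mode not in MAXRATE:
        raise ValueError(f"unknown rate-control mode: {rc_mode}")
    return [
        "ffmpeg", "-i", src,
        "-c:v", "hevc_nvenc",
        "-rc", rc_mode,
        "-b:v", "3600k",
        "-bufsize", "3600k",
        "-maxrate", MAXRATE[rc_mode],
        "-rc-lookahead", "32",
        "-bf", str(bframes),
        dst,
    ]


# Hypothetical input and output names, shown only to illustrate the result.
for mode in ("cbr_hq", "vbr_hq"):
    cmd = hevc_nvenc_command("sample.mp4", f"out_{mode}.mkv", mode)
    print(" ".join(shlex.quote(part) for part in cmd))
```

The only difference between the two cases is the peak rate: CBR pins maxrate to the target bitrate, while VBR allows 1.5x the target.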

This is of course, just two samples and I haven’t tried different bitrates etc, but it’d be interesting to know if the VBR mode has been improved with Turing.

There is a minor improvement in the Nvidia BETA drivers 415.13: they allocate bitrate more accurately when B-frames are used for HEVC, which adds around 0.15-0.2 PSNR.

They also produce slightly different output for HEVC on Pascal, which added around 0.03 PSNR.

This might be old news, but the encode/decode matrix has been updated with Turing.

Nvidia also added this statement:

* The video encoder in Turing GPUs has substantially improved quality and performance compared with Pascal. The overall encoding capacity of one NVENC in Turing is comparable to two NVENCs in Pascal.

** The Video Codec SDK, which exposes new encoder improvements and features of Turing will be released soon. Until then, users can continue to use Video SDK 8.2 on all GPUs.

I am very sad that Nvidia put only one NVENC in RTX 5000/6000. I think there will be two modes:

  • fast - almost same speed as GTX 1080, better quality + 0.7 PSNR
  • hq - 1/4 speed of GTX 1080, same quality as libx265 + 1.1 PSNR

Maybe they will improve speed in HQ mode, but I don’t think so.

More good news: the Nvidia BETA drivers 415.13 support:

  • up to 8 B-frames (instead of 5) for HEVC (quality is best with 4), same as the libx265 slower+ profiles
  • b_ref_mode EACH for H264 (only NONE and MIDDLE were supported before); in my tests it is worse than NONE, but it works, so maybe it is good for some situations

In my tests, HEVC encoding with -b:v 3M -bf 4 -preset hq on Turing gives the same quality as -b:v 4.5M -bf 0 -preset hq on Pascal, so the actual quality improvement is very high: +1.1 PSNR!
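
Expressed as a bitrate saving at equal quality, the 3M vs 4.5M comparison works out as below. A small arithmetic sketch; the function name is illustrative.

```python
def bitrate_saving(new_kbps: float, old_kbps: float) -> float:
    """Fraction of bitrate saved when new_kbps matches the quality of old_kbps."""
    return 1.0 - new_kbps / old_kbps


# Turing at 3M (-bf 4) versus Pascal at 4.5M (-bf 0), per the test above.
saving = bitrate_saving(3000, 4500)
print(f"bitrate saving at equal quality: {saving:.1%}")  # 33.3%
```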

Can anyone test NVENC on Quadro RTX 5000? They are already on the market.

I’d like to test the beta driver too but I’m confused about the versions as I am using Windows and have a higher driver number than 415.13 and do not see any beta driver available.

Or are you using Linux?

On Windows I think the driver version is already stable, so try the latest stable drivers from 8.11.2018.

Ok, thanks!

So latest Windows driver produces the same SSIM values for me as I posted further up, so I guess I have the latest.
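
For anyone who wants to reproduce SSIM figures like these, one option is FFmpeg's ssim filter. This is a sketch under assumptions: the thread does not say which tool produced the numbers, the helper names and file paths are hypothetical, and the parser assumes the summary line contains an "All:" field as printed by the ssim filter.

```python
import re
import subprocess
from typing import Optional

# Matches the overall score in the ssim filter's summary line.
SSIM_ALL = re.compile(r"SSIM .*All:([0-9.]+)")


def parse_ssim_all(ffmpeg_stderr: str) -> Optional[float]:
    """Return the last overall SSIM value found in FFmpeg's log, or None."""
    value = None
    for line in ffmpeg_stderr.splitlines():
        found = SSIM_ALL.search(line)
        if found:
            value = float(found.group(1))
    return value


def measure_ssim(encoded: str, reference: str) -> Optional[float]:
    """Compare an encode against its source using FFmpeg's ssim filter.

    Requires ffmpeg on PATH; both inputs must have the same resolution
    and frame count for the score to be meaningful.
    """
    result = subprocess.run(
        ["ffmpeg", "-i", encoded, "-i", reference, "-lavfi", "ssim", "-f", "null", "-"],
        capture_output=True,
        text=True,
    )
    return parse_ssim_all(result.stderr)
```

Calling measure_ssim("out_vbr_hq.mkv", "sample.mp4") (hypothetical file names) would return a single float comparable to the values listed in this thread, provided the same metric definition was used.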

Anyway, I also added some x265 SSIM values for comparison.

Sample #1

rc-lookahead 32

Pascal cbr_hq 0.962293
Pascal vbr_hq 0.961952
Turing cbr_hq 0.969287
Turing vbr_hq 0.969632
x265 medium cbr 0.972751
x265 medium vbr 0.972719
x265 ultrafast cbr 0.962138
x265 ultrafast vbr 0.962089

Sample #2

rc-lookahead 32
bf 4

Pascal cbr_hq 0.956651
Pascal vbr_hq 0.956239
Turing cbr_hq 0.962946
Turing vbr_hq 0.963394
x265 medium cbr 0.966768
x265 medium vbr 0.966524
x265 ultrafast cbr 0.956334
x265 ultrafast vbr 0.955160

It’s interesting that for these samples and bitrates (vb 3600k, bufsize 3600k, maxrate 5400k for VBR) Pascal is marginally better than x265 ultrafast, and Turing now comfortably exceeds x265 ultrafast.
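
The margins can be made explicit by subtracting the x265 ultrafast score from each NVENC score. A minimal sketch using only the values posted above; the data layout and function name are illustrative. Note that the Pascal margins are tiny, and for Sample #1 in VBR mode the Pascal value (0.961952) is actually slightly below x265 ultrafast (0.962089).

```python
# SSIM values copied from the two tables above.
# NVENC "cbr"/"vbr" here stand for the cbr_hq/vbr_hq modes.
ssim = {
    "Sample #1": {
        ("Pascal", "cbr"): 0.962293, ("Pascal", "vbr"): 0.961952,
        ("Turing", "cbr"): 0.969287, ("Turing", "vbr"): 0.969632,
        ("x265 medium", "cbr"): 0.972751, ("x265 medium", "vbr"): 0.972719,
        ("x265 ultrafast", "cbr"): 0.962138, ("x265 ultrafast", "vbr"): 0.962089,
    },
    "Sample #2": {
        ("Pascal", "cbr"): 0.956651, ("Pascal", "vbr"): 0.956239,
        ("Turing", "cbr"): 0.962946, ("Turing", "vbr"): 0.963394,
        ("x265 medium", "cbr"): 0.966768, ("x265 medium", "vbr"): 0.966524,
        ("x265 ultrafast", "cbr"): 0.956334, ("x265 ultrafast", "vbr"): 0.955160,
    },
}


def delta_vs(sample: str, encoder: str, mode: str, baseline: str = "x265 ultrafast") -> float:
    """SSIM difference between an encoder and the baseline in the same mode."""
    row = ssim[sample]
    return row[(encoder, mode)] - row[(baseline, mode)]


for sample in ssim:
    for encoder in ("Pascal", "Turing"):
        for mode in ("cbr", "vbr"):
            print(f"{sample} {encoder} {mode}: {delta_vs(sample, encoder, mode):+.6f}")
```

Swapping the baseline argument to "x265 medium" shows how far both GPU generations still trail that preset on these two samples.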

I think x265 would have the advantage at lower bitrates, as NVENC quality seems to fall away pretty badly as you go down.

Hi, NVIDIA.
I would like to know your answer.

Q1. Did Turing cease to support field encoding in H.264? (I would like to know the reason, if possible.)

Q2. Is “HEVC B-frame” a Turing-only function? Or can older GPUs (Pascal etc.) use “HEVC B-frame” with the new SDK 9.0 and driver?

1 NO
2 NO

Any proof of this? Do you have early access to Video Codec SDK 9? Because with the current drivers and Video Codec SDK, Turing does not support field encoding on H.264.

Don’t think anything has been made official but if the same SDK and drivers with a Turing card don’t support field encoding on H.264 but the same SDK and drivers with a Pascal card do support field encoding, does that not suggest the difference is in the hardware?

No proof yet, but I’d be 90% sure that Thunderm is correct.

EDIT: Ah, I’m taking Thunderm’s responses to mean there is NO field encoding on h.264 and b-frames only on Turing.