Details about NVENC in Turing?

The question was “Q1. Did Turing cease to support the field encoding function in H.264?” and the answer was NO. Although English is not my native language, to me that answer means “Turing did not cease to support field encoding in H.264.” That is why I asked for proof.

Hi, NVIDIA and everyone.
Sorry, I’m not good at English.

I’ll give a supplementary explanation of my question.

“Field encoding: no” means NV_ENC_CAPS_SUPPORT_FIELD_ENCODING returns 0.
I think this might be a driver bug, because I don’t believe NVIDIA would remove the H.264 field encoding function, which is useful and already implemented.
Is this a bug, or is it not a bug?

I want to hear an answer from NVIDIA clarifying whether Turing supports the H.264 field encoding function or not.

Does Turing support H.264 field encoding, or does it not?
And if Turing doesn’t support this function, please tell me why NVIDIA removed such a useful feature.

I think only Turing supports “HEVC B-frames”.
But I want to hear an answer from NVIDIA clarifying whether older GPUs (Pascal etc.) support “HEVC B-frames” with the new SDK and driver, or not.

There is an updated Video Codec SDK 9.0 page.

There is a new comparison between Pascal and Turing. This comparison is for a single NVENC engine only, so all Pascal GPUs with 2 NVENC engines (1070 and up) will be faster than any Turing GPU when transcoding more than one stream.

And from those specs, Turing will not be able to encode 8K@30fps in real time: it can only transcode 2x 4:2:0 4K@30fps streams in real time, and 8K would require at least 4 such streams. Also, there is no information about 8K transcoding anymore.
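The stream count checks out on pixel rate alone (a back-of-the-envelope sketch; the resolutions used are the standard UHD and 8K-UHD sizes, not figures from NVIDIA's table):

```python
# Back-of-the-envelope pixel-rate check: how many 4K@30 streams
# equal one 8K@30 stream?

def pixel_rate(width: int, height: int, fps: int) -> int:
    """Luma samples per second for a given resolution and frame rate."""
    return width * height * fps

rate_4k = pixel_rate(3840, 2160, 30)   # one 4K@30 stream
rate_8k = pixel_rate(7680, 4320, 30)   # one 8K@30 stream

streams_needed = rate_8k // rate_4k
print(streams_needed)  # 8K carries 4x the pixel rate of 4K
```

So a chip that sustains two 4K@30 sessions is a factor of two short of real-time 8K@30.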

Now it is official!

There is also more info about decoding, which is much faster than on Pascal, and Quadro RTX cards have 2 NVDEC engines.

The charts do not match current performance: Turing with preset slow is much slower than Pascal. Turing with preset medium is indeed faster than Pascal and also gives better encoding quality. Of course, it is not 200% faster as NVIDIA claimed before.

Increased decoding performance is more important, IMO. My Quadro P2000’s NVDEC is always maxed out when transcoding 4K, while NVENC load is around 30%.

We just put Quadro RTX into production; performance is still poor, but quality is better.

Actually, we have 24x Quadro RTX 5000 in 6 servers; next week we will add another 16x Quadro RTX 5000, for a total of 40 GPUs.

It is very sad that the Quadro RTX 4000, 5000, and 6000 all have the same NVENC performance.

-------Quality vs Performance table for 1080p50 at 3Mbit/s-------

[table]
[tr]
[td]codec[/td]
[td]parameters[/td]
[td]GTX fps[/td]
[td]RTX fps[/td]
[td]SW fps[/td]
[td]GTX PSNR[/td]
[td]RTX PSNR[/td]
[td]SW PSNR[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-bf 3 -preset hq -profile high[/td]
[td]2x334[/td]
[td]375[/td]
[td]8.5[/td]
[td]34.817637[/td]
[td]35.032718[/td]
[td]34.794732[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset fast[/td]
[td]2x430[/td]
[td]678[/td]
[td]3[/td]
[td]34.706381[/td]
[td]34.797363[/td]
[td]35.776912[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset hq[/td]
[td]2x338[/td]
[td]155[/td]
[td]1.6[/td]
[td]34.719312[/td]
[td]35.318421[/td]
[td]35.799952[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-bf 4 -preset hq[/td]
[td]-[/td]
[td]161[/td]
[td]1.6[/td]
[td]-[/td]
[td]35.768031[/td]
[td]35.799952[/td]
[/tr]
[/table]

:::::::::SUMMARY:::::::

  1. There is better quality for H264, PSNR +0.2dB (8%), but encoding performance is only 56% of the Pascal generation!

  2. When comparing Pascal preset hq and Turing preset fast, quality is better by +0.1dB (4%) and encoding speed is the same (1 Turing NVENC = 2 Pascal NVENCs)

  3. When we compare the high quality preset, Turing’s encoding performance is only 24% of Pascal’s!!! => Not suitable for any 4K content, but quality is almost the same as SW libx265, PSNR +1dB (40%)!!!
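The percentages in points 1 and 3 can be reproduced from the fps columns of the table above (a quick sketch; "2x334" is read here as two Pascal NVENC engines at 334 fps each):

```python
# Turing throughput relative to Pascal, from the fps columns above.
# "2x334" means both Pascal NVENC engines combined: 2 * 334 fps.

pascal_h264_hq = 2 * 334   # h264 -preset hq on Pascal, both engines
turing_h264_hq = 375       # same settings on Turing (single engine)
print(round(turing_h264_hq / pascal_h264_hq * 100))  # 56 (%)

pascal_h265_hq = 2 * 338   # h265 -preset hq on Pascal, both engines
turing_h265_hq = 155       # same settings on Turing
print(round(turing_h265_hq / pascal_h265_hq * 100))  # 23 (%), quoted as ~24% above
```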

So I still don’t get why NVIDIA didn’t put 2x NVENC on the Quadro RTX 5000 and 6000; it could have been a killer product.

Good info - what are your libx265 settings, out of curiosity? I guess for your “preset fast” row you’re using the same but what about “preset hq”?

  • libx264 -preset medium
  • libx265 -preset fast
  • libx265 -preset medium

Performance for SW encoding was measured on 1 CPU core, so for example on some dual-socket Epyc with 64 CPU cores (128 threads), performance could be the same as for 1 NVIDIA RTX GPU :))))
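Scaling the single-core SW numbers linearly puts a figure on that joke (an optimistic sketch; x265 does not actually scale anywhere near linearly to 128 threads):

```python
# Naive linear scaling of the single-core libx265 hq figure from the
# table above, assuming perfect thread scaling (it won't be in practice).

sw_hq_fps_one_core = 1.6   # h265 -preset hq on 1 CPU core
epyc_threads = 128         # dual-socket Epyc: 64 cores / 128 threads

projected_fps = sw_hq_fps_one_core * epyc_threads
print(projected_fps)  # ~205 fps, in the range of one Turing NVENC (155-161 fps)
```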

It has to be said the quality at this kind of bitrate is very impressive under Turing.

I think I read somewhere that it degrades at lower bitrates though, and libx265 performs better.

It is true, but usually we don’t want to go to the lowest possible bitrate. When I compared 1080p50 at 1 Mbit/s on Turing vs. SW, libx265 was much better.

IMO, PSNR should not be used to compare quality between SW and HW encodes, because x264/x265 lean toward perceptual optimization. Netflix’s VMAF is a better metric.

I also did visual quality tests, and from a few samples:

1080p50 H265 3Mbit/s on Turing = 4Mbit/s on Pascal

It will be interesting to see if SDK 9 unlocks any further improvements to quality.

I think not significantly.

The main quality improvement is possible only in B-frames:

  • B-frames as reference could add around 0.2dB PSNR for HEVC
  • adaptive B-frame insertion could add around 0.5dB PSNR
  • better quantization/bitrate allocation for B-frames vs P-frames could add around 0.4dB PSNR

On the other hand, quality without B-frames is very near to SW.

Maybe maximum CU size? Or would that have been exposed in the existing SDK if they’d increased it from 32 to 64?

I compiled ffmpeg with libvmaf support and have done some VMAF comparisons:
[table]
[tr]
[td]codec[/td]
[td]parameters[/td]
[td]GTX VMAF[/td]
[td]RTX VMAF[/td]
[td]SW VMAF[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-preset hq -bf 3 -profile high[/td]
[td]85.60[/td]
[td]86.87[/td]
[td]85.70[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-preset slow -bf 3 -profile high[/td]
[td]85.64[/td]
[td]87.22[/td]
[td]85.23[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset fast[/td]
[td]84.97[/td]
[td]85.39[/td]
[td]87.06[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset hq[/td]
[td]85.06[/td]
[td]87.08[/td]
[td]87.14[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset hq -bf 4[/td]
[td]-[/td]
[td]88.64[/td]
[td]89.12[/td]
[/tr]
[/table]

For SW transcoding, this mapping of NVENC presets to libx264 and libx265 presets is used:
fast = fast
hq = medium
slow = slower

H264 profile MAIN - 2.70% -> 3.00 Mbit/s rate on RTX = 3.08 Mbit/s on GTX
H264 profile HIGH - 16.48% -> 3.00 Mbit/s rate on RTX = 3.49 Mbit/s on GTX
H265 - 25.78% -> 3.00 Mbit/s rate on RTX = 3.77 Mbit/s on GTX
H265 with B-frames - 45.64% -> 3.00 Mbit/s rate on RTX = 4.37 Mbit/s on GTX
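Those equivalent bitrates follow directly from the savings percentages (a minimal sketch; I’m assuming the percentage is the extra bitrate the GTX needs to match the RTX at equal quality, which matches the numbers quoted above):

```python
# Convert a quality-equivalent savings percentage into the GTX bitrate
# that matches a 3.00 Mbit/s RTX stream, as in the list above.

def equivalent_bitrate(rtx_mbit: float, savings_pct: float) -> float:
    """Bitrate the older encoder needs to match the newer one's quality."""
    return round(rtx_mbit * (1 + savings_pct / 100), 2)

print(equivalent_bitrate(3.00, 2.70))   # 3.08 (H264 MAIN)
print(equivalent_bitrate(3.00, 16.48))  # 3.49 (H264 HIGH)
print(equivalent_bitrate(3.00, 25.78))  # 3.77 (H265)
print(equivalent_bitrate(3.00, 45.64))  # 4.37 (H265 with B-frames)
```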

There is a very minor difference between libx265 and NVENC for H265, around 1.2% without B-frames and 10% with B-frames, so the only room for improvement for now is B-adaptive, B-pyramid, and B-refs, which the current SDK doesn’t support for H265.

NVENC with the HIGH profile is even better than libx264 medium, by around 23%, so use NVENC!! :)))

I also did tests with other parameters, and the improvement for H265 is always around 42-47% at any bitrate (1.5 Mbit/s, 3 Mbit/s, 5 Mbit/s).

Great and interesting info - thank you for posting! Visually I can certainly see good improvements with RTX.

Very nice indeed !

If only NVIDIA released the NVENC/NVDEC chip as separate hardware for professional video services. We don’t need those damn ray tracing and tensor cores.

The new Video Codec SDK 9.0 drivers improved quality significantly with B-frames enabled; updated results:

—VMAF—
[table]
[tr]
[td]codec[/td]
[td]parameters[/td]
[td]4K fps[/td]
[td]GTX 415.27[/td]
[td]RTX 415.27[/td]
[td]RTX 418.30[/td]
[td]SW[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-preset fast -bf 3 -profile high[/td]
[td]152[/td]
[td]79.46[/td]
[td]83.25[/td]
[td]86.68[/td]
[td]-[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-preset hq -bf 3 -profile high[/td]
[td]94[/td]
[td]85.60[/td]
[td]86.87[/td]
[td]87.09[/td]
[td]85.70[/td]
[/tr]
[tr]
[td]h264[/td]
[td]-preset slow -bf 3 -profile high[/td]
[td]55[/td]
[td]85.64[/td]
[td]87.22[/td]
[td]87.41[/td]
[td]85.23[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset fast[/td]
[td]170[/td]
[td]84.97[/td]
[td]85.39[/td]
[td]85.39[/td]
[td]87.06[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset hq[/td]
[td]39[/td]
[td]85.06[/td]
[td]87.08[/td]
[td]87.08[/td]
[td]87.14[/td]
[/tr]
[tr]
[td]h265[/td]
[td]-preset hq -bf 4[/td]
[td]41[/td]
[td]-[/td]
[td]88.64[/td]
[td]89.62[/td]
[td]89.12[/td]
[/tr]
[/table]

H265 with B-frames - 58.19% -> 3.00 Mbit/s rate on RTX = 4.75 Mbit/s on GTX

Good JOB NVIDIA!

UPDATE: I just updated the results for H264, which is also better with the new drivers.

I’m not sure I understand what that Linux beta does - if the driver is adding support for SDK 9, wouldn’t ffmpeg or whatever need rebuilding with SDK 9 to take advantage? Or do you have early access, Thunderm?

Your results look really great though, thanks again for sharing them.