Turing H.264 Video Encoding Speed and Quality

jwitsoe · February 13, 2019, 5:33pm

Originally published at: Turing H.264 Video Encoding Speed and Quality | NVIDIA Technical Blog

All NVIDIA GPUs starting with Kepler support fully-accelerated hardware video encoding; GPUs starting with Fermi support fully-accelerated hardware video decoding. The recently released Turing hardware delivered Tensor Cores and better machine learning performance, but the new GPU also incorporated new multimedia features such as an improved NVENC unit to deliver better compression and image quality…

anon79125214 · February 14, 2019, 8:33am

Turing NVENC is very good. We also ran tests and saw higher quality than libx264 at the slow and veryslow presets. The biggest difference is for H.264 at High profile, where NVENC is better than libx264 by 10-15%! The only sad thing is that the Pascal generation had 2 NVENC engines, so performance was two times better :(

anon14829401 · February 17, 2019, 6:42am

While this is a good point, you have to also consider that a single Turing NVENC can outperform a single Pascal NVENC in certain applications. Looking at NVIDIA's initial v9 SDK Tests, the single Turing NVENC H.264 1080p encoding performance is between 22% (High Quality) and 30% (Low Latency) faster than a single Pascal NVENC would be. However, as you pointed out, in the 4K HEVC test, the single NVENC encoding performance is the same between both Pascal and Turing (and I would assume Volta as well).

NVIDIA's customers will have to weigh the features and functions they need before deciding which generation of card to purchase. Taking what you pointed out, those that need more NVENC's and do NOT need HEVC B Frame / Ray Tracing / Tensor support would be better off purchasing one or more refurbished Quadro P5000/P6000's or Tesla P4/P40's (with 2x NVENC's). And if they don't need 8K HEVC, even refurbished Quadro GP100's or Tesla P100's (with 3x NVENC's) might be a good choice if the price is justified. For others that want a mix of the newer technologies though, I'd likely recommend at least one Turing-based card, but the others in a system could be Pascal, again, depending on the need for NVENC's.
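To put the two observations above side by side, here is a rough back-of-the-envelope sketch in Python. It assumes throughput scales linearly with the number of NVENC engines on a card, which nobody in this thread has verified. The 22% and 30% figures are the ones quoted above for H.264 1080p; the function name is made up for illustration.

```python
def card_throughput(engines: int, per_engine_speed: float) -> float:
    """Relative card throughput, assuming engines scale linearly (an assumption)."""
    return engines * per_engine_speed

# Baseline: one Pascal NVENC engine = 1.0 (relative units).
pascal_dual = card_throughput(engines=2, per_engine_speed=1.0)

# One Turing engine: 22% (High Quality) to 30% (Low Latency) faster than one Pascal engine.
turing_hq = card_throughput(engines=1, per_engine_speed=1.22)
turing_ll = card_throughput(engines=1, per_engine_speed=1.30)

for label, value in [("High Quality", turing_hq), ("Low Latency", turing_ll)]:
    ratio = value / pascal_dual
    print(f"{label}: single-NVENC Turing card = {ratio:.0%} of a dual-NVENC Pascal card")
```

Under that linear-scaling assumption, a faster single engine still leaves a Turing card well short of a dual-NVENC Pascal card in aggregate H.264 throughput.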

anon79125214 · February 17, 2019, 7:13am

The main speed problem appears when you use the HQ preset for HEVC (which is needed for low-resolution channels, as it enables 8x8 CUs instead of the 16x16 CUs used by Medium): any Turing card gives you only 1/4 of the P5000's performance (150 fps vs 600 fps at 1080p).

The second problem is that if you want only NVENC, there is no need to buy anything better than the Quadro RTX 2000 (which is not yet released), as all Quadro GPUs have the same NVENC speed. We liked the model where we paid more for a P5000 to get 2x NVENC instead of the one in the P4000.

Currently we use Supermicro servers with 4 GPUs each, but with this new generation we will need 4x more GPUs. Yes, they could be cheaper (RTX 4000, or RTX 2000 when released), but we would need to either replace all our servers with something like the SuperServer 6049GP-TRT, which can handle 20 GPUs, or run 4x more servers. This introduces other problems: in our internal tests we found that using more than 4 GPUs in one server is not very stable.
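The 4x figure can be reproduced with a small capacity sketch. The 600 fps and 150 fps numbers are the HEVC HQ 1080p figures quoted above; the 30 fps per channel and the 80-stream target are assumptions chosen purely for illustration, and the helper names are hypothetical.

```python
import math

def streams_per_gpu(encoder_fps: float, stream_fps: float) -> int:
    """Whole real-time streams one GPU can sustain at the given encoder throughput."""
    return int(encoder_fps // stream_fps)

def gpus_needed(total_streams: int, per_gpu: int) -> int:
    """GPUs required to carry total_streams, rounding up."""
    return math.ceil(total_streams / per_gpu)

STREAM_FPS = 30        # assumption: each channel is 1080p at 30 fps
GPUS_PER_SERVER = 4    # the 4-GPU servers mentioned above
TARGET_STREAMS = 80    # hypothetical load, chosen for illustration

results = {}
for name, fps in [("P5000", 600), ("Turing", 150)]:  # HEVC HQ fps quoted above
    per_gpu = streams_per_gpu(fps, STREAM_FPS)
    gpus = gpus_needed(TARGET_STREAMS, per_gpu)
    servers = math.ceil(gpus / GPUS_PER_SERVER)
    results[name] = (per_gpu, gpus, servers)
    print(f"{name}: {per_gpu} streams/GPU, {gpus} GPUs, {servers} server(s)")
```

With these assumed inputs, the same load that fits in one 4-GPU P5000 server needs 16 Turing GPUs, i.e. four such servers or one much denser chassis.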

A quality increase was expected, as it is now 2019, but we didn't expect such a drop in performance.

This will make the GPU NVENC solution much more expensive, and when AMD releases the new Epyc 2 CPUs there will be no difference between GPU and CPU transcoding performance: the speed of 1 Turing NVENC at the HQ preset <= 1 AMD Epyc 2 with 32 cores at the libx265 Medium preset.

anon14829401 · February 17, 2019, 8:30am

These are all valid points. You would think NVIDIA would consider making cards similar to Teslas (but specialized just for video applications) that offer multiple NVENC's / NVDEC's without all the other features at a lower price point. NVIDIA really needs to consider your point about the cost of purchasing Epyc 2's versus Quadro RTX 2000's / 4000's. While it might make sense at low-scale (mobile / desktop), as you said the cost isn't justified for workstations / servers, especially beyond 4 GPUs in a single system.

In my case, I use a video switcher application that only supports Intel QuickSync and NVIDIA NVENC/NVDEC. I'm considering the purchase of one or more refurbished P5000's (for around US$ 1,250 each), and adding a Turing GPU after the Quadro RTX 2000 is released once I have a justified need for the features Turing offers.

anon88644025 · April 11, 2019, 5:45am

Great discussion, team. Which is the best transcoding card I could buy to install in my Supermicro server? Right now I am using an M6000 and want to upgrade so I can transcode more AVC services in 1080p.

anon4758803 · April 17, 2019, 7:56pm

Dear Roman, thanks for the interesting results. A couple of questions:
1. Why did you run x264 without the lookahead option for "High quality"? It is hard to compare the quality of encoders when one of them is started with different options.
2. For x264 you set -threads 4. But your CPU is "Dual Intel Xeon E5-2660v3 @ 2.6 GHz", where each CPU has 10 physical cores. I'd say that "-threads 10" looks more appropriate here for a performance comparison.

anon61371653 · April 18, 2019, 11:31am

Hello Vasily,
Thank you for the kind talk.

>Why did you run x264 without lookahead option for "High quality?
libx264 uses 40 frames lookahead by default in medium preset, so there's no need to specify that.

>I'd say that "-threads 10" looks more appropriate
We observed some time ago that with a larger number of threads, libx264 sometimes produces a bitstream whose bitrate is lower than the one set from the CLI. It's not a big deal for desktop CPUs, but for, say, 20 threads on a server-grade CPU it really becomes an issue. I've not checked whether this is fixed in more recent libx264 releases, however.
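For anyone who wants to repeat the thread-count comparison, a minimal sketch of how such an ffmpeg command line could be assembled is shown below. This is not the exact command used in the blog post: the file names and the 5M bitrate are placeholders, and the helper function is hypothetical. It only builds and prints the command; it does not run ffmpeg.

```python
import shlex
from typing import List

def x264_command(src: str, dst: str, threads: int, bitrate: str = "5M") -> List[str]:
    """Build an ffmpeg/libx264 command line (hypothetical helper, placeholder values)."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-preset", "medium",       # lookahead is left at the preset default, as described above
        "-b:v", bitrate,           # target bitrate to compare against the measured one
        "-threads", str(threads),  # 4 as in the blog test; try 10 or 20 to compare
        "-an",                     # drop audio so only video bitrate is measured
        dst,
    ]

cmd = x264_command("input.mp4", "output.mp4", threads=4)
print(shlex.join(cmd))
```

Running the resulting command at several thread counts and comparing the output bitrate against the `-b:v` target would show whether the undershoot described above still occurs with a given libx264 build.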

anon4758803 · April 18, 2019, 4:13pm

OK, I see, thanks. I raised the threading question because you should see a different FPS with 10-20 threads, which affects the diagram showing the number of simultaneous streams for x264.

anon95568229 · September 28, 2019, 1:26pm

Hello, is it possible to use h264_nvenc with -profile:v baseline -level 3.0?
