NVDEC/CUDA/NVENC speed comparison

I would like to know how fast current NVIDIA graphics cards are, not for gaming, but for encoding performance.

OUR TESTS:

[table]
[tr]
[td]card[/td]
[td]NVDEC[/td]
[td]NVENC H264[/td]
[td]NVENC H265[/td]
[td]CUDA DEINTERLACE[/td]
[/tr]
[tr]
[td]QUADRO M4000:[/td]
[td]1250[/td]
[td]2300*[/td]
[td]1200*[/td]
[td]4000[/td]
[/tr]
[tr]
[td]GTX 960:[/td]
[td]1800[/td]
[td]1800[/td]
[td]900[/td]
[td]3000[/td]
[/tr]
[tr]
[td]GTX 1060:[/td]
[td]2600[/td]
[td]2600[/td]
[td]1800[/td]
[td]4000[/td]
[/tr]
[tr]
[td]GTX 1070:[/td]
[td]2600[/td]
[td]2600[/td]
[td]1800[/td]
[td]5000[/td]
[/tr]
[tr]
[td]GTX 1080:[/td]
[td]2600[/td]
[td]5200*[/td]
[td]2600*[/td]
[td]10000[/td]
[/tr]
[/table]

Encoding and decoding figures are normalized to 720x576 resolution, and units are FPS!

If you want to know speed for:
HD (1280x720) - divide all by 2
FHD (1920x1080) - divide all by 4

For example, encoding to H264 on a GTX 1070 at FHD resolution will run at about 650 FPS (2600 / 4).

  • cards marked with * have 2 NVENC engines, so the speed for a single thread is half the listed value

Comment 1 - the Pascal generation also has better H265 quality!

There are more numbers published “officially” by NVIDIA for more chips (Kepler, Maxwell Gen 1, Maxwell Gen 2, Pascal) and for many encoding parameters (quality vs. speed) - https://developer.nvidia.com/nvidia-video-codec-sdk

I had seen all of those documents before we created this table, but I was unable to find which GTX cards (not Quadro) have 2x NVENC chips, or any NVDEC/CUDA performance numbers, so this could help somebody learn the true power of those cards…

The GTX 1080 has 2x the threads for a total of 5200 FPS. Do you know if streaming in OBS (encoding H264 using NVENC) will give double the performance compared to a GTX 1070 with 1x thread at 2600 FPS?

Yes, the GTX 1080 has double the encoding performance of the GTX 1070.

Do you know if the GTX 1050 or 1050 Ti has the same performance as the GTX 1060 for H265?

Also, do you think the GTX 1070 Ti will be close to the GTX 1080, or better, for encoding H265?

Just wondering what your thoughts are. Thanks, and sorry for the necro again.

How many NVENC engines does the new GTX 1070 Ti have?
Given that the GTX 1070 Ti is a slightly cut-down GTX 1080, I’m keen to know whether it has 1 or 2 NVENC engines and whether both are enabled.

It’s strange: in the Video Encode and Decode GPU Support Matrix, the GeForce GTX 1070 - 1080 are listed with 2 NVENC. But in reality the GTX 1070 only contains one?

Hello Thunderm,

How did you test NVDEC?

I tried the steps given in the following link and tried to play back a 5 MP video file. It is not even decoding at 20 FPS. Any suggestions?

Thanks,
Subbarao

We could achieve the decoding FPS given in the NVIDIA decoder application notes.

The following command gave us the clue:

ffmpeg -i input.mp4 -f null /dev/null

Reference: https://stackoverflow.com/questions/20323640/ffmpeg-deocde-without-producing-output-file/20325676

I then tried the hw_decode.c sample in the ffmpeg/doc/examples folder.
It took about 3 times longer to decode the same input.mp4 file than the ffmpeg command given above.

Next, I modified hw_decode.c as follows:

    ret = avcodec_receive_frame(avctx, frame);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
        av_frame_free(&frame);
        av_frame_free(&sw_frame);
        return 0;
    } else if (ret < 0) {
        fprintf(stderr, "Error while decoding\n");
        goto fail;
    }

#define QUICK_RELEASE
#ifdef QUICK_RELEASE
    /* Free the decoded frame immediately and skip the GPU-to-host
     * transfer below, to isolate the pure decode time. */
    av_frame_free(&frame);
    av_frame_free(&sw_frame);
    return 0;
#endif

    if (frame->format == hw_pix_fmt) {
        /* retrieve data from GPU to CPU */
        if ((ret = av_hwframe_transfer_data(sw_frame, frame, 0)) < 0) {
            fprintf(stderr, "Error transferring the data to system memory\n");
            goto fail;
        }
        tmp_frame = sw_frame;
        /* ... rest of decode_write() unchanged ... */

Here the frame gets decoded and immediately released, before the decoded frame is transferred to the host. After this, the program's runtime dropped by a factor of about 3 and matched the ffmpeg command above.

So the conclusion is that transferring data from GPU memory to system memory is what takes the time. I feel that shared memory is the only way to overcome this. Any other suggestions?

Thanks,
Subbarao