CUDA 5.0 (Decode video using NVCUVID) and Performance

Hey guys,

I decided to test decoding of 720p MPEG-2 video on a variety of graphics devices. I have four devices:

  • Nvidia GeForce 8600 GT (32 cores)
  • Nvidia Tesla C1060 (240 cores)
  • Nvidia GeForce GTX 680 (1536 cores)
  • Nvidia GeForce 560M (192 cores)

I ran the SDK sample cudaDecodeGL.exe with the -nointerop parameter and the sample input file plush1_720p_10s.m2v.

Result [average decode rate, fps] (the core counts above can be checked with the short program after these results):
Nvidia GeForce 8600 GT (32 cores): ~105 fps
Nvidia Tesla C1060 (240 cores): ~550 fps
Nvidia GeForce GTX 680 (1536 cores): ~800 fps
Nvidia GeForce 560M (192 cores): ~460 fps
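
For reference, here is a minimal sketch (my own code, not from the SDK) that prints the CUDA core count per device the way deviceQuery does, assuming the usual cores-per-SM figures for each compute capability:

#include <stdio.h>
#include <cuda_runtime.h>

/* Assumed cores per multiprocessor by compute capability:
   8 for sm_1x, 32 for sm_2.0, 48 for sm_2.1, 192 for sm_3x (Kepler). */
static int CoresPerSM(int major, int minor)
{
    if (major == 1) return 8;
    if (major == 2) return (minor == 0) ? 32 : 48;
    if (major == 3) return 192;
    return 0; /* unknown generation */
}

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        int cores = prop.multiProcessorCount * CoresPerSM(prop.major, prop.minor);
        printf("Device %d: %s, SM %d.%d, %d MPs -> %d CUDA cores\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, cores);
    }
    return 0;
}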

The Nvidia GeForce GTX 680 is based on the Kepler architecture (GK104) and has 1536 cores on board, 48 times more than the 8600 GT and about 6 times more than the Tesla C1060. So why are the results so uneven? Shouldn't we expect a good performance gain in video decoding?

With other formats such as H.264 the situation is the same: Kepler's performance is surprisingly low. Why?

Thanks!

Decoding is done by a dedicated hard-wired circuit, not by the 3D rendering pipeline, so only the GPU generation counts, not the number of cores.

Let me give an example.
Studying the API, you can specify the type of decoding:

typedef enum cudaVideoCreateFlags_enum {
     cudaVideoCreate_Default = 0x00, // Default operation mode: use dedicated video engines
     cudaVideoCreate_PreferCUDA = 0x01, // Use a CUDA-based decoder if faster than dedicated engines (requires a valid vidLock object for multi-threading)
     cudaVideoCreate_PreferDXVA = 0x02, // Go through DXVA internally if possible (requires D3D9 interop)
     cudaVideoCreate_PreferCUVID = 0x04 // Use dedicated video engines directly
} cudaVideoCreateFlags;
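
For context, this flag ends up in the ulCreationFlags field of CUVIDDECODECREATEINFO, which is then passed to cuvidCreateDecoder(). A minimal sketch for the 720p MPEG-2 clip used here (the surface counts and the ctx-lock handling are my assumptions, not code copied from the sample):

#include <stdio.h>
#include <string.h>
#include <cuda.h>
#include <cuviddec.h>   /* shipped with the sample as dynlink_cuviddec.h in some SDK versions */

/* Create a decoder for a 1280x720 MPEG-2 stream and force the dedicated
   video engine via cudaVideoCreate_PreferCUVID. */
static CUvideodecoder CreateMpeg2Decoder(CUvideoctxlock vidLock)
{
    CUVIDDECODECREATEINFO dci;
    memset(&dci, 0, sizeof(dci));

    dci.ulWidth             = 1280;
    dci.ulHeight            = 720;
    dci.ulTargetWidth       = 1280;
    dci.ulTargetHeight      = 720;
    dci.CodecType           = cudaVideoCodec_MPEG2;
    dci.ChromaFormat        = cudaVideoChromaFormat_420;
    dci.OutputFormat        = cudaVideoSurfaceFormat_NV12;
    dci.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;
    dci.ulNumDecodeSurfaces = 8;         /* assumption: enough surfaces for MPEG-2 reordering */
    dci.ulNumOutputSurfaces = 2;
    dci.vidLock             = vidLock;   /* needed for cudaVideoCreate_PreferCUDA with multiple threads */

    /* This is where the decode path is chosen: */
    dci.ulCreationFlags     = cudaVideoCreate_PreferCUVID;

    CUvideodecoder decoder = NULL;
    if (cuvidCreateDecoder(&decoder, &dci) != CUDA_SUCCESS) {
        fprintf(stderr, "cuvidCreateDecoder failed\n");
        return NULL;
    }
    return decoder;
}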

In the sample this is selected with the following command-line parameters:

printf (" t-decodecuda - Use CUDA for MPEG-2 (Available with 64 + CUDA cores)  n");
printf (" t-decodedxva - Use VP for MPEG-2, VC-1, H.264 decode.  n");
printf (" t-decodecuvid - Use VP for MPEG-2, VC-1, H.264 decode (optimized)  n");

Let's review the results for the two modes of operation, software (-decodecuda) and hardware (-decodecuvid) decoding:
[Call: cudaDecodeGL.exe -nointerop -decodecuda/-decodecuvid plush1_720p_10s.m2v]
Device: GeForce 8600 GT -decodecuda: 105 fps -decodecuvid: 140 fps
Device: Tesla C1060 -decodecuda: 565 fps -decodecuvid: 140 fps
Device: GeForce GTX 460 -decodecuda: 715 fps -decodecuvid: 210 fps
Device: GeForce GTX 680 -decodecuda: 810 fps -decodecuvid: 370 fps

As you can see, on the GeForce 8600 GT software (CUDA) decoding loses to hardware decoding, since the device has fewer than 64 cores. That is logical.
The more cores a device has, the more CUDA decoding wins and hardware decoding loses. That is also logical.

But I still do not understand why Kepler (GeForce GTX 680) gives such a modest result for CUDA decoding, given its core count …