It seems there’s a limit of 2 simultaneous encoding sessions on a card, but I can’t find any mention of how many simultaneous decoders are possible on a single GPU. I have an application where I am receiving encoded frames, over a network connection, from possibly hundreds of cameras (this is a surveillance app). With that in mind, how could I leverage NVIDIA’s GPU decoding capability to address this problem?
Thanks in advance for any suggestions.
Hi
There is no hard limit on the number of decoders you can run in parallel; the practical limit is the availability of system resources.
Thanks
Thank you, Vignesh.
I’ve been using a single decoder for more than a year now with good success. I’m trying to scale up to multiple decoders.
- I use a single CUDA context with all of the decoders instantiated inside this context. I do this assuming that only a single context is active at a time, so in order to have any hope of simultaneous execution of kernels or video decoders, they all must be in the same context. Please correct me if I’m wrong. (A rough sketch of this setup follows this list.)
- I have a separate CPU thread feeding and operating each decoder.
- I have a separate CUDA stream created for each decoder, although these streams are only used for my CUDA-based image processing on the output (decoded) images (e.g. NV12 to ARGB).
- I’m also displaying the images on a WPF D3DImage…but that’s a whole other set of issues. It’s working, but performance is worrisome.
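To make this concrete, the structure is roughly like the sketch below (heavily simplified, error checking omitted; the H.264 codec, 1080p resolution, surface counts, and camera count are just placeholders for my real per-camera parameters):

#include <cuda.h>
#include <nvcuvid.h>
#include <vector>

// One shared CUDA context + one cuvid context lock for all decoders,
// plus a per-decoder CUDA stream used only for my post-processing kernels.
struct DecoderSlot {
    CUvideodecoder decoder = nullptr;
    CUstream       stream  = nullptr;   // NV12 -> ARGB conversion, etc.
};

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);          // the single shared context

    CUvideoctxlock lock;
    cuvidCtxLockCreate(&lock, ctx);     // handed to every decoder

    const int numCameras = 8;           // placeholder; the real app has far more
    std::vector<DecoderSlot> slots(numCameras);

    for (auto& s : slots) {
        CUVIDDECODECREATEINFO ci = {};
        ci.CodecType           = cudaVideoCodec_H264;          // placeholder codec
        ci.ChromaFormat        = cudaVideoChromaFormat_420;
        ci.OutputFormat        = cudaVideoSurfaceFormat_NV12;
        ci.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;
        ci.ulWidth             = 1920;                          // placeholder size
        ci.ulHeight            = 1080;
        ci.ulTargetWidth       = 1920;
        ci.ulTargetHeight      = 1080;
        ci.ulNumDecodeSurfaces = 8;
        ci.ulNumOutputSurfaces = 2;
        ci.vidLock             = lock;                          // shared context lock

        cuvidCreateDecoder(&s.decoder, &ci);
        cuStreamCreate(&s.stream, CU_STREAM_NON_BLOCKING);
        // A dedicated CPU thread then feeds this decoder via cuvidDecodePicture().
    }

    // ... per-thread feeding, mapping and post-processing happens here ...

    for (auto& s : slots) {
        cuStreamDestroy(s.stream);
        cuvidDestroyDecoder(s.decoder);
    }
    cuvidCtxLockDestroy(lock);
    cuCtxDestroy(ctx);
    return 0;
}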
Anyway, all that said, I have a couple of questions:
1 - There doesn’t appear to be any use of streams with the CUvideodecoder, so I’m assuming that getting multiple decoders on the GPU to decode simultaneously is not possible? My hope is that simultaneous decoding is possible, but since the CUvideodecoder also takes a context lock argument at creation, I wonder if each video decoder locks the entire context (preventing any other decoders from doing anything) while it performs some operations.
2 - Is there any way to predict the amount of GPU memory/resources required to run a CUvideodecoder? My intuition tells me it must be related to the size of the decoded image, but it’s hard to tell.
Sorry for the long question.
Bryan
I use a single CUDA context with all of the decoders instantiated inside this context
From the above statement, I am assuming a single CUDA context for all decoders inside this “process”. If my understanding is correct, your assumption that you must have only a single context for parallel decoder execution is incorrect. You can have multiple contexts (one context per thread). A “context per thread” model can be used to saturate the decode engine. Note again that the video decode engine is completely independent and separate from the graphics engine on the GPU, and hence the optimization principles for CUDA do not necessarily apply directly to video decoding.
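As an illustration, a “context per thread” version of the same setup might look roughly like the sketch below (error checking omitted; the H.264 codec, 1080p resolution, and surface counts are placeholder values rather than recommendations):

#include <cuda.h>
#include <nvcuvid.h>
#include <thread>
#include <vector>

// Each worker thread owns its own CUDA context and its own decoder, so no
// shared context lock is needed and no thread can block another on it.
static void DecodeWorker(CUdevice dev, int cameraIndex) {
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);           // context private to this thread

    CUVIDDECODECREATEINFO ci = {};
    ci.CodecType           = cudaVideoCodec_H264;   // placeholder codec
    ci.ChromaFormat        = cudaVideoChromaFormat_420;
    ci.OutputFormat        = cudaVideoSurfaceFormat_NV12;
    ci.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;
    ci.ulWidth  = ci.ulTargetWidth  = 1920;          // placeholder size
    ci.ulHeight = ci.ulTargetHeight = 1080;
    ci.ulNumDecodeSurfaces = 8;
    ci.ulNumOutputSurfaces = 2;
    // No vidLock set: this context is not shared with any other thread.

    CUvideodecoder dec;
    cuvidCreateDecoder(&dec, &ci);

    (void)cameraIndex;  // would select this camera's bitstream and feed
                        // cuvidDecodePicture() in a loop here

    cuvidDestroyDecoder(dec);
    cuCtxDestroy(ctx);
}

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)          // one thread (and context) per stream
        workers.emplace_back(DecodeWorker, dev, i);
    for (auto& t : workers)
        t.join();
    return 0;
}

Whether this or a shared-context model works better for hundreds of cameras is worth measuring on your target hardware, since each context carries some memory overhead.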
There doesn’t appear to be any use of streams with the CUvideodecoder, so I’m assuming that getting multiple decoders on the GPU to decode simultaneously is not possible? My hope is that simultaneous decoding is possible, but since the CUvideodecoder also takes a context lock argument at creation, I wonder if each video decoder locks the entire context (preventing any other decoders from doing anything) while it performs some operations.
As I said above, video decode is different from kernel execution. The concept of “multiple streams to saturate the graphics engine” does not apply to the decode engine per se. However, you can have multiple threads feeding the decoder. For example, if the decoder can decode a single 4K video at 60 fps, then we expect it to decode four full-HD (1080p) streams at 60 fps.
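As a back-of-envelope illustration of that scaling (the 4K-at-60-fps budget below is just the example figure above, not a measured number for any particular GPU):

#include <cstdio>

// Estimate how many streams of a given resolution fit into a known
// decode-engine pixel budget, assuming decode cost scales with pixel rate.
int main() {
    const double budgetPixelsPerSec = 3840.0 * 2160.0 * 60.0;  // one 4K stream at 60 fps
    const double perStream1080p60   = 1920.0 * 1080.0 * 60.0;  // one 1080p stream at 60 fps

    const int streams = static_cast<int>(budgetPixelsPerSec / perStream1080p60);
    std::printf("Approx. 1080p60 streams per 4K60 of decode budget: %d\n", streams);  // prints 4
    return 0;
}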
Is there any way to predict the amount of GPU memory/resources required to run a CUvideodecoder? My intuition tells me it must be related to the size of the decoded image, but it’s hard to tell.
GPU memory utilization depends on the resolution of the video to be decoded, among several other factors. You can use the NVML APIs to query the current GPU memory utilization.
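For example, a minimal sketch using the NVML C API to read the current memory usage on device 0 could look like this (link against the NVML library; error handling kept to a minimum):

#include <nvml.h>
#include <cstdio>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        nvmlMemory_t mem;
        if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
            std::printf("GPU memory: %llu MiB used / %llu MiB total\n",
                        (unsigned long long)(mem.used  >> 20),
                        (unsigned long long)(mem.total >> 20));
        }
    }

    nvmlShutdown();
    return 0;
}

Sampling this before and after creating a decoder gives you an empirical per-decoder memory cost for a given resolution.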