Hello, I wanted to follow up regarding my previous post here.
I am running DeepStream on many cameras, and I would like to estimate how many cameras I can safely process.
I am using an NVIDIA RTX A2000 12GB. According to your previous post, I can expect to decode half as many H265 streams as an A10. Based on the data from NVIDIA that can be found [here](Video Codec SDK | NVIDIA Developer), I should be able to decode 162 / 2 = 81 1080p30 H265 streams.
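As a quick sanity check, here is that estimate spelled out; the 162-stream figure for the A10 comes from the NVIDIA table linked above, and the 0.5 scaling factor for the A2000 is the assumption taken from the previous post:

```python
# Rough decode-capacity estimate, based on the assumptions above.
A10_H265_1080P30_STREAMS = 162   # from the NVIDIA Video Codec SDK table
A2000_SCALING = 0.5              # assumed A2000 vs. A10 ratio (from the previous post)

estimated_streams = int(A10_H265_1080P30_STREAMS * A2000_SCALING)
print(f"Estimated 1080p30 H265 streams on RTX A2000: {estimated_streams}")  # -> 81
```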
I would like to understand whether running a DeepStream pipeline (AI models) on the GPU affects the number of cameras that can be supported. Should I expect the GPU to be able to decode fewer cameras in that case?
Are there any components of the GPU that are shared between the NVDEC chip and the parts of the GPU used for inference, leading to potential resource contention and therefore reducing the resources available for decoding video streams?
The NVDEC units are completely separate and shouldn’t directly affect inference. However, other factors such as memory bandwidth, compute overhead, or context switching can impact performance, especially if you’re decoding the maximum number of streams.
The number of cameras depends on your DeepStream pipeline. The hardware decoder is not the only factor that affects pipeline performance. For example, when you set up an inference pipeline, the video has to be processed (scaling, format conversion, dewarping, etc.) into a format the model can accept, so the GPU is used both to process the video and to do the inferencing. The model's output also has to be post-processed into the data the user wants (e.g. drawing the bboxes on the video). Ethernet bandwidth will also affect performance when you use network streams such as RTSP, HTTP, etc. as the input sources.
So the performance of the DeepStream pipeline is determined by all the components used in the pipeline; the hardware video decoder is just one of them.
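For illustration, here is a minimal single-source sketch of such a pipeline using the GStreamer Python bindings; the RTSP URL and the nvinfer config file path are placeholders, and element properties are kept to the bare minimum:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Hypothetical single-camera pipeline: NVDEC decode -> batching -> inference -> OSD.
# Replace the RTSP URL and the nvinfer config path with your own.
pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=config_infer_primary.txt ! "     # preprocessing + inference
    "nvvideoconvert ! nvdsosd ! "                               # format conversion + bbox drawing
    "fakesink sync=false "
    "rtspsrc location=rtsp://camera.example/stream ! "
    "rtph265depay ! h265parse ! nvv4l2decoder ! mux.sink_0"     # H265 decode on NVDEC
)
pipeline.set_state(Gst.State.PLAYING)

# Run until interrupted.
loop = GLib.MainLoop()
try:
    loop.run()
except KeyboardInterrupt:
    pass
finally:
    pipeline.set_state(Gst.State.NULL)
```

Each stage here (nvv4l2decoder, nvstreammux, nvinfer, nvvideoconvert, nvdsosd) consumes its own share of decoder, compute, and memory bandwidth, which is why the achievable camera count depends on the whole pipeline rather than on NVDEC alone.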
Currently I have a pipeline that can decode 64 720p H264 streams using the CPU. I want to run the same pipeline on H265 streams, decoding them with NVDEC.
The pipeline runs fine with 64 720p H265 streams decoded with NVDEC. The output of nvidia-smi in this second case is:
We can see the sm column at 100% when using 1080p streams. However, at the same time, the pipeline runs slower - it processes fewer frames per second. Does video decoding on the GPU use the streaming multiprocessors? I wasn't expecting this, since my understanding is that NVDEC is a dedicated component.
So decoding the video using the GPU should not affect the sm usage? I am surprised because the only thing that changed between the two tests was the resolution of the video streams.
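One rough way to separate the two is to sample the SM/GPU utilization and the NVDEC utilization independently, for example with the NVML Python bindings; a minimal sketch, assuming pynvml is installed and the A2000 is GPU index 0:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the A2000 is GPU 0

try:
    for _ in range(10):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)            # SM and memory utilization
        dec, _period = pynvml.nvmlDeviceGetDecoderUtilization(handle)  # NVDEC utilization
        print(f"sm: {util.gpu:3d}%  mem: {util.memory:3d}%  dec: {dec:3d}%")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

If the dec value stays well below 100% while sm saturates, the bottleneck is likely on the CUDA side of the pipeline (scaling, format conversion, inference) rather than in NVDEC itself.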