Hello, I wanted to follow up regarding my previous post here.
I am running DeepStream on many cameras, and I would like to estimate how many cameras I can safely process.
I am using an NVIDIA RTX A2000 12GB. According to your previous post, I can expect to decode half as many H265 streams as an A10. Based on the data from NVIDIA that can be found [here](Video Codec SDK | NVIDIA Developer), I should be able to decode 162 / 2 = 81 1080p30 H265 streams.
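As a quick sanity check, here is that estimate spelled out; the 162-stream figure for the A10 comes from the NVIDIA table linked above, and the 0.5 scaling factor for the A2000 is the assumption taken from the previous post:

```python
# Rough decode-capacity estimate, based on the assumptions above.
A10_H265_1080P30_STREAMS = 162   # from the NVIDIA Video Codec SDK table
A2000_SCALING = 0.5              # assumed A2000 vs. A10 ratio (from the previous post)

estimated_streams = int(A10_H265_1080P30_STREAMS * A2000_SCALING)
print(f"Estimated 1080p30 H265 streams on RTX A2000: {estimated_streams}")  # -> 81
```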
I would like to understand whether running a DeepStream pipeline (AI models) on the GPU affects the number of cameras that can be supported. Should I expect the GPU to be able to decode fewer cameras in that case?
Are there any components of the GPU that are shared between the NVDEC chip and the parts of the GPU used for inference, leading to potential resource contention and therefore reducing the resources available for decoding video streams?
The NVDEC units are completely separate and shouldn’t directly affect inference. However, other factors such as memory bandwidth, compute overhead, or context switching can impact performance, especially if you’re decoding the maximum number of streams.
The number of cameras depends on your DeepStream pipeline. The hardware decoder is not the only factor that affects pipeline performance. For example, when you set up an inference pipeline, the video has to be processed (scaling, format conversion, dewarping, etc.) into a format the model can accept, so the GPU is used both to process the video and to do the inferencing. The model's output also has to be post-processed into the data the user wants (e.g. drawing the bboxes on the video). Ethernet bandwidth will also affect performance when you use network streams such as RTSP, HTTP, etc. as the input sources.
So the performance of the DeepStream pipeline is determined by all the components used in the pipeline; the hardware video decoder is just one of them.
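For illustration, here is a minimal single-source sketch of such a pipeline using the GStreamer Python bindings; the RTSP URL and the nvinfer config file path are placeholders, and element properties are kept to the bare minimum:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Hypothetical single-camera pipeline: NVDEC decode -> batching -> inference -> OSD.
# Replace the RTSP URL and the nvinfer config path with your own.
pipeline = Gst.parse_launch(
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=config_infer_primary.txt ! "     # preprocessing + inference
    "nvvideoconvert ! nvdsosd ! "                               # format conversion + bbox drawing
    "fakesink sync=false "
    "rtspsrc location=rtsp://camera.example/stream ! "
    "rtph265depay ! h265parse ! nvv4l2decoder ! mux.sink_0"     # H265 decode on NVDEC
)
pipeline.set_state(Gst.State.PLAYING)

# Run until interrupted.
loop = GLib.MainLoop()
try:
    loop.run()
except KeyboardInterrupt:
    pass
finally:
    pipeline.set_state(Gst.State.NULL)
```

Each stage here (nvv4l2decoder, nvstreammux, nvinfer, nvvideoconvert, nvdsosd) consumes its own share of decoder, compute, and memory bandwidth, which is why the achievable camera count depends on the whole pipeline rather than on NVDEC alone.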
Currently I have a pipeline that can decode 64 720p H264 streams using the CPU. I want to run the same pipeline on H265 streams, decoding them with NVDEC.
The pipeline runs fine with 64 720p H265 streams decoded with NVDEC. The output of nvidia-smi in this second case is:
We can see the sm column at 100% when using 1080p streams. However, at the same time, the pipeline runs slower - it processes fewer frames per second. Does video decoding on the GPU use the streaming multiprocessors? I wasn't expecting this, since my understanding is that NVDEC is a dedicated component.
So decoding the video using the GPU should not affect the sm usage? I am surprised because the only thing that changed between the two tests was the resolution of the video streams.
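One rough way to separate the two is to sample the SM/GPU utilization and the NVDEC utilization independently, for example with the NVML Python bindings; a minimal sketch, assuming pynvml is installed and the A2000 is GPU index 0:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the A2000 is GPU 0

try:
    for _ in range(10):
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)            # SM and memory utilization
        dec, _period = pynvml.nvmlDeviceGetDecoderUtilization(handle)  # NVDEC utilization
        print(f"sm: {util.gpu:3d}%  mem: {util.memory:3d}%  dec: {dec:3d}%")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

If the dec value stays well below 100% while sm saturates, the bottleneck is likely on the CUDA side of the pipeline (scaling, format conversion, inference) rather than in NVDEC itself.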