Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2
• TensorRT Version 22.214.171.124
• NVIDIA GPU Driver Version (valid for GPU only) 525.125.06
• Issue Type (questions, new requirements, bugs) Question
I have a question regarding source decoding in DeepStream. I have 4K videos (3840x2160) and a 4GB GPU (GTX 1650). I can launch up to 8 sources at 4K resolution, but adding more causes DeepStream to crash due to insufficient memory. In addition, even though I am able to launch 8 sources, this is with no PGIE or SGIE in the pipeline, only sources and streammux. The performance of the pipeline is just 18 FPS.
- Why does decoding use all the GPU memory, leaving no room for more sources and only very limited room for other plugins?
- How come, in retail, hundreds of streams (e.g. camera monitoring) are decoded at once using only the CPU? Are there any recommendations to reduce decoding resource usage, like using ffmpeg to cut an ROI from the stream, create a new stream, and pass it to DeepStream?
Why does decoding use all the GPU memory, leaving no room for more sources and only very limited room for other plugins?
GPU-accelerated decoders require the buffer to be in GPU memory; this is simply how they work. If you find that memory is the limiting factor rather than processing power, you can switch to CPU decoders such as 'avdec_h264'. However, this will be slower, and you mentioned that you are already running the pipeline at only 18 fps with the HW decoders.
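To get a feel for why 4K decoding eats GPU memory so quickly, here is a rough back-of-envelope sketch. The 16-surface pool size per decoder is an assumption for illustration; the actual number allocated depends on the stream's DPB size and the decoder's extra-surface settings.

```python
# Rough estimate of GPU memory used by decoded 4K NV12 surfaces.
# The pool size per decoder (16 surfaces) is an assumption; the real
# count depends on the stream's reference-frame (DPB) requirements.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL_NV12 = 1.5  # 8-bit luma plane + half-resolution chroma plane

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL_NV12  # ~12.4 MB per frame

def pool_mib(num_surfaces: int) -> float:
    """GPU memory for one decoder's surface pool, in MiB."""
    return num_surfaces * frame_bytes / (1024 * 1024)

for sources in (1, 8):
    total = sources * pool_mib(16)
    print(f"{sources} source(s): ~{total:.0f} MiB just for decode surfaces")
```

With 8 sources this alone approaches 1.5 GiB, before the streammux batch buffers, inference engines, or any other plugin allocate anything, which is consistent with a 4GB card running out of headroom.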
How come, in retail, hundreds of streams (e.g. camera monitoring) are decoded at once using only the CPU? Are there any recommendations to reduce decoding resource usage, like using ffmpeg to cut an ROI from the stream, create a new stream, and pass it to DeepStream?
We have an inference server based on the T4, which is a significantly larger board. We have limited it to 32 streams of 1080p@30fps to avoid running out of memory. I’m not aware of any hardware that can support hundreds of 4K streams, except by using multiple instances on a cloud computing service. Deep learning models are typically trained with small images, and the preprocessing usually involves significant downscaling of the inputs. I would recommend using a lower resolution as the input; modern RTSP cameras often provide streams in various resolutions. You can then upscale the detections and apply them to the original 4K stream if needed.
You can also define an ROI, but that doesn't help the decoder; it still takes a significant amount of time to copy the buffer to the GPU and decode it, because you need a decoded buffer before you can extract an ROI from it. It might be more effective to check whether you can define an ROI on the producer side of the RTSP stream.
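The low-resolution-inference approach described above can be sketched as a small helper: run detection on the camera's low-resolution substream and scale the boxes back onto the 4K stream. The function name, box format, and 720p substream size are illustrative assumptions, not DeepStream API.

```python
# Hypothetical helper: map detection boxes from a low-resolution
# inference stream (e.g. the camera's 1280x720 substream) back onto
# the original 3840x2160 stream. Box format (x, y, w, h) is assumed.
def upscale_box(box, src_size=(1280, 720), dst_size=(3840, 2160)):
    """Scale an (x, y, w, h) box from src_size coordinates to dst_size."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)

# A detection at (100, 50, 200, 100) on the 720p stream maps to
# (300, 150, 600, 300) on the 4K stream (3x scale on both axes).
print(upscale_box((100, 50, 200, 100)))
```

Since the model's preprocessing would have downscaled a 4K input anyway, running inference on the substream typically costs little or no accuracy while cutting the decode load by an order of magnitude.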
The decoder limitation is listed in Video Codec SDK | NVIDIA Developer.
The GTX 1650 is a consumer card with less than half of the T4's decoding capability. For 3840x2160@30fps H.264 videos, the limit may be about 4.2 streams' worth of throughput, i.e. at most 4 concurrent streams.
Please choose proper GPU product according to your requirement.
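The "4.2 streams" figure above is simple throughput arithmetic. As a hedged sketch: if the GTX 1650's NVDEC sustains roughly 126 fps of aggregate 4K H.264 decode (an assumed figure in the spirit of the Video Codec SDK performance tables; actual throughput varies with clocks and content), then:

```python
# Back-of-envelope check of the "about 4.2 streams" figure.
# AGGREGATE_4K_H264_FPS is an assumption, not a measured spec.
AGGREGATE_4K_H264_FPS = 126  # assumed total 4K H.264 decode throughput
STREAM_FPS = 30              # per-stream frame rate

max_streams = AGGREGATE_4K_H264_FPS / STREAM_FPS
print(f"~{max_streams:.1f} concurrent 4K@30 streams")
```

This also explains the 18 FPS observation with 8 sources: 8 streams must share a decoder that can only feed roughly 4 of them at full rate.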
Thank you for the insights, it was useful.
I am still trying to understand the difference in hardware utilization when using 1 vs multiple sources.
For example, this is the output of nvidia-smi dmon when using 1 source at 4K (OD batch size 8, since I am using 8 ROIs per source) (note: DeepStream was launched midway through the logs):
Using 6 sources at 4K (OD batch size 48):
Why does the utilization of both the memory and the decoder drop when using more sources? Why can't the decoder be utilized more fully? Wouldn't that increase the throughput (FPS) of the pipeline?
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
What is your pipeline? What are the parameters you set?
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.