• Hardware Platform (Jetson / GPU)
Nvidia Tesla T4 GPU
• DeepStream Version
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
I’m using DeepStream 5.1 for batch processing of non-live HLS video. The aim is to achieve the highest possible throughput.
The GStreamer pipeline contains one nvstreammux element for batching and three nvinfer elements: one primary detector and two secondary models that operate on the bounding boxes predicted by the primary model.
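For reference, the topology described above could be sketched as a gst-launch-1.0 command line (the element properties, config file paths, and resolution here are placeholders, not my actual configuration):

```
gst-launch-1.0 \
  uridecodebin uri=<hls-url> ! m.sink_0 \
  nvstreammux name=m batch-size=8 width=1920 height=1080 ! \
  nvinfer config-file-path=pgie_config.txt batch-size=8 ! \
  nvinfer config-file-path=sgie1_config.txt batch-size=256 ! \
  nvinfer config-file-path=sgie2_config.txt batch-size=256 ! \
  fakesink
```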
I’ve tried many different combinations of batch sizes for all three models, but the speed-up was modest. I settled on 8 for the detector and 256 for the secondary models. With this I achieve 184 FPS on my test videos, whereas running the detector on its own achieves 444 FPS.
My current hypothesis is that the main bottleneck lies in how batches are passed between the nvinfer elements. The secondary models process whatever objects the primary model detects in each frame batch; batches are never re-formed to the ideal size. Since the number of detected objects varies from frame to frame, the secondary models will very rarely receive their optimal batch size.
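To illustrate the hypothesis, here is a small back-of-the-envelope simulation (not DeepStream API; the object-count distribution is an assumption): with a primary batch of 8 frames and a secondary batch size of 256, the secondary engine only runs a full batch when those 8 frames together contain at least 256 detected objects.

```python
# Hypothetical illustration of secondary batch under-utilisation.
# Numbers of objects per frame are simulated, not measured.
import random

random.seed(0)

PRIMARY_BATCH = 8      # frames per nvstreammux batch (from the pipeline above)
SECONDARY_BATCH = 256  # nvinfer batch-size of the secondary models

def secondary_occupancy(objects_per_frame):
    """Fraction of the secondary batch actually filled for one primary batch."""
    total_objects = sum(objects_per_frame)
    # The secondary nvinfer runs on the objects available in this batch;
    # it cannot wait for future frames to top the batch up.
    return min(total_objects, SECONDARY_BATCH) / SECONDARY_BATCH

# Simulate 1000 primary batches with 0-20 detected objects per frame.
occupancies = []
for _ in range(1000):
    frames = [random.randint(0, 20) for _ in range(PRIMARY_BATCH)]
    occupancies.append(secondary_occupancy(frames))

avg = sum(occupancies) / len(occupancies)
print(f"average secondary batch occupancy: {avg:.1%}")
```

Under these assumed object counts (about 10 per frame on average) the secondary batch is only around a third full, which would match the observed drop from 444 FPS to 184 FPS being dominated by under-filled secondary batches.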
Is it possible to use the nvstreamdemux element, and perhaps a queue between the nvinfer elements, to decouple them and feed the ideal batch size to the secondary models?
Is there anything else I should consider when optimising batch processing with multiple models in DeepStream?