Feature request: optimize DeepStream performance with back-to-back detectors

Hello NVIDIA developers,

I am running into a performance limit in DeepStream when using back-to-back detectors, i.e., more than one detector engine in the pipeline. Every engine except the primary detector has to process many more inputs in sequence.
For example, if the primary engine processes 32 sources and each frame contains at least 10 objects, the secondary engine has to run on 32 × 10 = 320 cropped sub-frames. It gets much worse if I want to add a third engine after the second one.
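To make the setup concrete, the pipeline I mean looks roughly like this (a sketch with a single source for brevity; pgie_config.txt and sgie_config.txt are placeholder file names):

```
gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 \
  nvstreammux name=m batch-size=1 width=1920 height=1080 ! \
  nvinfer config-file-path=pgie_config.txt ! \
  nvinfer config-file-path=sgie_config.txt ! \
  fakesink
```

With 32 sources muxed in and ~10 objects per frame, the second nvinfer instance ends up pre-processing and running inference on roughly 320 crops per muxed batch.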

The pre-processing (crop, padding, normalization) happens inside gstnvinfer, so I suspect this is the main factor limiting DeepStream, no matter how fast a backbone we use in the engine (ResNet, SSD, YOLO, etc.).

Finally, I hope there could be a solution to this, such as parallelized pre-processing within a batch, or something similar.

If you have any ideas for improving performance, please discuss or share them here.
Thank you.

DeepStream already accounts for the case you mention: for example, in gst-nvinfer, pre-processing runs in parallel with inference.
A pipeline's performance bottleneck needs to be analyzed case by case; it may be in any one of the GIEs, or even in the pre-processing.


Hello @mchi, do you mean applying gstdsexample_optimized.cpp (parallel batch pre-processing and processing) to gstnvinfer?

Sorry! I don’t get your point.

Hello @mchi, sorry, never mind. I have a question: what does nvinfer do when there are two engines and the first pushes too many objects downstream for the second engine, so that the buffer queue could overflow? Are some buffers dropped, and where in the code is that handled?

Hi @mchi, the pre-processing and inference in gstnvinfer are not parallel, right? If so, since as far as I know GStreamer does not allow managing other threads outside of the main GStreamer thread, why don’t we separate the pre-processing and inference into two GStreamer plugins instead of putting both in one plugin? Then the pre-processing could run in parallel with inference, as you said.

The 2nd GIE can be set to a higher batch size. For example, if the batch size of the 1st GIE is 2, set the batch size of the 2nd GIE to 16. You can refer to deepstream-test2.
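As an illustration, the relevant part of the secondary GIE's nvinfer config might look like this (a sketch modeled on the deepstream-test2 sample configs; the file name and the GIE IDs are assumptions for this example):

```
# sgie_config.txt (hypothetical name)
[property]
gpu-id=0
batch-size=16          # larger than the pgie batch to absorb the object fan-out
process-mode=2         # secondary mode: operate on detected objects, not full frames
operate-on-gie-id=1    # consume detections produced by the primary GIE
gie-unique-id=2
```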

the pre-processing and inference in gstnvinfer are not parallel, right?

No, they run in two threads on different CUDA streams, so they can run in parallel.
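To illustrate the pattern (this is my own minimal sketch, not gst-nvinfer's actual source; names such as `preprocess_loop` and `inference_loop` are hypothetical): a pre-processing thread can fill a queue of batches on one CUDA stream while an inference thread drains it on a second stream, so the two stages overlap.

```cpp
// Sketch only: a pre-processing thread produces batches on one CUDA stream
// while an inference thread consumes them on a second stream.
#include <cuda_runtime.h>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Batch { float* dev = nullptr; size_t elems = 320; };  // e.g. 320 crops

std::queue<Batch> ready;        // pre-processed batches awaiting inference
std::mutex mtx;
std::condition_variable cv;
bool producer_done = false;

void preprocess_loop(cudaStream_t s) {              // hypothetical name
    for (int i = 0; i < 8; ++i) {                   // pretend 8 batches arrive
        Batch b;
        cudaMalloc(reinterpret_cast<void**>(&b.dev), b.elems * sizeof(float));
        // crop / pad / normalize kernels would be launched on stream `s` here
        cudaStreamSynchronize(s);                   // batch is ready for inference
        { std::lock_guard<std::mutex> lk(mtx); ready.push(b); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(mtx); producer_done = true; }
    cv.notify_one();
}

void inference_loop(cudaStream_t s) {               // hypothetical name
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return !ready.empty() || producer_done; });
        if (ready.empty()) break;                   // producer finished, queue drained
        Batch b = ready.front(); ready.pop();
        lk.unlock();
        // a TensorRT execution context would be enqueued on stream `s` here;
        // it overlaps with pre-processing of the next batch on the other stream
        cudaStreamSynchronize(s);
        cudaFree(b.dev);
    }
}

int main() {
    cudaStream_t pre_s, inf_s;
    cudaStreamCreate(&pre_s);
    cudaStreamCreate(&inf_s);
    std::thread pre(preprocess_loop, pre_s);
    std::thread inf(inference_loop, inf_s);
    pre.join(); inf.join();
    cudaStreamDestroy(pre_s);
    cudaStreamDestroy(inf_s);
    return 0;
}
```

The real plugin is of course more involved, but the overlap principle is the same as described in the reply above: two threads, two CUDA streams.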