Parallel branching in DeepStream 6.4

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) : NVIDIA GeForce RTX 3090
• DeepStream Version : 6.4
• JetPack Version (valid for Jetson only)
• TensorRT Version : 12.2
• NVIDIA GPU Driver Version (valid for GPU only) : 535.104.05
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am trying to build a pipeline with branches, as in the image below.

After running this pipeline with two branches, I put a probe function on branch1, but I still see results from branch2.

If I comment out branch2, I do not get any detections.

I tried putting nvstreamdemux and nvstreammux after the tee plugin on branch1, with branch2 still commented out, but the pipeline does not work.

So I see abnormal behavior and I don’t know where the problem is. Can you tell me how to run this pipeline with branches, as in the image, successfully?

Appreciate your help

“tee” only clones the buffers to the branches; it does not copy them. The buffers in branch 1 are exactly the same buffers as in branch 2.

What is the purpose of your branch1 and branch 2?

You may refer to GitHub - NVIDIA-AI-IOT/deepstream_parallel_inference_app: A project demonstrating how to use nvmetamux to run multiple models in parallel.

I checked the deepstream_parallel_inference_app that you mentioned before asking, and it didn’t work, as described above.

What is the difference between clone and copy?

The purpose of branch1 and branch2 is to parallelize inference and minimize time. Since sgie2 depends only on pgie, this should minimize total inference time.

What do you mean? The deepstream_parallel_inference_app just shows you how to handle the batched data with tee.

To make the explanation simple,
“clone” means the “buffer” is the same “buffer”. tee (subprojects/gstreamer/plugins/elements/gsttee.c · main · GStreamer / gstreamer · GitLab) just creates a new “pointer” that points to the same “buffer”. When you change the “buffer” content through one “pointer” in one branch, you can see the same change in the other branch through the other “pointer”, since they point to the same “buffer”.
“copy” will create a whole new “buffer”.
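To make the difference concrete, here is a small plain-Python analogy. It is only a sketch and does not use GStreamer: the dictionary is a hypothetical stand-in for a buffer with attached detection metadata, and the variable names are made up for illustration.

```python
import copy

# Hypothetical stand-in for a buffer with attached metadata (not a real GstBuffer).
buffer = {"frame": 0, "detections": []}

# "Clone" in the sense used above: each branch gets another reference
# to the same underlying object.
branch1_view = buffer
branch2_view = buffer

# "Copy": a new, independent object with the same content.
independent = copy.deepcopy(buffer)

# Branch 2 attaches a result to the shared object.
branch2_view["detections"].append("sgie2: label")

print(branch1_view["detections"])  # ['sgie2: label'] (branch 1 sees branch 2's change)
print(independent["detections"])   # [] (the copy is unaffected)
```

This mirrors what the probe on branch1 observed: because both branches look at the same object, results attached in branch2 are visible from branch1.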

Is your purpose to minimize latency or to minimize processing time?

Thanks for the details about clone and copy.

Yes, I want to use multiple branches to make the models work in parallel, which will reduce processing time and also affect the latency of the whole pipeline.

So can you help me in this case, please?

For your PGIE + multiple SGIEs case, multiple branches are not necessary. The parallel pipeline may not be faster than a normal pipeline such as /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2

You can try this parallel pipeline under the /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2 directory.

gst-launch-1.0 nvstreammux batch-size=2 width=1920 height=1080 name=mux ! nvinfer config-file-path=./dstest2_pgie_config.txt batch-size=2 ! nvtracker ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/ display-tracking-id=0 ! tee name=t t.src_0 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),width=1920,height=1088' ! nvinfer config-file-path=./dstest2_sgie2_config.txt ! nvmultistreamtiler width=1920 height=2160 rows=2 columns=1 ! queue ! nvdsosd display-text=1 display-bbox=1 ! nveglglessink uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! mux.sink_0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4 ! mux.sink_1 t.src_2 ! queue ! nvvideoconvert ! 'video/x-raw(memory:NVMM),width=1920,height=1088' ! nvinfer config-file-path=./dstest2_sgie1_config.txt ! nvmultistreamtiler rows=1 columns=2 width=3840 height=1080 ! nvdsosd ! nveglglessink

I used your pipeline and now it is working, but the FPS is lower, as in the screenshot below.

You mentioned that multiple branches are not necessary for my case. Can you explain why multiple branches didn’t reduce latency or processing time, and why performance became worse?

You can compare the pipeline in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2 and the so-called parallel pipeline I gave you. Extra conversion and processing are needed to separate the “buffers” for the branches, so it will not make the pipeline faster.
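As a rough way to reason about this, here is a toy cost model in Python. It is only a sketch under stated assumptions: all timings are hypothetical numbers, not measurements, and the `overlap` parameter is an assumed fraction describing how much of the two branches' work really runs at the same time on the one GPU.

```python
def sequential_ms(sgie1_ms, sgie2_ms):
    """Both secondary models run one after the other on the same buffers."""
    return sgie1_ms + sgie2_ms


def branched_ms(sgie1_ms, sgie2_ms, separate_ms, overlap):
    """Each branch first pays separate_ms (hypothetical conversion cost) to get
    its own buffers. overlap (0..1) is the assumed fraction of the shorter
    branch that actually runs concurrently with the longer one."""
    b1 = separate_ms + sgie1_ms
    b2 = separate_ms + sgie2_ms
    return b1 + b2 - overlap * min(b1, b2)


# Hypothetical timings, for illustration only.
print(sequential_ms(4.0, 4.0))          # 8.0
print(branched_ms(4.0, 4.0, 2.0, 0.0))  # 12.0 (no real overlap: slower)
print(branched_ms(4.0, 4.0, 2.0, 0.5))  # 9.0  (partial overlap: still slower)
print(branched_ms(4.0, 4.0, 2.0, 1.0))  # 6.0  (perfect overlap: faster)
```

With these made-up numbers the branched layout only wins when the branches overlap almost completely, which is one way to read the lower FPS reported above.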

Is the extra conversion and processing already implemented in DeepStream, or should I implement that?

I would appreciate more clarification.

DeepStream is an SDK. The pipeline I provided is an implementation.